Research Article

Evaluating the Impact of Data Transformation Techniques on the Performance and Interpretability of Software Defect Prediction Models

Table 3

The p-values of the Wilcoxon signed-rank test and the average percentage performance gains of the transformed models over the original models on all six performance indicators, for models built with different classifiers.

p-Values and percentage   O vs. L   O vs. M   O vs. Z
RF-Accuracy               0         0         0
RF-Precision              −0.3%     0         0
RF-Recall                 0.7%      0         0
RF-F1                     0.2%      0         0
RF-AUC                    0         0         0
RF-MCC                    0         0         0
LR-Accuracy               5%        3%        3%
LR-Precision              4%        6%        4%
LR-Recall                 7%        −2%       2%
LR-F1                     7%        0         4%
LR-AUC                    5%        2%        3%
LR-MCC                    2%        12%       15%
DT-Accuracy               1%        0.9%      0.9%
DT-Precision              0.9%      0.9%      0.9%
DT-Recall                 5%        4%        4%
DT-F1                     3%        3%        3%
DT-AUC                    2%        2%        2%
DT-MCC                    8%        8%        7%
NB-Accuracy               7%        2%        2%
NB-Precision              −5%       3%        3%
NB-Recall                 61%       4%        4%
NB-F1                     31%       5%        5%
NB-AUC                    7%        2%        2%
NB-MCC                    26%       9%        9%
KNN-Accuracy              3%        1%        2%
KNN-Precision             3%        4%        4%
KNN-Recall                7%        −0.8%     0
KNN-F1                    5%        1%        2%
KNN-AUC                   4%        2%        2%
KNN-MCC                   17%       9%        12%
MLP-Accuracy              2%        4%        4%
MLP-Precision             1%        4%        4%
MLP-Recall                7%        3%        7%
MLP-F1                    5%        4%        7%
MLP-AUC                   2%        3%        4%
MLP-MCC                   8%        13%       17%
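The caption refers to two quantities computed for each classifier and indicator: a Wilcoxon signed-rank test on paired scores and an average percentage performance gain. The sketch below shows one way to compute both from per-dataset scores of an original model and its transformed counterpart. It is a minimal illustration under stated assumptions, not the study's procedure: the gain is assumed to be the mean of per-dataset relative changes, (transformed - original) / original × 100; the test is assumed to be two-sided with zero differences discarded; and the function names (`wilcoxon_signed_rank`, `average_percentage_gain`, `doubled_average_ranks`) are hypothetical. The scores in the usage block are invented for illustration and are not taken from Table 3.

```python
def doubled_average_ranks(values):
    """Return 2 * (1-based average rank) of each value; doubling keeps tied ranks integral."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks2 = [0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Positions i..j share the average rank ((i + 1) + (j + 1)) / 2; store it doubled.
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2
        i = j + 1
    return ranks2


def wilcoxon_signed_rank(original, transformed):
    """Exact two-sided Wilcoxon signed-rank test on paired scores (assumed settings).

    Zero differences are dropped, tied |differences| get average ranks, and the
    null distribution of W+ is built by counting all 2**n sign assignments.
    Returns (W_plus, p_value).
    """
    diffs = [t - o for o, t in zip(original, transformed) if t != o]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # no nonzero differences: no evidence of a change
    ranks2 = doubled_average_ranks([abs(d) for d in diffs])
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total2 = sum(ranks2)
    # counts[s]: number of sign assignments whose doubled W+ equals s (0/1 knapsack count).
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]
    # Two-sided: count assignments at least as far from the null mean (total2 / 2) as observed.
    obs_dev = abs(2 * w_plus2 - total2)
    extreme = sum(c for s, c in enumerate(counts) if abs(2 * s - total2) >= obs_dev)
    return w_plus2 / 2, extreme / 2 ** n


def average_percentage_gain(original, transformed):
    """Mean per-dataset relative change in percent (assumed definition; originals must be nonzero)."""
    gains = [(t - o) / o * 100 for o, t in zip(original, transformed)]
    return sum(gains) / len(gains)


if __name__ == "__main__":
    # Hypothetical per-dataset AUC values for illustration only; not data from the study.
    original_auc = [0.70, 0.65, 0.80, 0.60, 0.75]
    transformed_auc = [0.72, 0.70, 0.83, 0.66, 0.79]
    w_plus, p = wilcoxon_signed_rank(original_auc, transformed_auc)
    gain = average_percentage_gain(original_auc, transformed_auc)
    print(f"W+ = {w_plus}, p = {p:.4f}, average gain = {gain:.1f}%")
```

Counting sign assignments with a small dynamic program keeps the p-value exact even when tied differences produce half-integer ranks, which is practical for the modest number of datasets typical of defect-prediction studies. For larger samples, or to match common tooling, `scipy.stats.wilcoxon` implements the same test; its default `zero_method="wilcox"` also discards zero differences. Note that a relative gain is undefined when an original score is zero and hard to interpret when it is negative, which can occur for MCC.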