Research Article

RNA-Seq-Based Breast Cancer Subtypes Classification Using Machine Learning Approaches

Table 3

RNA-Seq-based BRCA subtype classification using 5-fold cross-validation with 100 repeats. The first column lists the five subtypes; for each subtype, we built a binary classifier by splitting the data into control and experiment groups. Because the sample sizes of the two groups were imbalanced, the “SMOTE” sampling method (second column) was used to reduce the effect of the class imbalance. The “LumA” subtype was an exception because it had sufficient samples. The third column lists the five metrics used in this experiment, and the remaining columns give the results of the three machine learning approaches adopted in this research, where “svmRadial” denotes the SVM with a radial basis kernel.

Subtypes     Sampling  Metrics      nb      rf      svmRadial

Basal-like   SMOTE     Sensitivity  0.9737  0.9605  0.9737
                       Specificity  0.9580  0.9916  0.9720
                       Accuracy     0.9607  0.9861  0.9723
                       F1           0.8970  0.9605  0.9250
                       AUC          0.9847  0.9976  0.9968

Her2         SMOTE     Sensitivity  0.9063  0.7813  0.8750
                       Specificity  0.8853  0.9601  0.9526
                       Accuracy     0.8868  0.9469  0.9469
                       F1           0.5421  0.6849  0.7089
                       AUC          0.9562  0.9797  0.9798

LumA         None      Sensitivity  0.9067  0.8667  0.9067
                       Specificity  0.8173  0.8846  0.8462
                       Accuracy     0.8637  0.8753  0.8776
                       F1           0.8737  0.8784  0.8850
                       AUC          0.9134  0.9952  0.9481

LumB         SMOTE     Sensitivity  0.8415  0.8171  0.5488
                       Specificity  0.8376  0.9288  0.9544
                       Accuracy     0.8383  0.9076  0.8776
                       F1           0.6635  0.7701  0.6294
                       AUC          0.9075  0.9494  0.9043

Normal-like  SMOTE     Sensitivity  0.8125  0.7500  0.5000
                       Specificity  0.8517  0.9498  0.9833
                       Accuracy     0.8502  0.9424  0.9654
                       F1           0.9163  0.9695  0.9821
                       AUC          0.9125  0.9600  0.9640
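
For readers who want to relate the five metrics to one another, the following is a minimal Python sketch of how they can be computed for a single binary (one subtype versus the rest) classifier. It is an illustration under stated assumptions, not the authors' implementation: the function names `binary_metrics` and `auc`, the choice of the subtype of interest as the positive class, and the example counts and scores are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    sensitivity: float
    specificity: float
    accuracy: float
    f1: float


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Compute threshold-based metrics from confusion-matrix counts.

    Assumption: the positive class is the subtype of interest
    (experiment group) and the negative class is every other
    sample (control group).
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return BinaryMetrics(sensitivity, specificity, accuracy, f1)


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical imbalanced test fold: 10 positives and 100 negatives.
m = binary_metrics(tp=9, fp=10, tn=90, fn=1)
print(m)

# Hypothetical classifier scores for three positives and four negatives.
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1]))
```

With these hypothetical counts, sensitivity, specificity, and accuracy are all 0.90, while F1 is only about 0.62, because the ten false positives are numerous relative to the nine true positives. A class imbalance in the evaluation data is therefore one possible reason for rows such as Her2, where F1 is well below the other metrics.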