Research Article

Breast Cancer Identification from Patients’ Tweet Streaming Using Machine Learning Solution on Spark

Table 5

Cross-validation results of the ML models applied to features selected by univariate selection.

| Model | Cross-validation accuracy (%) | Accuracy on unseen data (%) | Best parameter values |
| --- | --- | --- | --- |
| LR | 98.6 | 98.4 | regParam: 0.1; maxIter: 30 |
| DT | 97.80 | 90.35 | impurity: gini; maxDepth: 5; maxBins: 32 |
| SVM | 98.2 | 98.07 | regParam: 0.02; maxIter: 50; kernel type: linear |
| RF | 99.1 | 93.85 | maxDepth: 6; maxBins: 32; numTrees: 20 |
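
One way to read Table 5 is to compare each model's cross-validation accuracy with its accuracy on unseen data. The short sketch below does that arithmetic on the figures from the table. The name `generalization_gaps` and the idea of reporting the difference as a "gap" are illustrative assumptions, not something the article defines.

```python
# Accuracies (%) copied from Table 5: (cross-validation, unseen data).
TABLE_5 = {
    "LR": (98.6, 98.4),
    "DT": (97.80, 90.35),
    "SVM": (98.2, 98.07),
    "RF": (99.1, 93.85),
}


def generalization_gaps(table):
    """Return {model: cross-validation accuracy minus unseen-data accuracy}.

    Values are in percentage points, rounded to two decimals to hide
    floating-point noise. (Hypothetical helper, not from the article.)
    """
    return {name: round(cv - unseen, 2) for name, (cv, unseen) in table.items()}


gaps = generalization_gaps(TABLE_5)

# List the models from the smallest gap to the largest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {gap:.2f} percentage points")
```

On these figures, LR and SVM lose well under one percentage point on unseen data, whereas RF and DT lose about 5 and 7 points respectively, even though RF has the highest cross-validation accuracy.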