Research Article

Breast Cancer Identification from Patients’ Tweet Streaming Using Machine Learning Solution on Spark

Table 3

The accuracy of 10-fold cross-validation (CV) and the accuracy on the unseen dataset after correlation.

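| Model | Accuracy of cross-validation (%) | Accuracy of testing data (%) | Best value of parameters |
|-------|----------------------------------|------------------------------|--------------------------|
| LR | 99.06 | 98.7 | regParam: 0.1; maxIter: 20 |
| DT | 98.6 | 90.3 | impurity: gini; maxDepth: 5; maxBins: 32 |
| SVM | 99.1 | 98.4 | regParam: 0.02; maxIter: 50; kernel type: linear |
| RF | 99.5 | 96.9 | maxDepth: 7; maxBins: 32; numTrees: 20 |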
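
One way to read Table 3 is to compare each model's cross-validation accuracy with its accuracy on the unseen data. The following is a minimal sketch of that arithmetic in plain Python. The accuracy figures are transcribed from the table; the names `table3` and `accuracy_gaps` are hypothetical and are not part of the original study's code.

```python
# Values transcribed from Table 3:
# model -> (cross-validation accuracy %, testing-data accuracy %)
table3 = {
    "LR": (99.06, 98.7),
    "DT": (98.6, 90.3),
    "SVM": (99.1, 98.4),
    "RF": (99.5, 96.9),
}


def accuracy_gaps(results):
    """Return {model: CV accuracy minus test accuracy} in percentage points."""
    # Round to two decimals to hide floating-point noise in the subtraction.
    return {model: round(cv - test, 2) for model, (cv, test) in results.items()}


gaps = accuracy_gaps(table3)

# List models from the largest to the smallest gap.
for model, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {gap:.2f} percentage points")
```

A larger gap means the cross-validation score was a more optimistic estimate of performance on unseen data for that model. On these figures the decision tree shows the widest difference and logistic regression the narrowest.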