Research Article

A Novel Feature Selection Method for Classification of Medical Data Using Filters, Wrappers, and Embedded Approaches

Table 1: Summary of data mining algorithms using feature selection on medical datasets.

Reference | Year | Techniques | Datasets | Accuracy | Limitations
Ghosh et al. [23] | 2021 | Relief, LASSO | Cleveland, Long Beach, Hungarian, Statlog | 99.05% | The classification algorithm depends on the feature selection method.
Alirezanejad et al. [24] | 2020 | Heuristic methods | Colon, Leukemia | 92% | Metaheuristics could be applied to remove unnecessary attributes prior to classification.
Khan et al. [25] | 2019 | Gabor filter bank + SVM | DDSM database | 92.48% | The computational cost of the proposed system is high; further research is suggested to optimize it.
Lü et al. [26] | 2019 | RBM + SVM | MNIST handwritten database | 81.87% | Because the configuration parameters are fixed, RBM feature extraction is inflexible and needs improvement in further research.
Desai et al. [27] | 2019 | BPNN + LR | Cleveland dataset | 78.88% | The model cannot act as a clinical expert; it only complements the clinician's judgment for better diagnostic decisions.
Vijayashree et al. [28] | 2018 | PSO + SVM | Cleveland heart disease | | The model could be further improved using ensemble classifiers.
Kalantari et al. [29] | 2018 | GA-SVM | UCI machine learning repository | 84.44% | The system needs further optimization to achieve high detection performance on medical datasets.
Dwivedi et al. [30] | 2018 | SVM | Statlog heart disease dataset | 90% | The system cannot be used to predict disease severity levels.
Chen et al. [31] | 2018 | Disease diagnosis and treatment recommendation system | PubMed dataset | 90% | The security perspective is not addressed, and feature selection is not considered.
Tayefi et al. [32] | 2017 | Decision tree, hs-CRP | UCI dataset | 94% | The model does not account for some key risk factors in diabetic patients, which limits its evaluation for high performance.
Hoque et al. [33] | 2016 | Decision tree, random forests, naïve Bayes, kNN, SVM | UCI dataset | 71.2%, 83.12%, 28.60%, 94.50%, 51.14% | Incremental fuzzy feature selection for classifying DDoS attack traffic remains to be incorporated.
Bennasar et al. [34] | 2015 | mRMR, JMIM, NJMIM | Sonar datasets | 88%, 87%, 86% | It disregards the interaction between the features and the classifier, as well as the higher-dimensional joint mutual information among more than two features, which can lead to a suboptimal choice of features.
Emary et al. [35] | 2015 | Gray wolf optimization | UCI dataset | | Improvements could be made using advanced feature selection methods.
Verónica et al. [36] | 2015 | ReliefF, information gain, mRMR, CFS, FCBF, chi-squared | Microarray datasets | 90.24%, 82.32%, 65.23%, 65.36%, 55.21%, 81.2% | The developed techniques should be tested in multiplatform, distributed-learning, and real-time settings; the steady growth of dataset sizes opens a new line of research in feature selection.
Zhang et al. [37] | 2014 | Sequential forward floating selection (SFFS), SBS, MBPSO | Spam-based dataset | 94.07%, 95.28%, 91.97% | Slow processing; only a decision tree was used for classification.
Verónica et al. [38] | 2014 | Correlation-based feature selection (CFS), chi-square, minimum redundancy maximum relevance (mRMR), SVM | Brain (UCI), CNS (UCI), GLI (UCI) | 66.67%, 65.00%, 69.41%, 96.99% | Microarray data should be distributed vertically (i.e., by features) to reduce the heavy computational burden of wrapper methods.
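
The three families of feature selection that recur in Table 1 (filters, wrappers, and embedded methods) can be illustrated with a minimal sketch. The example below is not taken from any of the surveyed works: the dataset (scikit-learn's built-in breast cancer data), the chosen selectors, and all parameter values are illustrative assumptions that merely mirror techniques named in the table (chi-squared filtering, an SVM-based wrapper, and LASSO-style embedded selection).

# Illustrative sketch only (not the surveyed methods): filter, wrapper, and
# embedded feature selection on a small public medical dataset.
# Dataset choice, selectors, and parameter values are assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 non-negative features

# Filter: rank features by the chi-squared statistic, keep the top 10.
filter_sel = SelectKBest(chi2, k=10).fit(X, y)

# Wrapper: recursive feature elimination around a linear SVM.
wrapper_sel = RFE(LinearSVC(dual=False, max_iter=5000),
                  n_features_to_select=10).fit(X, y)

# Embedded: L1-penalised logistic regression (LASSO-style sparsity);
# features with nonzero coefficients are retained.
embedded_sel = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1)).fit(X, y)

for name, sel in (("filter", filter_sel),
                  ("wrapper", wrapper_sel),
                  ("embedded", embedded_sel)):
    print(f"{name:8s} -> {sel.get_support().sum()} features selected")

The sketch highlights the practical trade-off visible in the table: the filter is classifier-independent and cheap, the wrapper repeatedly refits the SVM and is therefore the most expensive, and the embedded approach obtains sparsity as a by-product of training a single model.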