Research Article

A Novel Feature Selection Method for Classification of Medical Data Using Filters, Wrappers, and Embedded Approaches

Table 1: Summary of data mining algorithms using feature selection on medical datasets.

Reference | Year | Techniques | Datasets | Accuracy | Limitations
Ghosh et al. [23] | 2021 | Relief, LASSO | Cleveland, Long Beach, Hungarian, Statlog | 99.05% | The classification algorithm depends on the feature selection method.
Alirezanejad et al. [24] | 2020 | Heuristic methods | Colon, Leukemia | 92% | Metaheuristics could be applied to remove unnecessary attributes prior to classification.
Khan et al. [25] | 2019 | Gabor filter bank + SVM | DDSM database | 92.48% | The computational cost of the proposed system is high; further research is suggested to optimize it.
Lü et al. [26] | 2019 | RBM + SVM | MNIST handwritten database | 81.87% | Because the configuration parameters are fixed, RBM feature extraction is inflexible and needs improvement in further research.
Desai et al. [27] | 2019 | BPNN + LR | Cleveland dataset | 78.88% | The model cannot act as a clinical expert; it only complements the clinician's judgment for better diagnostic decisions.
Vijayashree et al. [28] | 2018 | PSO + SVM | Cleveland heart disease | | The model could be further improved using ensemble classifiers.
Kalantari et al. [29] | 2018 | GA-SVM | UCI machine learning repository | 84.44% | The system needs further optimization to achieve high detection performance on medical datasets.
Dwivedi et al. [30] | 2018 | SVM | Statlog heart disease dataset | 90% | The system cannot be used to predict disease severity levels.
Chen et al. [31] | 2018 | Disease diagnosis and treatment recommendation system | PubMed dataset | 90% | The security perspective is not addressed, and feature selection is not considered.
Tayefi et al. [32] | 2017 | Decision tree, hs-CRP | UCI dataset | 94% | The model does not account for some key risk factors in diabetic patients, which limits its evaluation for high performance.
Hoque et al. [33] | 2016 | Decision tree, random forests, naïve Bayes, kNN, SVM | UCI dataset | 71.2%, 83.12%, 28.60%, 94.50%, 51.14% | Incremental fuzzy feature selection for classifying DDoS attack traffic remains to be incorporated.
Bennasar et al. [34] | 2015 | mRMR, JMIM, NJMIM | Sonar datasets | 88%, 87%, 86% | It disregards the interaction between the features and the classifier, as well as the higher-dimensional joint mutual information among more than two features, which can lead to a suboptimal choice of features.
Emary et al. [35] | 2015 | Gray wolf optimization | UCI dataset | | Improvements could be made using advanced feature selection methods.
Verónica et al. [36] | 2015 | ReliefF, information gain, mRMR, CFS, FCBF, chi-squared | Microarray datasets | 90.24%, 82.32%, 65.23%, 65.36%, 55.21%, 81.2% | The developed techniques should be tested in multiplatform, distributed-learning, and real-time settings; the steady growth of dataset sizes opens a new line of research in feature selection.
Zhang et al. [37] | 2014 | Sequential forward floating selection (SFFS), SBS, MBPSO | Spam-based dataset | 94.07%, 95.28%, 91.97% | Slow processing; only a decision tree was used for classification.
Verónica et al. [38] | 2014 | Correlation-based feature selection (CFS), chi-square, minimum redundancy maximum relevance (mRMR), SVM | Brain (UCI), CNS (UCI), GLI (UCI) | 66.67%, 65.00%, 69.41%, 96.99% | Microarray data should be distributed vertically (i.e., by features) to reduce the heavy computational burden of wrapper methods.
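
The three families of feature selection that recur in Table 1 (filters, wrappers, and embedded methods) can be illustrated with a minimal sketch. The example below is not taken from any of the surveyed works: the dataset (scikit-learn's built-in breast cancer data), the chosen selectors, and all parameter values are illustrative assumptions that merely mirror techniques named in the table (chi-squared filtering, an SVM-based wrapper, and LASSO-style embedded selection).

# Illustrative sketch only (not the surveyed methods): filter, wrapper, and
# embedded feature selection on a small public medical dataset.
# Dataset choice, selectors, and parameter values are assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 non-negative features

# Filter: rank features by the chi-squared statistic, keep the top 10.
filter_sel = SelectKBest(chi2, k=10).fit(X, y)

# Wrapper: recursive feature elimination around a linear SVM.
wrapper_sel = RFE(LinearSVC(dual=False, max_iter=5000),
                  n_features_to_select=10).fit(X, y)

# Embedded: L1-penalised logistic regression (LASSO-style sparsity);
# features with nonzero coefficients are retained.
embedded_sel = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1)).fit(X, y)

for name, sel in (("filter", filter_sel),
                  ("wrapper", wrapper_sel),
                  ("embedded", embedded_sel)):
    print(f"{name:8s} -> {sel.get_support().sum()} features selected")

The sketch highlights the practical trade-off visible in the table: the filter is classifier-independent and cheap, the wrapper repeatedly refits the SVM and is therefore the most expensive, and the embedded approach obtains sparsity as a by-product of training a single model.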