Mobile Information Systems

Research Article

CFSBFDroid: Android Malware Detection Using CFS + Best First Search-Based Feature Selection

Table 1

Summarized related works from the literature.


S. No.	Reference	Analysis type	Dataset (B: benign, M: malware)	Features	Applied techniques	Performance claimed	Limitation

1	Zhu et al., 2021 [24]	Static	1065 (B) 1065 (M)	Sensitive API, monitoring system event	Ensemble rotation forest	Accuracy: 88.26%	Variation between MCC and accuracy
2	Firdaus et al., 2018 [25]	Static	550 (B) 5555 (M)	Permission rate, codebase feature string, permission directory path, etc.	NB, FT, J48, RF, and MLP	Accuracy: 95%	High FPR and imbalanced dataset
3	Martinelli et al., 2020 [26]	Hybrid	9804 (B) 2794 (M)	Static: n-grams; dynamic: monitoring devices, app behavior, etc.	SVM	Accuracy: 99.7%	Only one classifier used for evaluation; imbalanced dataset
4	S. Alam et al., 2020 [27]	Dynamic	500 (B) 200 (M)	Network traffic	J48	Accuracy: 98.4%	Only one classifier used for evaluation; high FPR
5	Sugunan et al., 2018 [28]	Hybrid	200 (B) 150 (M)	Permission, API calls	NB, SVM, RF, and J48	Precision: 90.5%	Small sample size; variation in precision, recall, and F score
6	Feng et al., 2018 [10]	Dynamic	8806 (B) 5213 (M); 5000 (B) 5000 (M)	System calls, phone calls, and sent SMS	Majority voting, stacking	Accuracy: 96.56%	High FPR in system call sample
7	Martín et al., 2018 [29]	Dynamic	4442 (B + M)	System calls, SMS sent, cryptographic operation, etc.	Bagging, DT, NN, CNN, LSTM, RNN, SVM linear, SVM rbf, SVM sigmoid, etc.	Accuracy: 81.8%	SVM performance is the lowest
8	Yerima et al., 2019 [14]	Dynamic	17444 (B + M)	API calls and intent	RF, MLP, SMO, J48, PART, and NB	Accuracy: 94.3%	Complex procedure
9	Yang et al., 2018 [30]	Dynamic	408 (B) 258 (M)	Packet size, sensitive API, antisimulator, etc.	SVM, RF, and DT	Accuracy: 98.54%	Imbalanced sample size
10	Surendran et al., 2020 [31]	Hybrid	1650 (B) 1650 (M)	API calls, permissions, and system calls	TANB	Accuracy: 97%	Variation in TPR and precision
11	Wang et al., 2020 [32]	Static	61436 (B) 27500 (M)	URL and HTTP traffic	MultiView SVM, NB, KNN, and C4.5	Accuracy: 98.8%	FPR and errors not estimated
12	Fang et al., 2020 [33]	Static	AMD	Dex files converted into RGB images	KNN, SVM, RF, and familial classification	F1 score: 96%	Small number of features considered
13	Tao et al., 2017 [34]	Hybrid	123453 (B) 5560 (M)	Permission, restricted APIs, suspicious API, network address, etc.	SVM, DREBIN	Accuracy: 94%	Variation in precision and recall values; imbalanced dataset
14	Garg and Baliyan, 2019 [35]	Hybrid	85000 (M + B)	Permissions, API calls, services, etc.	MLP, SVM, PART, RIDOR, MaxProb, etc.	Accuracy: 98.27%	High FPR and imbalanced dataset
15	Maryam et al., 2020 [36]	Hybrid	2500 (B) 2500 (M)	Dex class, hashes, Fda access, permissions, etc.	SVM, DT, RF, K-star, NB, TPOT, etc.	F score: 97%	Variation in precision and recall values
16	Jiang et al., 2020 [16]	Hybrid	4002 (B) 1886 (M)	Permission, APIs, intent filters, suspicious calls, system calls, etc.	DNN, RBM, DAE, SVM, MKL, etc.	Accuracy: 94.7%	High false negative and false positive rates
17	Duc et al., 2018 [37]	Static	123453 (B) 5560 (M)	Requested permission, intent filter, API request, etc.	Neural network	Accuracy: 92.3%	Variation in precision and recall values
18	Arshad et al., 2018 [34]	Hybrid	100 (B) 100 (M)	Permission, system calls, etc.	RF, DT, SVM, NB, and SAMADroid	F score: 98%	Small sample size
19	Alazab et al., 2020 [38]	Static	14172 (B) 13719 (M)	API calls	RF, J48, RT, KNN, and NB	F score: 94.30%	FPR not estimated