Research Article

CFSBFDroid: Android Malware Detection Using CFS + Best First Search-Based Feature Selection

Table 1

Summary of related works from the literature (B: benign samples; M: malware samples).

| S. No. | Reference | Analysis type | Dataset | Features | Applied techniques | Performance claimed | Limitation |
|---|---|---|---|---|---|---|---|
| 1 | Zhu et al., 2021 [24] | Static | 1065 (B), 1065 (M) | Sensitive API monitoring system event | Ensemble rotation forest | Accuracy: 88.26% | Variation between MCC and accuracy |
| 2 | Firdaus et al., 2018 [25] | Static | 550 (B), 5555 (M) | Permission rate, codebase feature string, permission directory path, etc. | NB, FT, J48, RF, and MLP | Accuracy: 95% | High FPR and imbalanced dataset |
| 3 | Martinelli et al., 2020 [26] | Hybrid | 9804 (B), 2794 (M) | Static n-grams, dynamic monitoring devices, apps behavior, etc. | SVM | Accuracy: 99.7% | Only one classifier used for evaluation; imbalanced dataset |
| 4 | S. Alam et al., 2020 [27] | Dynamic | 500 (B), 200 (M) | Network traffic | J48 | Accuracy: 98.4% | Only one classifier used for evaluation; high FPR |
| 5 | Sugunan et al., 2018 [28] | Hybrid | 200 (B), 150 (M) | Permission, API calls | NB, SVM, RF, and J48 | Precision: 90.5% | Small sample size; variation in precision, recall, and F score |
| 6 | Feng et al., 2018 [10] | Dynamic | 8806 (B), 5213 (M); 5000 (B), 5000 (M) | System calls, phone calls, and sent SMS | Majority voting stacking | Accuracy: 96.56% | High FPR in system call sample |
| 7 | Martín et al., 2018 [29] | Dynamic | 4442 (B + M) | System calls, SMS sent, cryptographic operation, etc. | Bagging, DT, NN, CNN, LSTM, RNN, SVM linear, SVM rbf, SVM sigmoid, etc. | Accuracy: 81.8% | Performance of SVM is lowest |
| 8 | Yerima et al., 2019 [14] | Dynamic | 17444 (B + M) | API calls and intent | RF, MLP, SMO, J48, PART, and NB | Accuracy: 94.3% | Complex procedure |
| 9 | Yang et al., 2018 [30] | Dynamic | 408 (B), 258 (M) | Packet size, sensitive API, antisimulator, etc. | SVM, RF, and DT | Accuracy: 98.54% | Imbalanced sample size |
| 10 | Surendran et al., 2020 [31] | Hybrid | 1650 (B), 1650 (M) | API calls, permissions, and system calls | TANB | Accuracy: 97% | Variation in TPR and precision |
| 11 | Wang et al., 2020 [32] | Static | 61436 (B), 27500 (M) | URL and HTTP traffic | MultiView SVM, NB, KNN, and C4.5 | Accuracy: 98.8% | FPR and errors not estimated |
| 12 | Fang et al., 2020 [33] | Static | AMD | Dex files into RGB image | KNN, SVM, RF, and familial classification | F1 score: 96% | Small number of features considered |
| 13 | Tao et al., 2017 [34] | Hybrid | 123453 (B), 5560 (M) | Permission, restricted APIs, suspicious API, network address, etc. | SVM, DREBIN | Accuracy: 94% | Variation in precision and recall values; imbalanced dataset |
| 14 | Garg and Baliyan, 2019 [35] | Hybrid | 85000 (M + B) | Permissions, API calls, services, etc. | MLP, SVM, PART, RIDOR, MaxProb, etc. | Accuracy: 98.27% | High FPR and imbalanced dataset |
| 15 | Maryam et al., 2020 [36] | Hybrid | 2500 (B), 2500 (M) | Dex class, hashes, Fda access, permissions, etc. | SVM, DT, RF, K-star, NB, TPOT, etc. | F score: 97% | Variation in precision and recall values |
| 16 | Jiang et al., 2020 [16] | Hybrid | 4002 (B), 1886 (M) | Permission, APIs, intent filters, suspicious calls, system calls, etc. | DNN, RBM, DAE, SVM, MKL, etc. | Accuracy: 94.7% | High false negative and false positive rates |
| 17 | Duc et al., 2018 [37] | Static | 123453 (B), 5560 (M) | Requested permission, intent filter, API request, etc. | Neural network | Accuracy: 92.3% | Variation in precision and recall values |
| 18 | Arshad et al., 2018 [34] | Hybrid | 100 (B), 100 (M) | Permission, system calls, etc. | RF, DT, SVM, NB, and SAMADroid | F score: 98% | Small sample size |
| 19 | Alazab et al., 2020 [38] | Static | 14172 (B), 13719 (M) | API calls | RF, J48, RT, KNN, and NB | F score: 94.30% | FPR not estimated |