Research Article

Towards Tax Evasion Detection Using Improved Particle Swarm Optimization Algorithm

Table 1. Advantages and disadvantages of the proposed models for tax evasion detection.

| Refs. | Models | Tools | Data | Validation method | Advantages | Disadvantages |
|---|---|---|---|---|---|---|
| [27] | C5.0 | Microsoft .NET framework | 500 | Accuracy | Careful classification; proportion of positive and negative; quick update | Increases the depth of the tree |
| [28] | Decision tree, logistic regression | Statistical | 402 | Prediction efficiency (PE), examination effort (EF), strike rate (SR) | LR is easy to implement and interpret, and very efficient to train | Overfitting |
| [29] | Association rule | DB miner | | Accuracy rate, error rate | Appropriate for low-transaction datasets | Requires multiple passes over the dataset |
| [30] | MLP, SVM, LR, HSA | Statistical | 4,504 | Accuracy, sensitivity, specificity, AUROC | Fast convergence; increased efficiency; increased detection accuracy | Overfitting; MLP is sensitive to feature scaling |
| [31] | Linear regression, SVM | Statistical | | Accuracy | SVM is more effective in high-dimensional spaces; SVM performs well in terms of memory usage | Choosing the kernel function is not easy; long training time for large datasets |
| [32] | Bayesian networks | Statistical | 10,028 | Speedup | A strong and mathematically coherent framework for the analysis | Higher memory utilization |
| [33] | Colored network-based model (CNBM) | Framework | 31,910,000 | Accuracy | Accurate detection of samples; segmentation of samples based on sample weights | The CNBM model requires selecting optimal parameter values |
| [34] | Conditional maximum mean discrepancy (CMMD) | Coding | | Accuracy | Accurate detection of samples | Increased complexity |
| [35] | LR, k-medoids | Statistical | | Mean absolute deviation (MAD), root mean square error (RMSE) | Discovers the exact center of the samples | High prediction complexity for large datasets |
| [36] | MLP, SVM, logistic regression, random forest | Statistical | 700,000 | Accuracy, AUROC, precision, F1-score, recall | Fast convergence; increased efficiency; increased detection accuracy | Overfitting |
| [37] | Transaction network representation | LightGBM | 9,422,952 | True positive, true negative, false negative, false positive, error rate, precision, recall, F-measure, ROC | Quick calculation time; correct search space; inexpensive testing of each instance | With a big dataset, the prediction stage might be slow |
| [38] | Deep learning | LightGBM | 20,444 | Accuracy, F1-score, AUC | No need for data labeling; suitable for bulk data; layers learned from computations of individual neurons | Increased complexity |
| [39] | BP-ANN, CHAID tree | Intelligent Miner | 12,458 | Accuracy rate, error rate | Low prediction complexity for large datasets | Higher memory utilization |
| [40] | ANN-MLP | Statistical | 2,000,000 | Sensitivity | Fast convergence; increased efficiency | MLP may suffer from overfitting |
| [41] | Clustering | Statistical | | Accuracy | Simple execution | Higher memory utilization |
| [42] | LR, SVM, KNN, MLP, DT, RF | Statistical | | Accuracy, precision, recall | KNN finds the k nearest data points in the training set | Choosing the kernel function is not easy |
| [43] | CART, CHAID | Statistical | | Accuracy rate, error rate | High accuracy; discovers important features | Gets stuck in local optima |
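Most of the validation methods listed in Table 1 (accuracy, sensitivity/recall, specificity, precision, and F1-score) reduce to simple ratios over confusion-matrix counts such as the true/false positives and negatives used in [37]. The following Python sketch, which is not taken from any of the cited works, illustrates how these metrics could be computed for a binary tax-evasion classifier; the function name and the example counts are hypothetical.

```python
# Minimal, illustrative sketch of the confusion-matrix-based validation
# metrics appearing in Table 1. The counts passed in are placeholders,
# not results from any of the cited studies.

def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1_score": f1,
        "error_rate": 1.0 - accuracy,
    }

if __name__ == "__main__":
    # Hypothetical counts for a detector flagging evasive taxpayers (positive class).
    print(classification_metrics(tp=120, tn=840, fp=25, fn=15))
```

Ranking-based measures such as AUROC, and regression-style measures such as MAD and RMSE, are not covered by this sketch because they require the model's scores or continuous predictions rather than the four counts alone.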