Research Article

Towards Tax Evasion Detection Using Improved Particle Swarm Optimization Algorithm

Table 1. Advantages and disadvantages of the proposed models for tax evasion detection.

| Refs. | Models | Tools | Data | Validation method | Advantages | Disadvantages |
|---|---|---|---|---|---|---|
| [27] | C5.0 | Microsoft .NET framework | 500 | Accuracy | Careful classification; proportion of positive and negative; quick update | Increases the depth of the tree |
| [28] | Decision tree, logistic regression | Statistical | 402 | Prediction efficiency (PE), examination effort (EF), strike rate (SR) | LR is easy to implement and interpret, and very efficient to train | Overfitting |
| [29] | Association rule | DB miner | | Accuracy rate, error rate | Appropriate for low-transaction datasets | Requires multiple passes over the dataset |
| [30] | MLP, SVM, LR, HSA | Statistical | 4,504 | Accuracy, sensitivity, specificity, AUROC | Fast convergence; increased efficiency; increased detection accuracy | Overfitting; MLP is sensitive to feature scaling |
| [31] | Linear regression, SVM | Statistical | | Accuracy | SVM is more effective in high-dimensional spaces; SVM performs well in terms of memory usage | Choosing the kernel function is not easy; long training time for large datasets |
| [32] | Bayesian networks | Statistical | 10,028 | Speedup | A strong and mathematically coherent framework for the analysis | Higher memory utilization |
| [33] | Colored network-based model (CNBM) | Framework | 31,910,000 | Accuracy | Accurate detection of samples; segmentation of samples based on sample weights | The CNBM model requires selecting optimal parameter values |
| [34] | Conditional maximum mean discrepancy (CMMD) | Coding | | Accuracy | Accurate detection of samples | Increased complexity |
| [35] | LR, k-medoids | Statistical | | Mean absolute deviation (MAD), root mean square error (RMSE) | Discovers the exact center of the samples | High prediction complexity for large datasets |
| [36] | MLP, SVM, logistic regression, random forest | Statistical | 700,000 | Accuracy, AUROC, precision, F1-score, recall | Fast convergence; increased efficiency; increased detection accuracy | Overfitting |
| [37] | Transaction network representation | LightGBM | 9,422,952 | True positive, true negative, false negative, false positive, error rate, precision, recall, F-measure, ROC | Quick calculation time; correct search space; inexpensive testing of each instance | With a big dataset, the prediction stage might be slow |
| [38] | Deep learning | LightGBM | 20,444 | Accuracy, F1-score, AUC | No need for data labeling; suitable for bulk data; layers learned from computations of individual neurons | Increased complexity |
| [39] | BP-ANN, CHAID tree | Intelligent Miner | 12,458 | Accuracy rate, error rate | Low prediction complexity for large datasets | Higher memory utilization |
| [40] | ANN-MLP | Statistical | 2,000,000 | Sensitivity | Fast convergence; increased efficiency | MLP may suffer from overfitting |
| [41] | Clustering | Statistical | | Accuracy | Simple execution | Higher memory utilization |
| [42] | LR, SVM, KNN, MLP, DT, RF | Statistical | | Accuracy, precision, recall | KNN finds the k nearest data points in the training set | Choosing the kernel function is not easy |
| [43] | CART, CHAID | Statistical | | Accuracy rate, error rate | High accuracy; discovers important features | Gets stuck in local optima |
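Most of the validation methods listed in Table 1 (accuracy, sensitivity/recall, specificity, precision, and F1-score) reduce to simple ratios over confusion-matrix counts such as the true/false positives and negatives used in [37]. The following Python sketch, which is not taken from any of the cited works, illustrates how these metrics could be computed for a binary tax-evasion classifier; the function name and the example counts are hypothetical.

```python
# Minimal, illustrative sketch of the confusion-matrix-based validation
# metrics appearing in Table 1. The counts passed in are placeholders,
# not results from any of the cited studies.

def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1_score": f1,
        "error_rate": 1.0 - accuracy,
    }

if __name__ == "__main__":
    # Hypothetical counts for a detector flagging evasive taxpayers (positive class).
    print(classification_metrics(tp=120, tn=840, fp=25, fn=15))
```

Ranking-based measures such as AUROC, and regression-style measures such as MAD and RMSE, are not covered by this sketch because they require the model's scores or continuous predictions rather than the four counts alone.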