Abstract

This paper employs machine learning algorithms to detect tax evasion and analyze tax data. As commercial activity grows, traditional algorithms are no longer adequate for the tax evasion detection problem; algorithms with acceptable speed, precision, and analytical power over the data are required. In asset and tax assessment, integrating machine learning models with meta-heuristic algorithms increases accuracy by finding optimal parameters. In this paper, intelligent machine learning algorithms are used for tax evasion detection. This research uses an improved particle swarm optimization (IPSO) algorithm to improve the multilayer perceptron (MLP) neural network by finding optimal weights and to improve support vector machine (SVM) classifiers by finding optimal parameters. The resulting IPSO-MLP and IPSO-SVM models are presented as new models for tax evasion detection. Our proposed system applies a dataset of 1500 samples collected from the general administration of tax affairs of West Azerbaijan province of Iran. The evaluations show that the IPSO-MLP model achieves a higher accuracy rate than the IPSO-SVM model and logistic regression, as well as SVM, naive Bayes, k-nearest neighbor, the C5.0 decision tree, and AdaBoost. The accuracies of the IPSO-MLP and IPSO-SVM models are 93.68% and 92.24%, respectively.

1. Introduction

Machine learning is one of the ideal ways to reduce operational costs and the costs of business processes. It also accelerates work and provides better services to customers [1]. Studies have shown that machine learning can reduce costs by 20 to 25% in the banking industry and in information technology (IT), infrastructure, and maintenance operations; generate new revenue in industries and services; and increase customer acquisition and retention in various areas. By transforming human processes into intelligent, automated processes, companies can focus their resources on more valuable activities, such as providing better products and services to customers and detecting tax evasion [2]. As one of the most important sources of government revenue, tax plays a vital role in the economy of any country [3]. Through various tax policies, governments can use tax tools to adjust their economic policies and achieve their most important goals, such as social justice, proper distribution of income, elimination of the gap between different classes of society, price stabilization, reduction of unemployment, economic prosperity, and increased investment [4, 5].

Enforcing correct tax law is an excellent way to increase government revenue and modernize countries' tax systems, which can only be achieved through the accurate design and proper implementation of intelligent systems, particularly suitable training systems for tax organizations [6]. The growth of taxation is perhaps the most important economic development of the last decades of the twentieth century, and its importance is increasing rapidly. However, some people refuse to pay taxes and look for ways to evade them [7], which hurts the budget revenues of businesses and governments. Therefore, private and public businesses should focus on construction and manufacturing activities rather than on discovering ways to evade taxes [8].

Tax evasion is a global phenomenon whose disruptive effects touch society as a whole. It can be described as a deliberate act on tax returns to obtain illegal financial benefits and reduce tax liability. The Internal Revenue Code (IRC) defines tax fraud [9]: any person who willfully attempts to evade or defeat any tax imposed by the Internal Revenue Service (IRS) shall be found guilty and shall be subject to the penalties provided by law. Recent studies have estimated that governments worldwide lose about $500 billion annually to tax evasion.

One of the most critical consequences of tax evasion is economic and social injustice. Tax evasion distorts economic competition in favor of tax evaders. Another consequence is the intensification and spread of the phenomenon itself, as it disrupts the economic security required to expand economic activity and investment [10]. By anticipating tax evasion schemes and finding appropriate solutions, its spread can largely be prevented, but the most critical factor in preventing tax evasion is public awareness of the importance of paying taxes.

Unfortunately, auditing a tax return is a slow and costly process. Due to the lack of software and hardware platforms for receiving tax returns and electronic payments, electronically classified company data could not be obtained in previous years. For this reason, an intelligent software system that could detect tax evasion and define suitable criteria for it could not be developed [11]. With various software and hardware infrastructures now implemented in the country's tax affairs organization, various intelligent structures can be designed and developed alongside these systems. Therefore, intelligent prediction models based on machine learning methods [12] can be used to detect tax fraud and increase the precision and efficiency of auditing [13].

Tax agencies use two methods to investigate tax fraud: the auditors' experience and rule-based systems. A rule-based system, often expressed as a series of if-then rules, detects fraud cases [14]. These rules are developed through a complex process in which auditors identify a tax fraud case after investigation and generalize its characteristics into a set of rules based on tax fraud knowledge. However, these traditional methods have two significant drawbacks. First, they depend mainly on past experience, so they cannot detect new methods of fraud. Second, the subjective judgment of experts makes the knowledge base of rule-based systems expensive to build, maintain, and update. Therefore, a new solution for detecting tax evasion is to use machine learning techniques, which extract and generate knowledge from large amounts of data to detect fraudulent behavior [15].

With the development of machine learning and meta-heuristic algorithms, problems in various fields such as optimization [16, 17], prediction [18], detection [19], classification [20], and clustering [21] are solved more accurately. Meta-heuristic algorithms are widely used in optimization problems due to their high efficiency and diverse solutions [22]. In particular, the PSO algorithm [23] has shown high efficiency by updating the position and velocity of particles [24, 25]. This paper uses a combination of the MLP and improved PSO called IPSO-MLP, a combination of the SVM and improved PSO called IPSO-SVM, and the logistic regression algorithm to detect tax evasion. The IPSO-MLP model uses IPSO to adjust the network weights, and the IPSO-SVM model employs IPSO to adjust the SVM parameters, which play a significant role in classification precision. One of the significant challenges in multilayer artificial neural networks is the optimal selection of neural weights, which can be solved with meta-heuristic algorithms. Optimal selection of the classification parameters is likewise essential to increase SVM precision. Meta-heuristic algorithms such as PSO can solve such problems with reasonable speed and precision by exploring optimal solutions [26]. The models proposed in this paper have not been used in previous studies on tax evasion; therefore, they are presented as new models for tax evasion detection (TED). Using machine learning algorithms can significantly increase the accuracy and robustness of TED and allow detection systems to be designed without the need to specify linear relationships. Moreover, an improved algorithm has the advantage of extracting the optimal solution directly. The main objectives of this paper are as follows:
(1) Providing the IPSO-MLP model, based on improving the MLP weights, for tax evasion detection. The IPSO algorithm aims to improve the neurons' weights in the MLP network, carry out the data training steps correctly, and reduce the output error.
(2) Providing the IPSO-SVM model, based on improving the SVM parameters, for tax evasion detection. The SVM model depends strongly on the values of its initial parameters; if their correct values are determined, detection accuracy and the accurate separation of instances into different classes will improve.
(3) Using machine learning methods for tax evasion detection and comparing their results with the IPSO-MLP model.

The general structure of the paper is organized as follows: Section 2 reviews the previous studies, and Section 3 illustrates the IPSO algorithm and IPSO-based hybrid models. In Section 4, relevant simulations are performed. Finally, Section 5 provides conclusions and future research directions for this work.

2. Review of the Literature

This section reviews previous studies conducted on tax evasion detection. As mentioned earlier, machine learning algorithms play an essential role in tax evasion detection, and most studies have used a combination of machine learning algorithms.

For example, a study in the field presented an architecture for the problem of financial fraud detection by Chinese commercial companies, which included communication with the experts in the field, use of data mining algorithms, design instructions for data mining systems, and integration of knowledge of the experts in the field. The proposed architecture used the C5.0 decision tree. The dataset contained samples of 500 commercial companies during one year, and each sample had 100 characteristics. After classification, the training dataset was divided into two parts, including 460 positive samples and 40 negative samples. The implementation precision of the C5.0 decision tree was 85–90% [27].

Another study implemented eight models based on different combinations of the decision tree and logistic regression (LR) for value-added tax (VAT) in India from 2003 to 2004. The samples included 402 sales agents. The results indicated that all the models developed through data mining were better than the random selection method [28].

Moreover, researchers used association rules on Taiwanese data to design a VAT evasion detection model for 2003 to 2004. They evaluated two different datasets with 1934 and 1543 samples and employed eight different rules to detect fraudulent samples. The precision of the association rules was above 80%. According to the results, the designed model improved tax evasion detection and could therefore be used to effectively reduce or minimize losses due to VAT evasion [29].

In addition, scholars [30] used an intelligent system that combined an MLP-ANN, a support vector machine (SVM), and logistic regression (LR) with the harmony search algorithm (HSA) to detect tax evasion of companies, using data taken from the Iranian National Tax Administration (INTA). The learning rate is one of the essential factors in the MLP and ranges between 0 and 1. Moreover, the number of iterations was optimized to prevent network over-learning and increased network error: increasing the number of iterations reduces the error, but it should be done systematically so that network error decreases while training time remains limited. The HSA was used to find the parameters of the SVM and MLP classification models. The model was tested using a 10-fold cross-validation structure with datasets including 2451 and 2053 test samples from two-year tax returns, with 1118 and 906 samples coming from the food and textile sectors, respectively. Even if the data contained actual values, network training would result in high error rates if the data were not normalized. Data normalization was performed according to the following equation:

N = (UN − μ_UN) / σ_UN,

where UN is the financial variable before normalization, μ_UN is the mean of UN, σ_UN is its standard deviation (SD), and N is the normalized financial variable.
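The z-score normalization described above can be sketched as follows (a minimal NumPy illustration; function and variable names are my own, not from the cited study):

```python
import numpy as np

def z_score_normalize(un: np.ndarray) -> np.ndarray:
    """Standardize a financial variable: N = (UN - mean(UN)) / std(UN)."""
    mu = un.mean()
    sigma = un.std()
    return (un - mu) / sigma

# Example: normalize a small vector of declared profits
profits = np.array([100.0, 200.0, 300.0, 400.0])
normalized = z_score_normalize(profits)
print(normalized.mean())  # ~0: standardized data are centered on zero
```

After standardization the variable has zero mean and unit standard deviation, which keeps financial variables of very different magnitudes on a comparable scale during network training.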

The experimental results showed that the MLP model combined with the HSA detected evasion better than the other combinations: its precision for the food and textile datasets was 90.07% and 82.45%, respectively. Moreover, its sensitivity was 85.84% and 84.85%, and its specificity was 90.34% and 82.26%, for the food and textile datasets, respectively.

Furthermore, researchers proposed a model based on linear regression and SVM to detect high-risk taxpayers, collecting tax data from 2010 to 2015 from the INTA. The steps of the linear regression were as follows: formulating the regression formula, selecting the latest data, obtaining taxpayers' tax income, calculating taxpayers' average taxable income, and computing the goodness-of-fit of the regression model for taxpayers. Taxpayers with consistently high regression predictions across different years were considered high risk. An accuracy test of the output by tax experts indicated that high precision could be obtained by combining the SVM and linear regression models [31].

Due to recent developments and the large volume of data stored in tax systems, a tool is needed to process the stored data and detect fraudsters from the information obtained. In this regard, some scholars used a parallel Bayesian network to detect fraudsters [32]. A Bayesian network is a directed graph in which nodes represent variables and edges represent conditional dependencies between them. The dataset used in their study included 10,028 records. The results showed that the fraud rate among taxpayers with a complementary sheet was about 57.9% [32].

A colored network-based model (CNBM) was proposed to describe economic behaviors, social relationships, and taxpayer transactions and to establish an interaction network [33]. China's National Tax Information System (NTICS) handles a large volume of transactions and data; for example, there are more than 31,910,000 taxpayers and 48,000 tax offices across China. The first stage aimed to detect suspicious groups from a heterogeneous information network based on the CNBM in order to find suspicious business relationships; this stage is called mining suspicious groups (MSG). The second stage, identifying tax evasion (ITE), examines all transactions related to the suspicious business relationships to detect tax evasion within the set of suspicious groups using traditional methods. To evaluate the effectiveness of the CNBM model in the MSG phase, a simulated network of business relationships was implemented based on graph theory, along with experiments on actual data for all nodes. The experimental results indicated that the CNBM model could improve the efficiency of potential tax evasion detection in the MSG phase [33].

A deep learning network-based model for tax evasion detection was also proposed, in which features were extracted based on the conditional maximum mean discrepancy (CMMD) for the conditional probability distribution (CPD) [34]. In the deep learning network, different layers and distribution adapters were used to identify suspicious samples. According to the findings, the deep network model had better detection precision than a conventional artificial neural network.

Another study presented a regression model using basic commercial tax information and the rate of tax evasion by suspicious commercial sellers [35]. The sellers were categorized into different groups using Benford's law, and the classification was determined after applying the k-medoids clustering algorithm to the set of sellers. In the k-medoids algorithm, before the distance of the remaining data from each cluster center is calculated, k points are randomly selected from the n data points as cluster centers (medoids), with each center being a median of its cluster. Then, each point is assigned to the nearest cluster, and this iterative process of updating cluster centers continues until the best clustering is achieved. Auditors use Benford's law as a simple and effective tool for detecting fraud in audits. The law comprises a set of statistical principles describing the expected distribution of leading digits in a sample set of numbers. Equation (2) was used to assign suspicion scores to clusters [35], where m is the total number of edges (or transactions) in cluster c, W is the weight of the edges, and φ(c) is the mean absolute deviation from Benford's law for W.
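Benford's law gives the expected leading-digit distribution P(d) = log10(1 + 1/d). The sketch below computes that expectation and a mean-absolute-deviation score for a list of transaction amounts; it is a rough analogue of the φ(c) deviation score described above, not the paper's exact formula, and all helper names are my own:

```python
import math
from collections import Counter

def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digits d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x) -> int:
    """First significant digit of a positive amount (e.g. 0.045 -> 4)."""
    s = str(abs(x)).lstrip("0.")
    return int(s[0])

def benford_deviation(amounts) -> float:
    """Mean absolute deviation of observed leading-digit frequencies
    from Benford's expectation: large values suggest unnatural data."""
    counts = Counter(leading_digit(a) for a in amounts)
    n = len(amounts)
    expected = benford_expected()
    return sum(abs(counts.get(d, 0) / n - p) for d, p in expected.items()) / 9

print(round(benford_expected()[1], 3))  # 0.301: ~30% of leading digits are 1
```

Because the nine expected probabilities sum to exactly 1 (log10 of a telescoping product), a genuinely Benford-distributed sample yields a deviation near zero, while fabricated invoice amounts tend to score much higher.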

The dataset used in this study was provided by the Commercial Tax Office of Telangana state, India. The results of this study helped tax enforcement agencies in preventing tax evasion.

Another study used random forest, MLP-ANN, SVM, and logistic regression algorithms to evaluate risk and detect tax evasion [36]. To this end, an integrated social network of taxpayers was modeled: in an economic transaction (u, v), node u is the seller, and node v is the buyer. The taxpayer social network, built from data from the Tuscany region of Italy in 2014, included about 700,000 nodes and 1,800,000 edges. The random forest model had the best results in terms of accuracy (74.29), AUC-ROC (74.29), precision (75.42), and F1 (76.73), while the best value for the recall criterion belonged to the MLP model (75.63).

A graph-based network model called TED-TNR used a weighted adjacency matrix for tax evasion detection [37]. This model used three matrices A, S, and X: A is the matrix of taxpayers' traits; S is the similarity matrix of the taxpayers' features, computed with similarity measures such as cosine similarity; and X contains the final values for the taxpayers, obtained from A and S. In total, 9,422,952 transaction samples from the wholesale and retail industrial groups were evaluated. The transaction network was a directed weighted network with 323,587 nodes and 1,430,821 edges. Indicators such as company size, registered capital, and investment ratio were the main features of the trading network. The results demonstrated that the detection precision of the TED-TNR model was higher than that of conventional and ANN models [37].

In addition, researchers proposed a deep learning model called the transferable tax evasion detection method based on positive and unlabeled learning (TTED-PU) to identify suspected tax evasion samples [38]. They used a transfer learning method, based on a semisupervised approach with positive and unlabeled samples, to predict untested samples. In this model, gradient descent was applied to find the neuron weights via the rules of differentiation. Evaluation on 20,444 samples showed that the TTED-PU model had a lower error.

Moreover, a model based on the error back-propagation artificial neural network (BP-ANN) and the CHAID decision tree was suggested for tax evasion detection [39]. Tax samples were fed into the BP-ANN, and differences in the training data were detected by adjusting weights and biases. One of the critical goals in an ANN is to find appropriate weights for the different layers, i.e., to estimate the ANN parameters. The BP algorithm computes the weights in a forward and a backward pass, and these passes are iterated to achieve the best estimate of the network parameters; this constitutes the training process. In the CHAID tree, all values of the characteristics were evaluated against the target variable using the chi-squared statistical criterion, and statistically similar values were merged with respect to the target variable. Evaluation of 12,458 different samples revealed that the accuracy of the CHAID decision tree was higher than that of the BP-ANN [39].
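CHAID's category merging relies on Pearson's chi-squared statistic. The following pure-Python sketch (the table values are made up for illustration) computes that statistic for a contingency table of a predictor's categories against the evasion target:

```python
def chi_squared(table):
    """Pearson's chi-squared statistic for a contingency table, as used
    by CHAID to compare a predictor's categories against the target."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Evasion (yes/no) counts for two categories of a hypothetical predictor
table = [[30, 70],   # category A: 30 evaders, 70 compliant
         [60, 40]]   # category B: 60 evaders, 40 compliant
print(chi_squared(table))
```

A large statistic means the two categories differ significantly with respect to the target and should stay separate; a statistic near zero means CHAID may merge them.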

Furthermore, scholars proposed a model based on an MLP-ANN to support tax fraud detection in personal income tax returns (IRPF, in Spanish) [40]. In this network, the neurons of each layer are connected to the neurons of the previous layer, but these connections carry different weights. The MLP-ANN output was defined according to the following equation:

y_j = f( Σ_{i=1}^{N} w_{ij} · x_i + b_j ),

where x_i is the value of node i of the previous layer, b_j is the bias of node j in the current layer, w_{ij} is the connection weight between x_i and y_j, N is the number of nodes in the previous layer, and f is the activation function in the current layer. In the learning phase, 70% of the data were used for training and 30% for testing. The dataset included 2,000,000 samples, of which 1,350,974 were used for the training phase and the rest for the testing phase. The precision of the MLP-ANN was above 80%.

Researchers also analyzed the tax return data of a group of commercial sellers in Telangana (India) using graph clustering [41]. Graph clustering proceeds top-down, assigning each sample to the cluster whose members are closest to it; the Euclidean distance was used to identify similar samples. The results showed that clustering was effective on the tax samples and that suspicious samples could be detected by clustering [41]. Another study used a machine learning classification approach to detect fraudulent samples of government-linked companies in Malaysia [42]. The researchers applied LR, SVM, KNN, MLP, DT, and random forest models to detect and classify the samples. The 24-feature dataset included fraudulent companies from 2010 to 2016. The findings indicated that the detection precision of the random forest and DT models was higher than that of the other models [42].

Another study aimed to identify companies that issued fraudulent financial statements between 2002 and 2013 [43]. In the first stage, the classification and regression tree (CART) and chi-squared automatic interaction detector (CHAID) algorithms were used to select the main variables for fraud detection. The second stage combined CART, CHAID, a deep belief network, a support vector machine, and an artificial neural network to create models for detecting fraudulent financial statements. According to the results, the detection performance of the CHAID-CART model, with 87.97% precision, was better than that of the other models. Table 1 presents the advantages and disadvantages of the proposed models for tax evasion detection; each model has advantages and disadvantages that lead to success or inefficiency. The analysis of the literature suggests that artificial neural networks achieve better detection with smaller error thanks to their pattern recognition capability, the optimal relationship they learn between input and output data, lower sensitivity to errors in the input data and in neuron training, parallel processing, the need for less input data, and a faster and easier verification process for detecting and predicting the relationships between tax evasion factors.

In ANNs, trial and error is mainly used to determine the optimal number of hidden layers; a structure with the fewest hidden layers that still achieves an acceptable error should therefore be selected. A network with fewer hidden layers takes less time to train.

Moreover, the number of neurons in the hidden layers has a significant effect on ANN performance. Too few neurons prevent the ANN from learning most samples accurately. On the other hand, too many neurons cause the network to memorize patterns, preventing it from learning to detect their basic features. According to this analysis, issues such as the number of hidden layers and the number of neurons must be considered in an ANN.

Furthermore, achievable precision and ease of implementation are decisive factors in choosing the appropriate model for tax evasion detection (TED). As the extensive previous literature shows, machine learning models such as SVM and MLP are recommended because of their increased precision compared with earlier models such as decision trees (DTs). For this reason, and because of other advantages including flexibility, efficiency, and precision of instance detection, this study uses machine learning models for TED.

3. Proposed Models

This section explains the IPSO-MLP, IPSO-SVM, and LR models. The IPSO-MLP model uses IPSO to find the weight of the MLP network, and the IPSO-SVM model employs IPSO to find the SVM parameters. Figure 1 depicts a block diagram of hybrid models.

Data extraction and cleaning: first, the data were extracted from the tax administration in an Excel file. Then, unsuitable and scattered data were identified and deleted.

Calculation of financial variables: the dataset includes dependent and independent variables. The dependent variable is a binary 0–1 variable, where 1 indicates the presence of tax evasion and 0 its absence. The independent variables, which are the most important, are classified according to personal taxes. The class variable is measured according to the following equation [44]:

C_t = ((ACCIN_t − TAXIN_t) / TAXIN_t) × 100,

where C is the percentage difference in year t between the declared profit subject to tax (ACCIN) at the end of the fiscal year and the definitively assessed taxable profit (TAXIN) at the end of the fiscal year. If there is more than a 15% difference between the definitively assessed taxable profit and the declared profit of the business unit, the business unit is considered to have evaded tax.
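The labeling rule above can be sketched as follows (a minimal illustration; the function name and the use of the relative difference are my assumptions based on the text):

```python
def evasion_label(accin: float, taxin: float, threshold: float = 0.15) -> int:
    """Binary class label: 1 if the relative difference between the
    declared profit (ACCIN) and the definitively assessed profit (TAXIN)
    exceeds the 15% threshold, else 0."""
    c = abs(accin - taxin) / abs(taxin)  # relative difference
    return 1 if c > threshold else 0

print(evasion_label(80.0, 100.0))   # 20% gap -> flagged as evasion (1)
print(evasion_label(95.0, 100.0))   # 5% gap  -> not flagged (0)
```

This produces the 0–1 dependent variable used by all the classifiers in this paper.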

Sampling: data samples are collected from the tax database, and records that are most likely to be involved in tax evasion are selected.

Normalization: data normalization is performed for all proposed models. The samples are first read from the dataset file, and then the preprocessing operation is performed, in which standardization normalizes the data to a specific range. In general, data in different ranges of variation cannot positively affect each other or the model, so the data should be placed in an equal range (e.g., 0 to 1). The normalization operation on the data is defined by the following equation:

x_n = (x − x_mean) / (x_max − x_min),

where x_n is the normalized value, x is the actual value, x_mean is the average of the actual values, x_max is the maximum of the actual values, and x_min is the minimum of the actual values.
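The normalization step can be sketched as follows (a minimal NumPy illustration of the mean/range formula described above; the reconstructed formula and names are assumptions based on the variables the text defines):

```python
import numpy as np

def mean_normalize(x: np.ndarray) -> np.ndarray:
    """Normalization as described above: subtract the mean of the actual
    values and divide by their range (max - min)."""
    return (x - x.mean()) / (x.max() - x.min())

data = np.array([10.0, 20.0, 30.0, 40.0])
print(mean_normalize(data))  # values centered on 0, spread at most 1
```

After this step every feature lies in a comparable, bounded range, so no single feature dominates the distance or weight computations of the classifiers.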

The 10-fold cross-validation method is used to perform the training and test process. Therefore, each dataset is divided into ten parts, and nine parts are used in each implementation as a training group and one part as a test group.
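The 10-fold procedure above can be sketched as follows (a minimal NumPy illustration; function and variable names are my own, not from the paper):

```python
import numpy as np

def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation:
    the data are split into k parts; each part serves once as the test
    set while the remaining k - 1 parts form the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Example with 1500 samples, matching the dataset size used in this paper
splits = list(k_fold_indices(1500, k=10))
print(len(splits))        # 10 folds
print(len(splits[0][1]))  # 150 test samples per fold
```

Each of the 1500 samples thus appears in the test set exactly once, and the reported accuracy is the average over the ten runs.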

Optimization algorithm: the main goal in this step is to maximize the precision of the classification. The use of the IPSO algorithm in the MLP network training and optimization of SVM parameters accelerates the operation and increases the precision of the results.

3.1. IPSO Algorithm

The particle swarm optimization (PSO) algorithm is a population-based search algorithm inspired by the social behavior of birds searching for food [23]. Several particles search for the optimum of an optimization problem in a search space. Each particle calculates the goodness-of-fit function at its current position. It then selects a direction of movement by combining information about its current position, the best position it has ever occupied, and the best particle in the group. All particles choose their movement direction, and one step of the algorithm is completed after each movement.

In the PSO algorithm, the position of particle i is defined as x_i = (x_i1, x_i2, …, x_id), and its velocity as v_i = (v_i1, v_i2, …, v_id). The goodness-of-fit function of each particle in the population is evaluated and compared with the particle's previous best result and with the best particle in the whole population. The particles move toward the optimal regions under the influence of their own experience and knowledge (Pbest) and the knowledge of neighboring particles to achieve the best solutions. After these two optimal values are found, each particle updates its velocity and position according to (6) and (7):

v_i(t + 1) = ω · v_i(t) + c1 · r1 · (Pbest_i − x_i(t)) + c2 · r2 · (Gbest − x_i(t)),  (6)
x_i(t + 1) = x_i(t) + v_i(t + 1),  (7)

where N represents the population size, Pbest_i is the best solution found by particle i, and Gbest is the best solution found by the whole group. The parameters c1 and c2 are learning coefficients whose values lie in the range 0–2. The terms r1 and r2 are random numbers drawn uniformly from the range 0–1. The velocity v_i is restricted to the range [−Vmax, Vmax], where Vmax is the maximum speed allowed for the particles. The inertia coefficient ω controls the balance of the algorithm between exploration and exploitation. The population (position) matrix is defined according to the following equation:

X = [x_1, x_2, …, x_N]^T.  (8)
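A minimal sketch of one PSO update, following equations (6) and (7); the parameter defaults are illustrative assumptions, not values from the paper:

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, v_max=1.0, rng=None):
    """One velocity/position update of standard PSO:
    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v."""
    if rng is None:
        rng = np.random.default_rng(0)
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    v_new = np.clip(v_new, -v_max, v_max)  # bound velocity to [-Vmax, Vmax]
    return x + v_new, v_new

# A swarm of 3 particles in 2 dimensions, all pulled toward (1, 1)
x = np.zeros((3, 2)); v = np.zeros((3, 2))
pbest = np.ones((3, 2)); gbest = np.ones(2)
x_new, v_new = pso_step(x, v, pbest, gbest)
print(x_new.shape)  # (3, 2): every particle has moved toward gbest
```

Repeating this step while re-evaluating the fitness and updating Pbest and Gbest constitutes the whole optimization loop.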

An essential aspect of applying the PSO algorithm here is the conversion from continuous to discrete mode. In the discrete state, particle positions are limited to 0 and 1. The velocity v_id, whose value is mapped to the range 0–1, determines the probability that the position takes the value x_id = 1. In the IPSO model, the particle velocity is mapped to a value between zero and one using the bounded sigmoid function according to (9):

S(v_id) = 1 / (1 + e^(−v_id)).  (9)

Finally, the position of particle i in dimension d is updated according to the following equation:

x_id = 1 if rand() < S(v_id), and x_id = 0 otherwise,  (10)

where rand() is a uniform random number in the range 0–1.
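The sigmoid mapping and the discrete position update can be sketched as follows (illustrative helper names; the randomness is seeded for reproducibility):

```python
import math
import random

def sigmoid(v: float) -> float:
    """Map a velocity to (0, 1): S(v) = 1 / (1 + e^-v)."""
    return 1.0 / (1.0 + math.exp(-v))

def binary_position(v: float, rng=None) -> int:
    """Discrete position update: x = 1 with probability S(v), else 0."""
    if rng is None:
        rng = random.Random(0)
    return 1 if rng.random() < sigmoid(v) else 0

print(sigmoid(0.0))  # 0.5: zero velocity gives a 50/50 chance of x = 1
```

Large positive velocities push S(v) toward 1 (the bit is almost certainly set), while large negative velocities push it toward 0, so the swarm can still express strong preferences in the binary search space.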

According to (11) and (12), the learning coefficients are improved in this paper to encourage the movement of particles across the whole search space and strengthen the convergence rate:

c1(t) = (c1f − c1i) · (t / t_max) + c1i,  (11)
c2(t) = (c2f − c2i) · (t / t_max) + c2i,  (12)

where c1i, c1f, c2i, and c2f are initial constants, t is the current iteration, and t_max is the maximum number of iterations. The values of c1(t) and c2(t) thus vary between the initial and final values of the learning coefficients c1 and c2, respectively. Many algorithms rely on fixed values to generate and search for new solutions, and these values play a crucial role in producing optimal solutions. If the value is constant as the search moves through the problem space, the search may reach a final solution within a reasonable time but ignore good solutions in the vicinity of local points. In IPSO, the balance between global and local particle search depends mainly on the learning coefficients. If the learning coefficients are large, the particles are updated over a large area, which promotes the global exploration of the algorithm. In contrast, if the learning coefficients are small, local search dominates the optimization process. The learning coefficients are therefore updated during optimization according to the iteration number, preventing the early convergence of particles and accelerating exploration.
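A sketch of the time-varying learning coefficients of (11) and (12); the boundary values c1i = 2.5, c1f = 0.5, c2i = 0.5, and c2f = 2.5 are common choices in the PSO literature, assumed here for illustration and not taken from the paper:

```python
def learning_coefficients(t: int, t_max: int,
                          c1_i: float = 2.5, c1_f: float = 0.5,
                          c2_i: float = 0.5, c2_f: float = 2.5):
    """Linearly time-varying acceleration coefficients: c1 shrinks from
    c1_i to c1_f (strong global search early on) while c2 grows from
    c2_i to c2_f (stronger pull toward Gbest near the end)."""
    frac = t / t_max
    c1 = (c1_f - c1_i) * frac + c1_i
    c2 = (c2_f - c2_i) * frac + c2_i
    return c1, c2

print(learning_coefficients(0, 100))    # (2.5, 0.5) at the first iteration
print(learning_coefficients(100, 100))  # (0.5, 2.5) at the last iteration
```

Early iterations favor the particle's own experience (exploration), and late iterations favor the swarm's best solution (exploitation), which is exactly the balance the paragraph above describes.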

3.2. IPSO-MLP Model

A multilayer artificial neural network mainly consists of three layers (input, hidden, and output) [45]. The first layer receives the n input features x = (x_1, x_2, …, x_n), which are processed by the subsequent layers. The input layer only receives samples from the dataset and acts as the independent variables; therefore, the number of input layer neurons is determined by the number of independent variables. The hidden layers perform intermediate calculations and enable the output layer to predict the optimal response. The output layer acts as the dependent variable, and its number of neurons depends on the number of dependent variables. Each layer consists of nodes connected to all nodes of the next layer, except the input layer, whose nodes hold the input features.

This section explains the steps for combining the MLP artificial neural network with the IPSO algorithm, presented as flowcharts and algorithms. As mentioned earlier, we use the IPSO algorithm to increase the precision, accuracy, and training speed of the MLP network for tax evasion detection. The purpose of training an MLP network is to find the weights that minimize the training error; MLP training can therefore be considered an optimization problem over the weight coefficients of the neurons. The IPSO algorithm begins with the random generation of the initial particle population, which simply means randomly determining the initial locations of the particles with a uniform distribution over the search space. The position of a particle in the IPSO algorithm is represented by x, which contains n elements, x = (x_1, x_2, …, x_n). The next step is to select the number of initial particles; empirically, an initial population of 30 to 50 particles is an ideal choice and works well for almost all engineering problems. Then, the objective function is evaluated: each particle, representing a candidate solution to the problem under study, is assessed at this stage, and its fitness value is calculated with the aim of minimizing the error. In the next step, the best position of each particle is determined, followed by the best position among all particles. Finally, the velocity and position vectors of all particles are updated, and the particles move to their new positions. Figure 2 shows the IPSO-MLP model flowchart.

According to Figure 2, after entering the data into the IPSO-MLP model, the data are prepared for training using cleaning and normalization. The solution length is designed for the MLP network based on the number of weights and the number of biases according to (13), where n is the number of input nodes, w_ij is the weight from node i to node j, and β_j is the bias of node j. The IPSO-MLP model uses 80% of the data for training and 20% for testing.
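A sketch of the solution-vector length computation, assuming one common encoding that concatenates all of the MLP's weights and biases into a single particle (the exact form of (13) may differ from this assumption):

```python
def solution_length(n_inputs, n_hidden, n_outputs):
    """Length of a particle encoding one MLP: every input-to-hidden and
    hidden-to-output weight, plus one bias per hidden and output node.
    (Assumed encoding, not necessarily the paper's exact equation (13).)"""
    weights = n_inputs * n_hidden + n_hidden * n_outputs
    biases = n_hidden + n_outputs
    return weights + biases
```

For the paper's setting of nine input features, ten hidden neurons, and one output, this encoding would give a particle of length 111.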

In general, traditional methods such as the back-propagation algorithm and other gradient methods are used to train artificial neural networks. When the objective function is nonlinear and complex, these methods become weak and inefficient in detection precision. In the back-propagation algorithm, the newly calculated output value is compared with the actual value at each step, and the weights and biases of the network are corrected according to the obtained error, so that at the end of each iteration the resulting error is smaller than in the previous iteration. This minimization follows the gradient vector of the network's squared-error function, which is obtained by applying the chain rule to differentiate the error function with respect to each network parameter. Although the back-propagation algorithm is widely used to train artificial neural networks, it leads to problems in some cases, namely slow convergence in the training process and early convergence to local minima. Figure 3 depicts the pseudocode of the IPSO-MLP model.

There are several algorithms for training the multilayer artificial neural network; this paper uses the improved PSO algorithm. In an artificial neural network, the initial values of the weights are of particular importance, and all weights are selected randomly before training begins. MLP training aims to achieve the highest classification, approximation, or prediction precision for the training and test samples. Assuming that the number of input nodes is N, the number of hidden nodes is H, and the number of output nodes is O, the output of hidden node j is defined according to (14). The sigmoid activation function, which maps the weighted sum of a neuron's inputs to the range 0 to 1, is used in the hidden layer:

f(S_j) = 1 / (1 + e^(−S_j)),  S_j = Σ_{i=1}^{n} w_{ij} x_i − β_j,  (14)

where n is the number of input nodes, w_{ij} is the weight from node i in the input layer to node j in the hidden layer, β_j is the bias (threshold) of hidden node j, and x_i is input i. After calculating the outputs of the hidden nodes, the final output is defined according to the following equation:

o_k = Σ_{j=1}^{H} w_{jk} f(S_j) − β_k,  (15)

where w_{jk} is the weight from hidden node j to output node k, and β_k is the bias (threshold) of output node k. It should be noted that the MSE in (16) is used to determine the optimal values of the weights and biases, reducing the error in the training and optimization process:

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²,  (16)

where y_i is the actual output of input sample i, ŷ_i is the predicted output of input sample i, and n is the number of samples.
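Equations (14) through (16) can be checked with a direct Python transcription (a sketch only; the paper's implementation is in MATLAB):

```python
import math

def sigmoid(x):
    """Hidden-layer activation of (14): maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(x, w_ih, b_h, w_ho, b_o):
    """Forward pass matching (14) and (15).
    w_ih[i][j]: weight from input i to hidden node j; b_h[j]: hidden bias.
    w_ho[j][k]: weight from hidden node j to output k; b_o[k]: output bias."""
    hidden = [sigmoid(sum(w_ih[i][j] * x[i] for i in range(len(x))) - b_h[j])
              for j in range(len(b_h))]
    return [sum(w_ho[j][k] * hidden[j] for j in range(len(hidden))) - b_o[k]
            for k in range(len(b_o))]

def mse(actual, predicted):
    """Equation (16): mean squared error over n samples."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
```

In the hybrid model, `mse` over the training samples is exactly the fitness that the IPSO particles minimize.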

The network output is calculated at each step, and the weights are corrected according to the difference from the desired output to minimize the error value. The MSE objective minimizes the discrepancy between the results of the hybrid model and the actual data.

3.3. IPSO-SVM Model

The support vector machine (SVM) is a nonstatistical binary classifier based on supervised classification for data analysis [46]. The goal of the support vector machine is to maximize the margin of the hyperplane, which maximizes the separation between samples. The training points nearest the separating hyperplane are called support vectors, and they are used to identify the margin between classes. This algorithm uses an optimal linear decision margin to separate the classes. Let the training points be defined as (x_i, y_i), i = 1, …, N, where the input vector is x_i ∈ R^n and the class value is y_i ∈ {−1, +1}, so that the data may be nonlinearly separable. The decision rule defined by the optimal hyperplane separating the samples of the two decision classes is given by the following equation:

Y = sign(Σ_{i=1}^{N} y_i α_i K(x, x_i) + b),  (17)

where Y is the output of the equation, y_i is the class value of training sample i, and α_i and b are the parameters that determine the hyperplane.

The function K(x, x_i) is a kernel function that generates inner products to produce machines with different nonlinear decision surfaces in the data space. Therefore, the concept of the classifier margin is used to select the best separating hyperplane in the SVM. If the norm of the weight vector is expressed as ||w||, then the margin d, defined as the distance between the two classes, is given by the following equation:

d = 2 / ||w||.  (18)

The SVM algorithm separates and identifies two classes by a separating hyperplane defined on the training data. In the SVM algorithm, the decision margin must be able to classify all the samples correctly. Such a decision margin is defined by solving the constrained optimization problem in the following equation:

min_{w, b, ξ} (1/2)||w||² + C Σ_{i=1}^{N} ξ_i, subject to y_i(w · x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0,  (19)

where w is the weight vector, i.e., the normal vector of the optimal hyperplane, and b is the bias term representing the distance of the hyperplane from the origin. C is the margin adjustment parameter, always greater than zero, which balances maximizing the margin against minimizing the classification error. The slack variables ξ_i > 0 allow for interference (overlap) between the training data. According to (20), the radial basis kernel function maps the data to a higher-dimensional space:

K(x_i, x_j) = exp(−||x_i − x_j||² / (2σ²)),  (20)

where ||x_i − x_j|| is the Euclidean distance between two feature vectors, and the user-defined σ is the kernel width.
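The RBF kernel of (20) and the kernel-expansion decision rule of (17) can be sketched directly; the support vectors, coefficients α_i, and bias b are assumed here to come from an already trained SVM:

```python
import math

def rbf_kernel(x, z, sigma):
    """Radial basis kernel of (20): K = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svm_decision(x, support_vectors, labels, alphas, b, sigma):
    """Decision rule in the spirit of (17): sign of the kernel expansion
    over the support vectors, shifted by the bias b."""
    s = sum(a * y * rbf_kernel(x, sv, sigma)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if s + b >= 0 else -1
```

Note that K(x, x) = 1 for any σ, and the kernel decays toward zero as the Euclidean distance grows, so σ controls how local the decision surface is.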

The input parameters of the SVM are adjusted using the improved PSO algorithm in this paper. Proper selection of the parameters C and σ in the support vector machine algorithm is highly important because they increase the precision of detection and prediction of the SVM. In particular, parameter optimization is an essential step in SVM classification. Figure 4 shows the IPSO-SVM model flowchart.

In the solution vectors, the values x_C and x_σ are searched in the ranges [C_1, C_2] and [σ_1, σ_2]. The position of each particle in the problem space changes according to its personal experience and the experience of its best neighbor. The parameters C and σ of the SVM classifier are defined by mapping x_C and x_σ according to (21).

The population at iteration t is defined as X^t = {X_1^t, X_2^t, …, X_{NP}^t}, and each particle is defined as X_i^t = (x_{i1}^t, …, x_{iD}^t), where NP is the population size and D is the number of dimensions of each particle. In the hybrid model, each particle is defined as (x_C, x_σ). The goodness-of-fit function is evaluated based on the accuracy criterion, and the best accuracy value after the maximum number of iterations is reported in the output.
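The decoding of a particle into the SVM parameters can be sketched as follows, assuming a linear mapping from particle coordinates in [0, 1] to the parameter ranges (the exact form of (21) may differ from this assumption):

```python
def decode_particle(x_c, x_sigma, c_range, sigma_range):
    """Map particle coordinates in [0, 1] onto the SVM parameters
    C in [C1, C2] and sigma in [sigma1, sigma2].
    The linear mapping is an assumed form, not necessarily the paper's (21)."""
    c1, c2 = c_range
    s1, s2 = sigma_range
    C = c1 + x_c * (c2 - c1)
    sigma = s1 + x_sigma * (s2 - s1)
    return C, sigma
```

Each fitness evaluation would then decode the particle, train an SVM with the resulting (C, σ), and return the classification accuracy to be maximized.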

3.4. Logistic Regression Algorithm

Logistic regression [47] is a particular form of linear regression in which the response variable is discrete. As in linear regression, there are one or more independent variables, based on which the probability of each level of the two-state dependent variable can be calculated. The logistic regression model for p independent variables is defined according to the following equation:

logit(Y) = ln(Y / (1 − Y)) = β_0 + β_1 X_1 + β_2 X_2 + … + β_p X_p,  (22)

where Y is the probability that the dependent variable equals one, β_i is the coefficient of variable X_i estimated by logistic regression, and X_i is independent variable i in the model. Using the estimated coefficients, the probability of presence for the response variable is defined according to (23):

P(Y = 1) = 1 / (1 + e^(−(β_0 + Σ_{i=1}^{p} β_i X_i))),  (23)

where P(Y = 1) is the probability of the response variable. The margin between the presence and absence of the response variable is 0.5, which classifies the response into the zero or one class; a response value closer to one indicates a higher probability of presence (the positive class).
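Equations (22) and (23), together with the 0.5 decision margin, can be transcribed directly (a sketch; the coefficient values used in the test are illustrative, not fitted to the paper's data):

```python
import math

def logistic_probability(betas, x):
    """P(Y = 1) per (23): betas[0] is the intercept, betas[1:] pair with x."""
    z = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def classify(betas, x, threshold=0.5):
    """Assign class one when the probability exceeds the 0.5 margin."""
    return 1 if logistic_probability(betas, x) >= threshold else 0
```

With a zero intercept and a zero feature, the linear predictor is zero and the probability is exactly 0.5, i.e., the decision boundary itself.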

The advantage of logistic regression over regressions that estimate model coefficients by least squares is that a linear relationship between the independent and dependent variables is not required. Moreover, it does not require the variables to be normally distributed or to have equal variances, and it generally relies on fewer assumptions.

3.5. Computational Complexity

This section explains the computational complexity of IPSO-MLP. The complexity of IPSO mainly depends on the population size, the maximum number of iterations, the number of variables, and the number of iteration loops. The time complexity of the MLP is O(n × m), where n is the number of neurons and m is the number of layers. In addition, the computational complexity of PSO is O(I × P × D), where I is the maximum number of iterations, P is the population size, and D is the particle dimension. The computational complexity of the IPSO learning-factor stage is O(N). Therefore, the overall complexity of IPSO-MLP is O(I × (n × m + N + P × D)). In general, the complexity of the SVM algorithm is O(n²), where n is the number of training instances; hence, the overall complexity of IPSO-SVM is O(I × (n² + N + P × D)).

3.6. Evaluation Criteria

Precision, recall, F1, and accuracy criteria are widely used for classification. The proposed models evaluate a taxpayer as positive (tax evasion) or negative (no tax evasion). Therefore, four situations may occur for a taxpayer:
(a) The prediction is tax evasion, and the taxpayer is a tax evader according to the empirical classification, called a true positive
(b) The prediction is tax evasion, but the taxpayer is not a tax evader according to the empirical classification, called a false positive
(c) The prediction is no tax evasion, and the taxpayer is not a tax evader according to the empirical classification, called a true negative
(d) The prediction is no tax evasion, but the taxpayer is a tax evader according to the empirical classification, called a false negative
Accordingly, TP is the number of correctly detected positive cases (tax evasion), FP is the number of incorrectly detected positive cases, FN is the number of incorrectly detected negative cases, and TN is the number of correctly detected negative cases (no tax evasion).
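The four counts defined above yield the evaluation criteria via the standard formulas for precision, recall, F1, and accuracy:

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard classification criteria from the four outcome counts."""
    precision = tp / (tp + fp)              # correctness of positive calls
    recall = tp / (tp + fn)                 # coverage of actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + tn + fn)          # overall correctness
    return precision, recall, f1, accuracy
```

For example, with 50 true positives, 10 false positives, 30 true negatives, and 10 false negatives, precision and recall are both 50/60 and accuracy is 80/100.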

4. Evaluation and Analysis

This section evaluates the three proposed methods (IPSO-MLP, IPSO-SVM, and LR) based on machine learning for tax evasion detection. As mentioned earlier, this paper uses the 2019 dataset of the General Administration of Tax Affairs of West Azerbaijan Province, consisting of 1500 samples from different groups with nine features (gross taxable income, expressed net tax income, related tax, tax exemptions, tax discount, tax payable, payments made, taxable balance, and the class feature). The models are implemented in MATLAB 2017b. One of the most critical parts of determining the optimal structure of a multilayer artificial neural network is determining the number of hidden layers and the number of neurons in each hidden layer to achieve the minimum error. Table 2 presents the initial parameter values used to run the models.

4.1. Applied Study

This section evaluates the models on benchmark classification datasets. The models' performance has been tested using seven reference datasets taken from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets.php); Table 3 reports their specifications. These datasets were chosen because they are widely used to demonstrate the experimental performance of algorithms. This paper uses the classification datasets to show the efficiency of the IPSO-MLP, IPSO-SVM, and LR models in terms of percentage accuracy. According to the results in Table 3, the accuracy of the IPSO-MLP model on Heart Cleveland is 87.15%, whereas the accuracy of the IPSO-SVM and LR models is lower than that of the IPSO-MLP model.

The accuracy of the IPSO-MLP model on the cancer dataset is 98.56%. MLP performance depends on the choice of various parameters, such as the initial weights and the number of hidden nodes. Optimal adjustment of the parameters of an artificial neural network, including the selection of appropriate initial weights, solves the slow and early convergence problems of the training process. From this study, it can be concluded that selecting optimal weights and an optimal number of hidden nodes improves the MLP's classification accuracy.

4.2. Evaluation of Models

Table 4 and Figure 5 show the results of the models based on different criteria. To evaluate the detection precision of the IPSO-MLP and IPSO-SVM models, they are compared with the SVM, KNN, C5.0, NB, MLP, and AdaBoost models. According to the results, the accuracy of the IPSO-MLP model is 93.68%, which is higher than that of the other models. Moreover, the SVM and MLP models have higher detection precision than the KNN, C5.0, NB, and LR models. The precision and recall of the IPSO-MLP model are 93.25% and 93.78%, respectively; those of the IPSO-SVM model are 92.64% and 92.75%; and those of the LR model are 91.80% and 82.34%. The precision, recall, and accuracy of the MLP model are 89.82%, 90.45%, and 91.33%, respectively.

According to Figure 5, the hybrid models have a higher percentage of detection precision. The IPSO-MLP and IPSO-SVM models exhibit higher efficiency and precision thanks to IPSO. The strength and efficiency of the MLP model lie in its internal structure: if the internal structure of the MLP is appropriately trained, the MLP output will have high precision.

Table 5 reports the results of the IPSO-MLP model for different numbers of layers. According to the results, the IPSO-MLP model with three layers has the highest accuracy. Different numbers of hidden layers were used in the MLP, and the optimal number was determined to minimize the error: the process starts with a small number of layers, and layers are added until adding more no longer improves the error. With 5 and 7 layers, the accuracy is 91.47% and 91.19%, respectively. In contrast, the 3-layer IPSO-MLP model has a lower MSE and higher detection precision. The reported MSE is the mean value for the best combination of connection weights and bias values.

Figure 6 shows the run diagram of the IPSO-MLP model. In the figure, the horizontal axis represents the epochs, and the vertical axis represents the MSE. The run of IPSO-MLP is shown for the training, validation, and testing stages, and the MSE of all three gradually decreases. The error in Figure 6(b) is lower than in Figure 6(a). From these results, it can be concluded that the greater the number of epochs, the smaller the error.

Figure 7 depicts a comparison graph of the IPSO-MLP and IPSO-SVM models based on different runs. As shown in the figure, it is clear that the IPSO-MLP model has a higher percentage of accuracy in all runs.

Figure 8 shows a comparison of the IPSO-MLP and IPSO-SVM models for different numbers of iterations of the IPSO algorithm. As shown in the figure, the IPSO-MLP model has higher accuracy in all iterations. The accuracy of the IPSO-MLP and IPSO-SVM models with 100 iterations is 90.47% and 89.75%, respectively; with 300 iterations, it is 92.35% and 91.48%. The reinforcement-based learning rate of the IPSO algorithm prevents the local optimization and early convergence of the standard PSO algorithm. As the number of iterations grows, the global search capability of IPSO increases, which improves the convergence speed.

This paper examines different machine learning models and confirms that IPSO-MLP is a good model for tax evasion detection. It should be noted that the IPSO-SVM model performs better than models such as SVM, KNN, NB, and C5.0. In general, it can be concluded that the combination of machine learning algorithms increases detection precision.

Table 6 compares IPSO with the genetic algorithm (GA), artificial bee colony (ABC) [48], firefly algorithm (FA) [49], and imperialist competitive algorithm (ICA) [50]. The parameters of the algorithms were set as follows: the maximum number of iterations is 500, and the population size is 50. Each algorithm was run ten times independently, and Table 6 presents the average of the results obtained by each algorithm. According to the table, the IPSO algorithm has a higher accuracy percentage than GA, ABC, FA, and ICA in all cases except FA-SVM. The accuracy of IPSO-MLP, GA-MLP, ABC-MLP, FA-MLP, and ICA-MLP is 93.68%, 93.11%, 93.26%, 91.94%, and 93.27%, respectively. In addition, the accuracy of IPSO-SVM, GA-SVM, ABC-SVM, FA-SVM, and ICA-SVM is 92.24%, 91.43%, 91.74%, 92.38%, and 91.53%, respectively. IPSO was used to extract the optimal MLP and SVM parameters by adapting the learning coefficients. The MLP is a predictive model that establishes a mapping between input and output instances.

According to the analyses, the IPSO-MLP model achieved the highest classification performance among the compared tax evasion detection models. The LR model classified the data least efficiently, with a minimum accuracy of 67%. Among the SVM hybrids, the highest accuracy belonged to the FA-SVM model. The ABC-MLP and ICA-MLP models showed accuracy values close to that of the IPSO-MLP model; however, the IPSO-MLP model achieved better detection due to its improved learning factors.

4.3. Statistical Analyses Such as ANOVA

Analysis of variance (ANOVA) is not very efficient or accurate for the problem studied in this paper. Although the LR method was used and ANOVA was implemented to some extent, the detection precision of ANOVA cannot compete with that of the machine learning algorithms.

5. Conclusion and Further Research

Tax evasion is a major problem of the tax system in most countries of the world. Due to its importance, it is essential for the tax administration to use methods that can identify cases of tax evasion. Since machine learning algorithms provide prediction and classification, they can facilitate decision-making in financial matters. Moreover, neural networks provide low-cost algorithmic solutions and facilitate analysis because they do not require various statistical assumptions. This paper investigates the efficiency and ability of machine learning methods in the field of tax evasion detection. The system is implemented on the tax administration dataset using the 10-fold cross-validation method and an iterative training, testing, and validation procedure. The results on 1500 tax samples indicate that tax evasion can be detected using machine learning methods. The accuracy of the IPSO-MLP model is over 93%, and its error value is 0.033995. Evaluation of the active hidden-layer neurons and the training of the artificial neural network model demonstrate that 30 iteration cycles with ten hidden-layer neurons constitute an optimal artificial neural network for tax evasion detection.

Furthermore, the IPSO-SVM, SVM, and MLP models also perform well. Future research should investigate the effect of population initialization in the PSO algorithm on the convergence rate and the quality of the final solution. Moreover, opposition-based learning can be used to increase diversity in the initial population. The whale and gray wolf optimization algorithms may also be used in the exploration phase of the PSO algorithm, and each of them can be tested separately.

Data Availability

Our proposed method applies the dataset collected from the general administration of tax affairs of the West Azerbaijan province of Iran.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This manuscript is part of the research in the Ph.D. thesis of Houri Mojahedi, conducted in cooperation with Dr. Amin Babazadeh Sangar and Dr. Mohammad Masdari. This research is self-funded.