Abstract

To improve the market's ability to avoid and prevent credit risk and to strengthen awareness of market risk early warning, the Synthetic Minority Oversampling Technique (SMOTE) is used to process the unbalanced samples, and the fruit fly optimization algorithm (FOA) is used to optimize the parameters of a support vector machine (SVM). On this basis, an improved SVM market risk early warning model is proposed. The simulation results show that the proposed model has excellent stability and generalization ability and can predict market credit risk accurately. Compared with the prediction models based on FOA-SMOTE-BP and FOA-SMOTE-Logit, the proposed model performs better on the G value, F value, and AUC value, which provides a reference for market credit risk prediction.

1. Introduction

Market risk early warning is an important measure for preventing market risk and unforeseen losses and for improving market normalization. In recent years, with the development of intelligent technology, deep learning has been widely used in various fields, including market risk warning, and researchers have applied it to market risk early warning. Building on an in-depth study of rough set theory (RST), Guan et al. proposed a financial operation risk early warning model based on a BP neural network, which effectively predicts the financial operation risk and profit risk of family farms [1]. In the cross-border environment, scholars have put forward the marginal expected shortfall, delta conditional value at risk, and conditional capital shortfall to measure systemic risk. The eigen-pair method based on bilateral balance sheet data differs from the paradoxical risk measurement methods based on market price. Thus, a systemic risk early warning method was proposed that applies the network spectral eigen-pair method to the core global banking system. The method provides risk early warning for unstable financial markets based on tipping points similar to the R number in epidemic models [2]. Figini et al. improved the sample performance of parametric and nonparametric models in credit risk estimation and proposed a multivariate outlier detection technique based on local outliers, which can support financial institutions in making decisions and avoiding corporate credit risks [3]. For credit risk assessment in the Internet finance industry, Yang and Yuan applied an RBF network to analyze the statistical data of online peer-to-peer lending platforms and evaluate their credit risk, and thus proposed a new early warning method based on an RBF neural network model, which can reasonably predict the credit risk status of the industry [4].
Based on fuzzy theory and related theories of financial risk early warning management, Ding proposed a fuzzy comprehensive evaluation method, which achieves more accurate early warning and assessment of the potential and obvious risks of financial enterprises. In addition, the safety of financial enterprise management is greatly improved, and the losses caused by various risks are reduced [5]. Based on a BP neural network, Li constructed a risk assessment model of knowledge transfer in transforming enterprises to realize knowledge management risk warning [6]. Dong adopted an improved K-means algorithm with quantum evolution to divide the risk warning intervals by combining the given initial value with the value at risk measured for well-known Chinese online financial companies [7, 8]. Zhang and Chen used the autoregressive conditional Fréchet (ACF) model to predict the tail risk of the capital market, so as to identify major crisis sources [9]. Ouyang et al. applied a deep learning algorithm to the early warning of market risk, and the results show that the algorithm has higher accuracy than the traditional BP network and other methods [10].

Among machine learning methods, the SVM algorithm is widely used in classification because of its advantages on nonlinear and small-sample problems, but the optimization of its parameters remains a research hotspot. For example, Jerlin Rubini and Perumal optimized the SVM algorithm with the fruit fly algorithm and applied the optimized algorithm to the classification of chronic kidney disease, obtaining a good classification effect, which indicates that the fruit fly algorithm has great advantages in optimizing SVM [11]. Tian and others used the fruit fly algorithm to optimize an echo state network, which greatly improved the accuracy of prediction [12]. Lu et al. applied the fruit fly algorithm and SVM to the prediction of urban gas load, which greatly improved the accuracy of short-term prediction [13]. It can be seen from the above that combining the fruit fly algorithm with SVM for classification or prediction has become a focus of current research. The above early warning models based on deep learning realize the early warning of market risks to a certain extent, but their prediction accuracy needs to be improved. To solve this problem, this paper applies the SVM model, which has excellent predictive performance, and constructs a market risk early warning model by optimizing its parameters and handling the unbalanced samples.

2. Basic Methods

2.1. SVM Model

SVM is a generalized linear classifier based on statistical learning theory and the principle of structural risk minimization. Its basic principle is to construct an optimal hyperplane that maximizes the distance between samples of two different categories, as shown in Figure 1 [14]. Here, circles and squares represent the two categories, and the optimal hyperplane maximizes the margin between the two dotted lines.

Suppose the dataset is $\{(x_i, y_i)\}$, where $x_i$ is a sample and $y_i \in \{1, -1\}$ is its category label. When $y_i = 1$, $x_i$ belongs to the first category, and when $y_i = -1$, $x_i$ belongs to the second category. The linear discriminant function is usually expressed as [15]
$$f(x) = w \cdot x + b \tag{1}$$
where $w$ is the weight vector and $b$ is a constant.

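The classification gap is $2/\|w\|$. When $\|w\|$ is minimum, the classification spacing is maximum. The form of the standard SVM is [16]
$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + c \sum_i \xi_i, \quad \text{s.t. } y_i (w \cdot x_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0 \tag{2}$$
where $c$ is the penalty factor and $\xi_i$ is the slack variable.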

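For sample data that are not linearly separable, SVM applies a nonlinear transformation, namely, $x \to \varphi(x)$, so that the data sample space is mapped into a high-dimensional space. When solving, it should meet the requirement of [17]
$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, \varphi(x_i) \cdot \varphi(x_j), \quad \text{s.t. } \sum_i \alpha_i y_i = 0 \tag{3}$$
where $\alpha_i$ is the Lagrange multiplier and $0 \le \alpha_i \le c$. Since the above equation is constrained by the inequality, there is a unique optimal solution corresponding to the Lagrange multiplier $\alpha_i^*$. The optimal classification discriminant function is [18]
$$f(x) = \operatorname{sgn}\left(\sum_i \alpha_i^* y_i K(x_i, x) + b^*\right) \tag{4}$$
where $b^*$ is the $b$ value obtained from formula (2) and $K(x_i, x) = \varphi(x_i) \cdot \varphi(x)$ is the kernel function. Take the radial basis function (RBF) as an example, which can be expressed as [19]
$$K(x_i, x) = \exp\left(-\frac{\|x_i - x\|^2}{2\sigma^2}\right) \tag{5}$$
where $\sigma$ is the RBF kernel parameter. According to formulas (4) and (5), the optimal classification discriminant function is
$$f(x) = \operatorname{sgn}\left(\sum_i \alpha_i^* y_i \exp\left(-\frac{\|x_i - x\|^2}{2\sigma^2}\right) + b^*\right) \tag{6}$$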

As can be seen from the above analysis, the classification effect of the SVM model mainly depends on two aspects: one is whether the number of samples in each class is balanced, and the other is whether the kernel parameters and penalty factor of the model are optimal. The standard SVM model addresses neither aspect [20]. Therefore, in order to improve the classification effect of the SVM model, this paper improves the model in both respects.
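The RBF discriminant function of Section 2.1 can be illustrated with a minimal sketch. It assumes the common kernel form exp(-||u - v||^2 / (2 sigma^2)); the function names and the hand-picked support vectors, labels, and multipliers are hypothetical and do not come from a trained model.

```python
import math

def rbf_kernel(u, v, sigma):
    """RBF kernel exp(-||u - v||^2 / (2 * sigma^2)) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svm_decision(x, support_vectors, labels, alphas, b, sigma):
    """Return the sign of sum_i alpha_i * y_i * K(x_i, x) + b as +1 or -1."""
    score = sum(a * y * rbf_kernel(sv, x, sigma)
                for sv, y, a in zip(support_vectors, labels, alphas))
    score += b
    return 1 if score >= 0 else -1

# Hypothetical, hand-picked values for illustration only.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [1, -1]
alphas = [1.0, 1.0]

print(svm_decision((0.1, 0.2), support_vectors, labels, alphas, b=0.0, sigma=1.0))
print(svm_decision((1.9, 2.1), support_vectors, labels, alphas, b=0.0, sigma=1.0))
```

A point close to the first support vector is assigned to the first category and a point close to the second support vector to the second category, because the kernel value decays quickly with distance.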

2.2. SVM Model Improvements
2.2.1. Unbalanced Sample Processing

For the unbalanced classification samples, the Synthetic Minority Oversampling Technique (SMOTE) is used to deal with them at the data level [21]:
(1) Determine the minority-class sample X, calculate the Euclidean distance d between X and the other minority-class samples, and select the K samples with the smallest distance d.
(2) Sample X with the multiplier N = minority sample/majority sample, and select Xi (i = 1, 2, …, N) from the K samples.
(3) According to formula (7), Xi and X are synthesized into a new sample:
$$X_{new} = X + \mathrm{rand}(0, 1) \times (X_i - X) \tag{7}$$
(4) Combine $X_{new}$ and X as a new training set to learn on the SVM model.
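The SMOTE steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function name `smote`, its parameters, and the toy minority set are hypothetical, and each synthetic point is drawn by interpolating a randomly chosen minority sample toward one of its k nearest minority neighbours.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by Euclidean distance, excluding x itself.
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(p, x))[:k]
        xi = rng.choice(neighbours)
        gap = rng.random()  # random number in [0, 1)
        # New sample on the segment between x and xi: x + gap * (xi - x).
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, xi)))
    return synthetic

# Hypothetical minority samples on the corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two minority samples, the new points stay inside the region already occupied by the minority class.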

2.2.2. Optimization of Model Parameters

To optimize the kernel parameters and penalty factor of the SVM model, this paper adopts the fruit fly optimization algorithm (FOA), which has high search accuracy. Figure 2 shows the simulated foraging behavior of fruit flies [22]. The basic operation is as follows.
(1) Initialize the maximum number of iterations, population size, fruit fly population location range (LR), and other parameters. In 2D coordinates (X, Y), the initial position of the fruit fly population is
$$X_{axis} = \mathrm{rand}(LR) \tag{8}$$
$$Y_{axis} = \mathrm{rand}(LR) \tag{9}$$
(2) Assign a flight direction and distance to every fruit fly, and use olfactory search to update its position [23]:
$$X_i = X_{axis} + \mathrm{rand}(FR) \tag{10}$$
$$Y_i = Y_{axis} + \mathrm{rand}(FR) \tag{11}$$
where FR represents the single flight range of a fruit fly.
(3) According to formula (12), the distance between the position of each fruit fly and the origin is calculated [24]:
$$Dist_i = \sqrt{X_i^2 + Y_i^2} \tag{12}$$
(4) $S_i$ and $Smell_i$ are calculated:
$$S_i = \frac{1}{Dist_i} \tag{13}$$
$$Smell_i = \mathrm{fitness}(S_i) \tag{14}$$
where fitness is the discriminant (fitness) function, $Smell_i$ is the flavor concentration value, and $S_i$ is the judgment value of $Smell_i$.
(5) Update bestSmell and bestIndex:
$$[bestSmell, bestIndex] = \min(Smell_i) \tag{15}$$
(6) Use visual search to make the other fruit flies fly to the best position [25]:
$$Smellbest = bestSmell, \quad X_{axis} = X(bestIndex), \quad Y_{axis} = Y(bestIndex) \tag{16}$$
(7) Repeat steps (2)–(6) until the algorithm reaches the set number of iterations.

FOA uses the current optimal solution to guide the search, so that the result approaches the optimal solution and the parameter optimization is realized.
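The FOA loop described above can be sketched on a toy problem. This is a minimal sketch under several assumptions: the fitness is minimized, rand(LR) is taken as a uniform draw in [0, LR], rand(FR) as a uniform draw in [-FR, FR], and the swarm moves only when the best smell improves. The function name, its parameters, and the toy fitness are hypothetical and are not the SVM fitness used in the paper.

```python
import math
import random

def foa_minimize(fitness, max_iter=100, pop_size=20, lr=10.0, fr=1.0, seed=0):
    """Minimise fitness(S) with a basic fruit fly optimisation loop."""
    rng = random.Random(seed)
    # Step (1): random initial swarm location inside the location range LR.
    x_axis, y_axis = rng.uniform(0.0, lr), rng.uniform(0.0, lr)
    best_smell, best_s = math.inf, None
    for _ in range(max_iter):
        flies = []
        for _ in range(pop_size):
            # Step (2): olfactory search, a random flight of range FR.
            x = x_axis + rng.uniform(-fr, fr)
            y = y_axis + rng.uniform(-fr, fr)
            # Step (3): distance to the origin (guarded against zero).
            dist = max(math.hypot(x, y), 1e-12)
            # Step (4): judgment value S and its smell concentration.
            s = 1.0 / dist
            flies.append((fitness(s), s, x, y))
        # Step (5): best fly of this iteration (assumed: lower smell is better).
        smell, s, x, y = min(flies)
        # Step (6): visual search, the swarm moves only if the smell improved.
        if smell < best_smell:
            best_smell, best_s = smell, s
            x_axis, y_axis = x, y
    return best_s, best_smell

# Hypothetical toy fitness with its minimum at S = 0.5.
best_s, best_smell = foa_minimize(lambda s: (s - 0.5) ** 2)
print(best_s, best_smell)
```

In an SVM setting, the toy fitness would be replaced by a function that maps the judgment value to candidate parameters, trains the model, and returns a validation error; that mapping is left out here because it depends on choices the text does not spell out.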

3. Market Credit Risk Early Warning Model Based on Improved SVM

$S_{min}$ and $S_{maj}$ are used to represent the samples of market credit risk and noncredit risk, respectively, and S is the set of all samples. Based on the above improvements, the construction process of the market credit risk early warning model is designed as follows:
(1) Calculate the k nearest neighbor points of each sample point $(x_{smin}, y_{smin})$ in $S_{min}$, randomly select one neighbor point $\hat{x}_{smin}$, subtract $x_{smin}$ from it, multiply the difference by a random number in the interval [0, 1], and then add $x_{smin}$ to obtain a new credit risk sample $x_{new}$:
$$x_{new} = x_{smin} + \mathrm{rand}(0, 1) \times (\hat{x}_{smin} - x_{smin})$$
(2) Repeat the above step until the number of $x_{new}$ reaches $|S_{maj} - S_{min}|/2$.
(3) Initialize the relevant parameters of SVM and FOA. In this paper, following reference [26], the maximum iteration number of FOA is set to 100, and the population size is set to 20.
(4) Use FOA to optimize the parameters of the SVM model, with the judgment value of flavor concentration calculated as described in Section 2.2.2.
(5) Continue to iterate until the optimal bestSmell is less than the set value; the corresponding value is then the optimal parameter.
(6) Substitute the optimal parameters and add $x_{new}$ to the training samples to construct the improved SVM model and perform prediction.

The above process is illustrated in Figure 3.

4. Simulation Experiment

4.1. Experimental Environment Construction

This experiment was run on the 64-bit Windows 7 Professional operating system. The CPU is an Intel(R) Xeon(R) E5-2620 v3 at 2.40 GHz, the GPU is a Tesla K80, and the memory is 16 GB. The model was built with MATLAB R2018a.

4.2. Data Sources and Preprocessing
4.2.1. Data Sources

The financial data of 260 listed manufacturing enterprises in Shenzhen and Shanghai from 2018 to 2020 are selected as the experimental data. Following references [21, 27, 28], a total of 20 financial indicators are selected as credit risk warning indicators of listed companies. They are organized into 6 first-level indicators, such as enterprise operation capacity, growth capacity, and profitability, and 20 second-level indicators, such as total asset turnover rate, net asset growth rate, and return on net assets. The indicators are listed in detail in Table 1.

4.2.2. Data Preprocessing

(1) Descriptive Statistics. Since the mean, standard deviation, maximum, and minimum values differ significantly across the above indicator variables, descriptive statistics of the indicator variables are computed, and the results are shown in Table 2.

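(2) Normalization. Considering the different dimensional levels of the index variables, the z-score is adopted for normalization, as shown in the following formula:
$$x'_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}$$
where $x_{ij}$ is the value of index j for sample i, and $\bar{x}_j$ and $s_j$ represent the sample mean and standard deviation corresponding to index j, respectively.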

After normalization, descriptive statistics of each indicator variable are shown in Table 3.
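The z-score normalization can be sketched per indicator column as follows. This is a minimal illustration that assumes the sample standard deviation uses the n - 1 denominator and that no indicator is constant (which would make the standard deviation zero); the function name and the toy values are hypothetical.

```python
import math

def zscore_columns(rows):
    """Standardise each column of a list of equal-length numeric rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Sample standard deviation with the n - 1 denominator (assumed).
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical values: two indicators on very different scales.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
z = zscore_columns(rows)
print(z)
```

After the transformation both indicators have mean 0 and standard deviation 1, so neither dominates the distance computations used by SMOTE or the RBF kernel.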

(3) Significance Testing. To select the indicators that can distinguish the credit risk and noncredit risk of listed companies, this paper adopts the independent-sample T-test, and the results are shown in Table 4. The p values of 7 indicators, such as the net asset-liability ratio and the operating profit growth rate, are more than 10%, which indicates that these indicators are unable to distinguish credit risk from noncredit risk, so they are deleted in this paper.

4.3. Evaluation Indicators

The geometric mean accuracy (G value), F value, and AUC are used to evaluate the prediction performance of the model. A confusion matrix is used to represent the binary classification results for credit risk, as shown in Table 5.

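The model sensitivity (SE), specificity (SP), and precision (P) can be calculated as follows:
$$SE = \frac{TP}{TP + FN}, \quad SP = \frac{TN}{TN + FP}, \quad P = \frac{TP}{TP + FP}$$
where TP and TN are the numbers of correctly predicted credit risk and noncredit risk samples, and FN and FP are the numbers of misclassified credit risk and noncredit risk samples in the confusion matrix, respectively.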

Through the above three indicators, the following can be calculated:
$$G = \sqrt{SE \times SP}, \quad F = \frac{2 \times P \times SE}{P + SE}$$

The larger the selected index value is, the better the model performance is.
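The evaluation indicators can be computed with a short sketch. It assumes the usual confusion-matrix definitions (credit risk as the positive class) and computes AUC as the fraction of positive/negative pairs that the scores rank correctly, with ties counted as one half. The function names and the counts and scores in the example are hypothetical.

```python
import math

def classification_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, precision, G value, and F value from counts."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    p = tp / (tp + fp)
    g = math.sqrt(se * sp)        # geometric mean of sensitivity and specificity
    f = 2 * p * se / (p + se)     # harmonic mean of precision and sensitivity
    return {"SE": se, "SP": sp, "P": p, "G": g, "F": f}

def auc(scores_pos, scores_neg):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in scores_pos for sn in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical confusion-matrix counts and classifier scores.
m = classification_metrics(tp=40, fn=10, fp=5, tn=45)
a = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(m, a)
```

The G value penalizes a model that does well on only one class, which is why it is preferred over plain accuracy for unbalanced samples.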

4.4. Experimental Results
4.4.1. Model Verification

The samples are divided into training sets and testing sets according to different proportions, and experiments are carried out under different kernel functions and different optimal parameter values. The results are shown in Table 6. Under the different training/testing splits, the models corresponding to different kernel functions and optimal parameter values all perform well in G value, F value, and AUC value, and the differences among them are small, which indicates that the proposed model has good prediction performance and strong generalization ability [29].

To show the prediction performance of the different kernel functions and optimal parameter values under different training sample proportions more intuitively, the prediction results in Table 6 are plotted in Figures 4–6. Figure 4 shows that under different ratios of training set to testing set, the fluctuation range of the G values of the different kernel functions is small. Compared with the sigmoid and polynomial models, the linear and RBF models have higher G values, which indicates that the linear and RBF models perform slightly better. On the other hand, Figure 5 shows that under different proportions of training sets and testing sets, the F values of the different kernels fluctuate greatly, but the overall F value is high. No single kernel type consistently achieves the highest F value, and thus the proposed model has good generalization ability. As can be seen from Figure 6, the AUC value of the proposed model fluctuates greatly, but the proposed model also achieves good results on this index. To sum up, the model proposed in this paper has good generalization ability and good prediction performance.

Because evaluation indexes alone do not show whether the differences between models are statistically meaningful, a paired-sample T test is adopted to compare the prediction performance of the different kernel function models, and the results are shown in Table 7. On the G value, the p values between the RBF and polynomial models and the linear and sigmoid models are all less than 10%, which rejects the null hypothesis, indicating that the performance of the RBF and polynomial models is significantly different from that of the linear and sigmoid models. On the F value, the p values of all kernel function models are greater than 10%, so the null hypothesis is accepted, indicating that the performance of the different kernel function models differs little. On the AUC value, the p values of all models are less than 10%, which rejects the null hypothesis, indicating that the performance of the linear and polynomial, sigmoid and polynomial, and RBF models is significantly different.
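The statistic behind the paired-sample T test can be sketched as follows. This is a minimal illustration that computes only the t statistic and its degrees of freedom; turning it into a p value requires a t-distribution table or a statistics library, which is left out here. The function name and the two score lists are hypothetical and are not values from Table 7.

```python
import math

def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples a and b."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean_d = sum(d) / n
    # Sample variance of the paired differences (n - 1 denominator).
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n), n - 1

# Hypothetical G values of two kernels over four training/testing splits.
g_kernel_a = [0.90, 0.92, 0.88, 0.91]
g_kernel_b = [0.85, 0.90, 0.86, 0.88]
t_stat, dof = paired_t_statistic(g_kernel_a, g_kernel_b)
print(t_stat, dof)
```

Pairing the scores by split removes the variation caused by the split itself, so the test responds only to the difference between kernels.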

In conclusion, the change of kernel function has little influence on the prediction performance of the proposed model, which means that the prediction performance of the proposed model is relatively stable.

4.4.2. Model Comparison

To further verify the validity of the proposed model, the prediction performance of the proposed model is compared with that of other models. The results are shown in Table 8. Compared with FOA-SMOTE-BP and FOA-SMOTE-Logit, the proposed model has the best performance on the indicators of G value, F value, and AUC value, indicating that the model proposed in this paper has the best prediction performance.

To compare the prediction performance of the different models intuitively, the results in Table 8 are plotted in Figure 7. It can be seen from the figure that the G value, F value, and AUC value curves of the proposed model are significantly higher than those of the comparison models, indicating that the proposed model has better prediction performance.

5. Conclusion

In summary, the proposed market risk early warning method based on deep learning takes SVM as the basic model, uses SMOTE to oversample the minority class of the unbalanced samples, and uses FOA to tune the model parameters, which improves the classification effect of the model. The empirical results show that the proposed model has excellent stability and generalization ability and can accurately predict market credit risk. Compared with the FOA-SMOTE-BP and FOA-SMOTE-Logit models, the proposed model performs better on the G value, F value, and AUC value and has better prediction performance, which provides a reference for market credit risk prediction research. The contribution of this study is the use of a new improved SVM to predict market risk, which provides a new reference for information management and risk prevention in the market. However, owing to limited conditions, some deficiencies remain. The market credit risk indicators were selected only from the literature, without considering the actual situation of China's manufacturing industry, which may affect the final market risk prediction results. To avoid the influence of index selection on prediction accuracy, future research will try to determine independently the index variables that affect market credit risk.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.