Abstract

To provide timely and effective information and decision support for financial market entities, this paper combines random subspace with weighted fused Lasso to construct a financial risk prediction model based on an improved random subspace method. Firstly, the basic principles of the random subspace and SVM algorithms are introduced. Then, the weighted fused Lasso (WFL) and adaptive Lasso (AL) methods are introduced to improve random subspace, so as to reduce the dimension of multisource heterogeneous data and realize the adaptive fusion of features. Next, a financial risk prediction model based on weighted fusion adaptive random subspace is constructed, in which SVM is used as the base classifier and a result integration strategy is introduced for the output. Finally, based on the data of some listed companies, the improved random subspace method is compared with other methods. The results show that the improved random subspace method achieves higher prediction values, which indicates that the method is reasonable and effective in financial risk prediction. With the improved random subspace method, the combined feature F1 + F2 + F3 outperforms the other methods in T − 3, T − 4, and T − 5, with prediction values above 95%. The area under the ROC curve (AUC) obtained by the weighted fused adaptive integration-based random subspace (WFAIB_RS) method is about 95% in T − 3, 93% in T − 4, and 95.5% in T − 5, which is clearly higher than that of the other eight methods.

1. Introduction

With the rapid development of the financial industry and the continuing spread of the internet finance model, the financial market is facing more severe financial risks. To improve the ability of financial market entities to obtain early warning information on financial risk, domestic experts and scholars have carried out extensive research and put forward a variety of financial risk prediction methods. For example, literature [1] used an RBF network model to establish a financial risk early warning model for Jiangsu Province and predicted the total regional risk in 2019 from the 2018 sample set, yielding suggestions for preventing and handling financial risk in Jiangsu Province. Literature [2] constructed a regional financial risk index; evaluated and predicted the financial risk of 31 provinces, cities, and autonomous regions (excluding Hong Kong, Macao, and Taiwan) in China; concluded that the overall financial risk of China and of the 6 regions changed synchronously; and drew implications for formulating preventive measures. Based on the current state of financial development, literature [3] proposed a multifactor prediction model for international supply chain financial risk that includes external environmental risk factors, identified the structural characteristics and formation mechanism of the international supply chain financial risk system, and put forward countermeasures and suggestions. Most of the above studies are based on a single data source, and their prediction accuracy is low. Therefore, this paper constructs a new weighted fusion adaptive random subspace method for financial risk prediction and verifies its rationality and effectiveness.

Literature [4] shows that the data of compliant financial activities are of high quality and quantity in all aspects, whereas the data of informal financial activities are of low quality. Machine learning methods based on single-source data can therefore only detect the risks of formal financial activities, while existing uses of multisource data to detect financial risks do not perform well. The TSAIB_RS method was then proposed to integrate various data adaptively. Literature [5] made quarterly observations on 225,813 company samples and found that financial system risk plays a great role in predicting the failure of an enterprise when its internal financial sector fluctuates greatly, its scale is small, and its debts are large. Although classifier ensembles have been applied in the financial industry [6], factors such as irrelevant features and skewed class distributions still hinder prediction performance. Moreover, the cost of misclassifying a default or insolvency (positive) case is far greater than the cost associated with the non-default or non-insolvency (negative) category. The potential relationship between classifier ensembles and the positive sample type needs to be studied in depth in the future. In today's "big data" era [7], big data has gradually been integrated with finance and has become the core of the financial industry. Making good use of big data for effective prediction and prevention is of great significance to the financial industry and is also the essence of financial management, that is, risk management and control. Drawing on the advantages of big data in prevention, control, and prediction, this paper summarizes feasible and effective countermeasures for financial risk management and control.

The belief-rule-based (BRB) reasoning method in reference [8] is widely used in the risk assessment of research and development (R&D) projects. Because the performance evaluation of R&D projects involves many risk factors, the BRB method produces a large rule base. A random subspace BRB model (RS-BRB) has therefore been tested and applied. It constructs several subspaces by sampling, develops a BRB subsystem for each subspace, and finally combines the results of the different subsystems. Because traditional forecasting models can no longer meet current forecasting needs [9], researchers have put forward a method called RS-multiple boosting to improve the accuracy of credit risk forecasting. This method combines two classical ensemble machine learning methods: random subspace (RS) and multiple boosting. The random subspace method has also been used for portfolio selection on different data sets, where it proved superior to the traditional resampling portfolio based on bagging. There are many methods for risk assessment and prediction in the financial market, but most of the above studies are based on a single data source, with low prediction accuracy. To address the problems of a single information source, limited use of data, and low prediction accuracy in current financial risk prediction, this study constructs a financial risk prediction method based on the adaptive fusion of multisource heterogeneous data for the financial risk of listed companies and the default risk of individual borrowers, and verifies its effectiveness on several types of real data sets collected from online platforms. The method improves both the prediction effect and the prediction accuracy.
Current financial data are heterogeneous and redundant, and existing prediction models do not perform satisfactorily on multisource heterogeneous data. This paper proposes a financial risk prediction method based on the adaptive fusion of multisource heterogeneous data, which can effectively exploit the multisource data of financial companies and improve prediction accuracy. The paper first introduces the problems of financial risk prediction in detail; then constructs WFAIB_RS, a financial risk prediction method based on weighted fusion adaptive random subspace; and finally shows, through experimental comparison on data sets of listed companies, that WFAIB_RS achieves a better prediction effect.

2. Introduction to Basic Methods

2.1. Brief Introduction to the Random Subspace Algorithm

Random subspace (RS) [10, 11] is an ensemble learning method. It trains each classifier on a random subset of the features instead of all features, which reduces the correlation between the classifiers. The method is therefore well suited to learning tasks with high feature dimensions. The RS steps are as follows:
Step 1: features are randomly selected from the feature space of the data samples to form data subsets of similar size. The subspace ratio parameter r is used to adjust the size of each data subset.
Step 2: the sampled data subsets are input into the base classifiers, which are then trained.
Step 3: finally, the results of the trained base classifiers are fused.
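The three RS steps can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the helper names, the `NearestCentroid` stand-in base learner (the paper itself uses SVM), and the majority vote used for fusion in Step 3.

```python
import random
from collections import Counter

def sample_subspaces(n_features, r, n_subsets, seed=0):
    """Step 1: draw n_subsets random feature-index subsets of size round(r * n_features)."""
    rng = random.Random(seed)
    k = max(1, round(r * n_features))
    return [sorted(rng.sample(range(n_features), k)) for _ in range(n_subsets)]

def project(X, idx):
    """Keep only the feature columns listed in idx."""
    return [[row[j] for j in idx] for row in X]

class NearestCentroid:
    """Hypothetical stand-in base learner; not the SVM used in the paper."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))
        return [min(self.centroids, key=lambda c: dist2(x, self.centroids[c])) for x in X]

def rs_fit_predict(X_train, y_train, X_test, r=0.5, n_subsets=11, seed=0):
    votes = []
    for idx in sample_subspaces(len(X_train[0]), r, n_subsets, seed):
        # Step 2: train one base learner per feature subset.
        model = NearestCentroid().fit(project(X_train, idx), y_train)
        votes.append(model.predict(project(X_test, idx)))
    # Step 3: fuse the per-learner predictions by majority vote (an assumed fusion rule).
    return [Counter(col).most_common(1)[0][0] for col in zip(*votes)]
```

Using an odd number of subsets avoids ties in a two-class majority vote; any fusion rule could be substituted in Step 3.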

2.2. SVM Classification Principle

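With SVM [12] as the base classifier, the classification problem can be solved according to the following objective function:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\xi_i \quad \text{s.t.}\quad y_i\left(w^{T}\phi(x_i) + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n,$$

where $w$ is the normal vector, $C$ is the penalty coefficient, $b$ is the displacement term, and $\xi_i$ is the non-negative relaxation factor.

The above problem is transformed into a dual problem:

$$\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\,\phi(x_i)^{T}\phi(x_j) \quad \text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C.$$

Solving the dual problem requires $\phi(x_i)^{T}\phi(x_j)$, the inner product of each pair of samples after mapping. This inner product is replaced by a kernel function, whose expression is as follows:

$$k(x_i, x_j) = \phi(x_i)^{T}\phi(x_j).$$

The final decision function is obtained:

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{n}\alpha_i y_i\,k(x_i, x) + b\right),$$

where $\alpha_i$ and $b$ are constant real numbers and $\alpha_i > 0$. The kernel function $k(x_i, x_j)$ is the radial basis function kernel, defined as follows:

$$k(x_i, x_j) = \exp\left(-\gamma\,\|x_i - x_j\|^{2}\right),\quad \gamma > 0.$$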
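As an illustration of the kernel and decision function, the following Python sketch evaluates a radial basis function kernel and the sign of the kernel expansion. It assumes the common parametrization exp(−γ‖x − z‖²); the function names, the default `gamma`, and the toy support vectors are hypothetical and are not taken from the paper.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel, assuming the form exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def svm_decision(x, support_vectors, labels, alphas, b, gamma=0.5):
    """Sign of sum_i alpha_i * y_i * k(x_i, x) + b for already-trained alphas and b."""
    score = sum(a * y * rbf_kernel(sv, x, gamma)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1
```

The multipliers `alphas` and the offset `b` are taken as given here; obtaining them requires solving the dual problem with a quadratic programming solver, which this sketch does not do.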

2.3. Random Subspace Algorithm Improvement

To ensure the accuracy of financial risk prediction and to handle multisource heterogeneous financial data, the random subspace algorithm first needs to be improved so that multisource heterogeneous data can be fused. Based on the Lasso model, the weighted fusion adaptive random subspace model (WFAIB_RS) is introduced for the adaptive fusion of multisource heterogeneous data features. Starting from the Lasso model, the weighted fused Lasso (WFL) model takes the form of

$$\hat{\beta}^{\mathrm{WFL}} = \arg\min_{\beta}\ \|y - X\beta\|_2^{2} + \lambda_1\sum_{j=1}^{p}|\beta_j| + \lambda_2\sum_{j<k}|\rho_{jk}|\left(\beta_j - \operatorname{sgn}(\rho_{jk})\,\beta_k\right)^{2}.$$

Here, the last term is the penalty term added to the Lasso model, $\rho_{jk}$ is the correlation coefficient between any two features $x_j$ and $x_k$, and $\lambda_2$ represents the regular penalty parameter, which mainly adjusts the penalty intensity for feature correlation.

The WFL model can effectively solve the multicollinearity problem between features and improve the stability of the model. After the adaptive Lasso is introduced into WFL, the resulting WFAL model has the following form:

$$\hat{\beta}^{\mathrm{WFAL}} = \arg\min_{\beta}\ \|y - X\beta\|_2^{2} + \lambda_1\sum_{j=1}^{p}\hat{w}_j|\beta_j| + \lambda_2\sum_{j<k}|\rho_{jk}|\left(\beta_j - \operatorname{sgn}(\rho_{jk})\,\beta_k\right)^{2}.$$

Here, $\hat{w}_j$ represents the adaptive weight, which is added to the WFAL model to obtain a more accurate feature subset, and the $\lambda_2$ term is the smoothing item. As $\lambda_1$ increases, more feature weights become 0. As the regular penalty parameter $\lambda_2$ becomes larger, more features are regarded as related features.

The adaptive feature weights based on weighted fusion are obtained by WFAL model estimation; the features are then sampled according to these weights to obtain the data subsets used for base classifier training. The sampling process is mainly adjusted by the subspace ratio parameter r: the larger r is, the higher the feature dimension of the obtained sample subset.
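The weight-driven sampling described above can be sketched in Python as follows. This is only an illustration under stated assumptions: features are drawn without replacement with probability proportional to the absolute value of their weight, features with zero weight are never drawn, and the function name `sample_features_by_weight` is hypothetical.

```python
import random

def sample_features_by_weight(weights, r, rng):
    """Draw about r * p distinct feature indices, favoring features with larger |weight|."""
    p = len(weights)
    pool = [j for j in range(p) if abs(weights[j]) > 0]  # zero-weight features are excluded
    k = min(max(1, round(r * p)), len(pool))
    chosen = []
    for _ in range(k):
        probs = [abs(weights[j]) for j in pool]
        pick = rng.choices(pool, weights=probs, k=1)[0]
        chosen.append(pick)
        pool.remove(pick)  # sampling without replacement
    return sorted(chosen)
```

A larger subspace ratio r yields more selected features per subset, up to the number of features with nonzero weight.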

3. Construction of Financial Risk Prediction Model

3.1. Financial Risk Forecasting Process

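Given a training set and a test set, the forecasting process is as follows. Firstly, Lasso estimation is carried out on the data to obtain the adaptive weight, specifically as follows:

$$\hat{w}_j = \frac{1}{\left|\hat{\beta}_j^{\mathrm{Lasso}}\right|^{\gamma}},\quad j = 1,\dots,p,$$

where $\hat{\beta}_j^{\mathrm{Lasso}}$ is the Lasso estimate of the coefficient of feature $j$ and $\gamma > 0$ is a tuning exponent.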

Secondly, the adaptive weight is used in the WFAL estimation of the data to obtain the feature weight vector.

Thirdly, the feature weight is taken as the sampling probability, and the sampling of data subset is carried out under the adjustment of subspace ratio r.

Fourthly, the base classifier is trained according to the sampled data subset. Fifthly, according to the rules of evidential reasoning, the prediction results of the base classifier are synthesized to get the final prediction results.
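The five steps above can be outlined as a short Python skeleton. It is a sketch of the control flow only: the estimators are passed in as callables (`fit_lasso`, `fit_wfal`, `sample_features`, `fit_base`, `fuse`), all of which are hypothetical placeholders, and the adaptive weight is assumed to take the usual adaptive Lasso form 1/|β|^γ with a small `eps` added to avoid division by zero.

```python
def adaptive_weights(lasso_coefs, gamma=1.0, eps=1e-8):
    """Step 1: adaptive weight 1 / |beta_j|^gamma from initial Lasso estimates (assumed form)."""
    return [1.0 / (abs(b) + eps) ** gamma for b in lasso_coefs]

def run_pipeline(X_train, y_train, X_test, r, n_subsets,
                 fit_lasso, fit_wfal, sample_features, fit_base, fuse):
    """Skeleton of the five steps; every callable argument is a user-supplied placeholder."""
    w_adapt = adaptive_weights(fit_lasso(X_train, y_train))      # step 1
    feature_weights = fit_wfal(X_train, y_train, w_adapt)        # step 2
    outputs = []
    for _ in range(n_subsets):
        idx = sample_features(feature_weights, r)                # step 3
        sub_train = [[row[j] for j in idx] for row in X_train]
        sub_test = [[row[j] for j in idx] for row in X_test]
        predict = fit_base(sub_train, y_train)                   # step 4: returns a predictor
        outputs.append(predict(sub_test))
    return fuse(outputs)                                         # step 5
```

Passing the estimators as arguments keeps the skeleton independent of any particular Lasso, SVM, or evidential reasoning implementation.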

3.2. Base Classifier

Support vector machine (SVM) is selected as the base classifier [13–15]. SVM is a typical classification algorithm. Using the principle of structural risk minimization, it can solve high-dimensional and non-linear classification problems and can efficiently classify data with few samples and high feature dimensions [16, 17]. Therefore, SVM is chosen as the base classifier of the financial risk prediction model.

3.3. Integration Strategy

In the result fusion, evidential reasoning is used as the ensemble strategy to synthesize the results produced by different base classifiers. Evidential reasoning treats the classification results of the different base classifiers as evidence and their accuracy as the evidence reliability and initial weight, and then fuses the results through continuous optimization [18, 19].

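Firstly, all the possible classification results are regarded as a set of mutually exclusive hypotheses that together form a complete set, the frame of discernment, denoted as $\Theta = \{\theta_1, \dots, \theta_N\}$. The results produced by base classifier $s$ can then be converted into the following evidence:

$$e_s = \left\{(\theta_n, p_{n,s}),\ n = 1,\dots,N\right\},$$

where $e_s$ represents the evidence transformed from the classification results obtained from the $s$-th base classifier and $p_{n,s}$ denotes the probability that the classification result is $\theta_n$.

In the process of evidential reasoning, the weight $w_s$ and reliability $r_s$ are used to define the reliability distribution function, mainly to ensure that the result information of the base classifiers does not conflict. The expression of the reliability distribution function is as follows:

$$m_s = \left\{(\theta_n, \tilde{m}_{n,s}),\ n = 1,\dots,N;\ \left(P(\Theta), \tilde{m}_{P(\Theta),s}\right)\right\}.$$

Here, $\tilde{m}_{n,s}$ represents the degree to which evidence $e_s$ supports $\theta_n$ when reliability and weight are considered, which is defined as follows:

$$\tilde{m}_{n,s} = c_s\,m_{n,s},\qquad m_{n,s} = w_s\,p_{n,s},\qquad \tilde{m}_{P(\Theta),s} = c_s\,(1 - r_s),\qquad c_s = \frac{1}{1 + w_s - r_s},$$

where $c_s$ is the normalization factor.

Secondly, by fusing the $S$ pieces of evidence provided by the different base classifiers, the reliability function of the $S$ pieces of evidence jointly supporting $\theta_n$ is obtained recursively for $s = 2, \dots, S$, starting from $m_{n,e(1)} = m_{n,1}$ and $m_{P(\Theta),e(1)} = 1 - r_1$:

$$\hat{m}_{n,e(s)} = (1 - r_s)\,m_{n,e(s-1)} + m_{P(\Theta),e(s-1)}\,m_{n,s} + m_{n,e(s-1)}\,m_{n,s},\qquad \hat{m}_{P(\Theta),e(s)} = (1 - r_s)\,m_{P(\Theta),e(s-1)},$$

$$m_{n,e(s)} = \frac{\hat{m}_{n,e(s)}}{\sum_{k=1}^{N}\hat{m}_{k,e(s)} + \hat{m}_{P(\Theta),e(s)}},\qquad p_{n,e(S)} = \frac{\hat{m}_{n,e(S)}}{\sum_{k=1}^{N}\hat{m}_{k,e(S)}}.$$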

After evidential reasoning, the fusion result of the base classifiers is the category with the maximum value in the final fused classification result of the model.
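To make the idea of fusing classifier outputs as evidence concrete, the Python sketch below uses a deliberately simplified scheme: each classifier's class probabilities are discounted by its reliability (Shafer discounting), and the discounted masses are combined with Dempster's rule over singleton classes plus an ignorance mass. This is a stand-in chosen for brevity, not the paper's evidential reasoning rule with separate weights and reliabilities, and all names (`discount`, `combine`, `fuse`, the `"THETA"` key) are hypothetical.

```python
def discount(probs, reliability):
    """Scale class beliefs by the reliability; the remaining mass goes to ignorance ('THETA')."""
    masses = {c: reliability * p for c, p in probs.items()}
    masses["THETA"] = 1.0 - reliability * sum(probs.values())
    return masses

def combine(m1, m2):
    """Dempster's rule for mutually exclusive singleton classes plus the ignorance mass."""
    fused = {}
    for c in m1:
        if c != "THETA":
            fused[c] = m1[c] * m2[c] + m1[c] * m2["THETA"] + m1["THETA"] * m2[c]
    fused["THETA"] = m1["THETA"] * m2["THETA"]
    total = sum(fused.values())  # equals 1 minus the conflicting mass; assumed nonzero
    return {k: v / total for k, v in fused.items()}

def fuse(classifier_probs, reliabilities):
    """Fuse all classifiers in sequence and return (winning class, fused masses)."""
    masses = [discount(p, r) for p, r in zip(classifier_probs, reliabilities)]
    result = masses[0]
    for m in masses[1:]:
        result = combine(result, m)
    classes = [c for c in result if c != "THETA"]
    return max(classes, key=lambda c: result[c]), result
```

For example, three classifiers giving the risk class probabilities 0.8, 0.6, and 0.3 with reliabilities 0.9, 0.8, and 0.6 are fused into the risk class, because the two more reliable classifiers outweigh the third.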

Finally, this paper takes different classification results as evidence, takes accuracy as the reliability and initial weight of evidence, and then obtains the optimization weight of the base classifier through repeated optimization.

The weights are trained on the training set by fitting the probability distribution of the base classifier results over the different categories after synthesis to the distribution of the true classification results, over all m samples in the training set.

4. Experimental Verification

In this paper, the improved random subspace method is used to predict the financial risk of listed companies. The experimental flow chart is shown in Figure 1.

4.1. Experimental Data Set and Model Evaluation Indicators

In this study, 1,726 listed companies in China were used as experimental samples, and 1,597 normal samples and 129 risk samples were obtained according to ST markers. The experimental data of the risk samples are collected in three time panels, 3 years, 4 years, and 5 years in advance, denoted T − 3, T − 4, and T − 5, and the time span is the 5 years from 2016 to 2020. The specific distribution is 14 in 2016, 27 in 2017, 22 in 2018, and 32 in 2019.

The experimental data set consists of 39 financial features, 12 emotional features, and qualitative text features, which are represented by F1, F2, and F3, respectively.

In the financial risk prediction experiments on listed companies, four evaluation criteria are mainly adopted: average accuracy (AA), type I error, type II error, and AUC (the area under the ROC curve) [20, 21]. The classification outcomes are true positive (TP), true negative (TN), false positive (FP), and false negative (FN). With risk samples as the positive class, the indicators are calculated as follows:

$$\mathrm{AA} = \frac{TP + TN}{TP + TN + FP + FN},\qquad \text{Type I error} = \frac{FP}{FP + TN},\qquad \text{Type II error} = \frac{FN}{FN + TP}.$$

AUC is the area under the ROC curve and takes values between 0 and 1. The ROC curve is drawn in two-dimensional coordinates: its horizontal axis is the false positive rate, and its vertical axis is the true positive rate.
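The evaluation indicators can be computed with a few lines of Python. The sketch assumes that risk companies form the positive class, that AA is the overall proportion of correctly classified samples, that the type I error is the error rate on negative samples, and that the type II error is the error rate on positive samples; these conventions and the function names are assumptions. AUC is computed through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return (AA, type I error, type II error) under the assumed definitions."""
    aa = (tp + tn) / (tp + tn + fp + fn)
    type1 = fp / (fp + tn)   # negatives wrongly flagged as positive
    type2 = fn / (fn + tp)   # positives that were missed
    return aa, type1, type2

def auc(scores, labels):
    """Rank-based AUC; labels are 1 (positive) or 0 (negative), ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC computation is quadratic in the number of samples, which is acceptable for a data set of this size; a sort-based version would be preferable for much larger data.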

4.2. Comparison Methods

In this paper, the base classifier, three ensemble learning methods, and unbalanced classification methods are used for comparison. The comparison therefore includes SVM, bagging, RS, and the improved random subspace method.

In this experiment, the stability of the experimental results is verified by ten repetitions of tenfold cross-validation.

The specific steps are as follows. The data set is divided evenly into 10 parts of the same size; 1 part is used as the test set and the other 9 parts are used as the training set, with each part serving as the test set in turn. The tenfold procedure is repeated 10 times. The average of the 100 experimental results obtained in this way is taken as the final result.
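A minimal Python sketch of this validation protocol is given below. It assumes that the data are reshuffled before each of the 10 repetitions and that folds are formed without stratification; the names `repeated_kfold` and `mean_score` and the `evaluate` callback are hypothetical.

```python
import random

def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs: n_folds splits for each of n_repeats shuffles."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::n_folds] for i in range(n_folds)]  # near-equal fold sizes
        for k in range(n_folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train_idx, folds[k]

def mean_score(n_samples, evaluate, **kwargs):
    """Average evaluate(train_idx, test_idx) over all splits; also return the split count."""
    scores = [evaluate(tr, te) for tr, te in repeated_kfold(n_samples, **kwargs)]
    return sum(scores) / len(scores), len(scores)
```

With the default settings the generator produces 100 train/test splits, matching the 100 experimental results that are averaged. Because risk samples are rare, a stratified split that preserves the class ratio in every fold may be preferable in practice.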

During the experiment, the subspace ratio is increased from 0.1 to 0.9 in steps of 0.1. The misclassification costs for positive and negative cases are set to 1,726/129 and 1,726/1,597, respectively. The candidate values of the regularization parameter are 0.001, 0.01, 0.1, 1, and 10, and the value is selected by cross-validation.
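The settings above can be written down as a small Python configuration, which also makes the size of the cost ratios explicit. The variable names are hypothetical, and pairing every subspace ratio with every regularization value in one grid is an assumption about how the search is organized.

```python
n_total, n_risk, n_normal = 1726, 129, 1597

# Misclassification costs: inverse class frequency, so the rare risk class costs more.
cost_positive = n_total / n_risk      # about 13.38
cost_negative = n_total / n_normal    # about 1.08

subspace_ratios = [round(0.1 * i, 1) for i in range(1, 10)]   # 0.1, 0.2, ..., 0.9
reg_params = [0.001, 0.01, 0.1, 1, 10]

# Assumed search grid: every (subspace ratio, regularization parameter) pair.
grid = [(r, lam) for r in subspace_ratios for lam in reg_params]
```

Under these settings, misclassifying a risk company is penalized roughly 12 times as heavily as misclassifying a normal one.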

4.3. Experimental Results

Experiments were run with different feature sets, different methods, and different time panels, and the results are shown in Table 1. The bold values in Table 1 are the highest values for each feature set.

The bold data in Table 1 show that the improved random subspace method achieves excellent results compared with the other methods in this experiment. On the AA index, the improved random subspace method achieves good results on the higher dimensional feature sets F3, F1 + F3, F2 + F3, and F1 + F2 + F3. In T − 3, the values of the improved random subspace method are 93.61%, 94.78%, and 96.39%. In T − 4, the values are 94.53%, 95.59%, and 96.77%. In T − 5, the values are 95.79%, 96.17%, and 96.67%. The comparison shows that the improved random subspace method achieves the best financial risk prediction results among all methods, especially on the feature set F1 + F2 + F3 under the time panel T − 5, where it achieves an average accuracy of 97.67%.

On the AUC index, the improved random subspace method also achieves good results, especially on the feature set F1 + F2 + F3: the value is 95.24% in T − 3, 94.3% in T − 4, and 95.91% in T − 5, higher than that of the other eight methods. Therefore, the improved random subspace method is well suited to financial risk prediction.

The bold data in Table 2 show that the improved random subspace method balances the type I and type II error rates, which indicates that the method can effectively deal with both the high-dimensional problem and the class imbalance problem in financial risk prediction. Overall, the type II error rate is higher than the type I error rate. The main reason is that the data distribution is uneven: the minority class contains few samples, which leads to insufficient training. As a result, minority-class samples are easily misclassified in prediction.

4.4. Analysis of Experimental Results
4.4.1. Analysis of Prediction Results of Different Feature Sets

According to the above experimental results, this paper will compare and analyze the prediction results of different features and different time panels. The comparison results of features under T − 3, T − 4, and T − 5 time panels are shown in Figure 2.

It can be seen from Figure 2 that, among the single features, the performance of F1 is relatively stable, the prediction performance of F2 is relatively weak, and the prediction effect of F3 is clearly better than that of F2. This result shows that financial features play the leading role in the financial risk prediction of listed companies. Comparing the single features F1, F2, and F3 with the combined features F1 + F2, F1 + F3, F2 + F3, and F1 + F2 + F3, the combined feature F1 + F2 + F3 performs better than any single feature. With the improved random subspace method, the combined feature F1 + F2 + F3 outperforms the other methods in T − 3, T − 4, and T − 5, with prediction values above 95%, which supports the rationality of the improved random subspace method in financial risk prediction.

4.4.2. Analysis of Prediction Results by Different Methods

Figure 3 shows the AUC obtained by the different methods under the T − 3, T − 4, and T − 5 time panels, which illustrates the effectiveness of the improved random subspace method in financial risk prediction.

As can be seen from Figure 3, for the combined feature F1 + F2 + F3, the AUC of the improved random subspace method is the highest: the AUC obtained by the WFAIB_RS method is about 95% in T − 3, 93% in T − 4, and 95.5% in T − 5, which is clearly higher than that of the other eight methods. A possible reason why the SVM and bagging methods improve the results on F2 is that, compared with the financial features and text features, F2 has a smaller feature dimension and contains less prediction information, while the SVM and bagging methods can effectively supplement samples and thus strengthen F2. These results show that the improved random subspace method is effective in financial risk prediction and can predict financial risk more accurately.

5. Conclusion

In this paper, a new financial risk prediction method, the improved random subspace method, is constructed. Firstly, multisource heterogeneous features are extracted from multisource data sets, including quantitative financial features as well as emotional features and text features based on qualitative text information. Secondly, the random subspace method is taken as the foundation of the model so that its advantages are fully exploited. At the same time, a regularized sparse model is introduced to integrate adaptive and weighted feature fusion strategies into the random subspace method, yielding a new adaptive fusion method that considers the relationships between features.

To verify the effectiveness of the proposed method, an experiment is carried out on a real data set of Chinese listed companies, in which different feature sets and different methods are compared. The analysis of the experimental results demonstrates the effectiveness and stability of the proposed method in the financial risk prediction of listed companies. Future work will focus on further analysis of the prediction effect of the data sets under different models, such as prediction accuracy and prediction time.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding this work.