Abstract

In recent years, supply chain finance (SCF) has been used to ease the financing difficulties of small- and medium-sized enterprises (SMEs). SME credit risk assessment is a critical part of the SCF system. The diffusion of SME credit risk may have serious consequences, leaving the whole SCF system unstable and insecure. Unlike in traditional credit risk assessment, the supply chain relationship and the credit conditions of both the SME and the core enterprises should all be considered when rating SME credit risk in SCF. Traditional methods mix all indicators from different index systems and cannot give a quantitative account of how these index systems contribute. Furthermore, traditional credit risk assessment models depend heavily on the amount of annotated SME data, yet it is implausible to accumulate enough credit-risky SMEs in advance. In this paper, we propose an adaptive heterogeneous multiview graph learning method to tackle the small-sample-size problem in SME credit risk forecasting. Three graphs are constructed from supply chain operation indicators, SME financial indicators, and nonfinancial indicators, respectively. The graphs are integrated adaptively, providing a quantitative explanation of how the three parts cooperate. The experimental analysis shows that the proposed method performs well in determining whether an SME is risky or nonrisky in SCF. From the perspective of SCF, SME financing ability remains the main factor determining SME credit risk.

1. Introduction

As a result of the COVID-19 pandemic, global industrial chains and supply chains have suffered a severe setback. Industrial chains across the world are experiencing a crisis of a kind rarely encountered in nearly a century. The accompanying adjustment of supply and industrial chains has a great impact on small- and medium-sized enterprises (SMEs), and SMEs in China face the same survival predicament. According to the National Bureau of Statistics, SMEs play an essential role in China’s national economy, accounting for more than 99% of the total number of enterprises, more than 80% of the people employed in industrial enterprises, more than 70% of national technological innovation results, more than 60% of GDP, and more than 50% of national tax revenue. However, financing difficulties have always been a key obstacle to their development and survival.

Supply chain finance (SCF) is a series of financing modes designed to solve the SME financing problem [1] by integrating capital into supply chain management. With the core enterprise at the center and real trade as the background, SCF transforms the uncontrollable risk of a single enterprise into the controllable risk of the supply chain as a whole through effective control of capital flow, information flow, and logistics. It builds a benign industrial ecology of banks, core enterprises, and SMEs, promoting the interactive development of capital and industry. In SCF, credit risk is regarded as one of the core issues and must be considered seriously, especially that of SMEs upstream or downstream of core enterprises. Compared with banks and core enterprises, SMEs are more prone to credit risk. In the SCF system, once a credit risk event occurs, its effect on the credit status of enterprises in the chain is magnified and may even spread to the whole supply chain because of the chain’s connectivity [2–4]. Since credit risk is inevitable, it is necessary to establish a credit risk assessment model for SMEs to control risks effectively and to form a stable SCF system during both construction and operation.

The SME credit risk assessment system in the SCF environment has long been a focus of both academia and the financial industry [5]. The deep integration between core enterprises and SMEs in SCF can reflect the future capital capacity and cash flow of SMEs, which provides a solid data basis for credit risk prediction in SCF. Researchers have constructed credit risk assessment index systems from different perspectives. One branch constructs the credit risk assessment system based on different factors. Lekkakos et al. considered core enterprise qualification as one of the main factors in SME credit evaluation [6]. Wuttke found that the buyer has an impact on supply chain finance [7]. These studies investigated the correlation between SCF and SME credit risk theoretically. However, some indicators are either difficult to obtain in practice or difficult to analyze quantitatively [8].

Another branch established the credit risk assessment system for SMEs in SCF from different perspectives. Altman and Sabato [5] designed an SME credit risk evaluation system that includes five financial indicators: liquidity, earnings, leverage, coverage ratio, and business activity ratio. However, only financial indicators are selected as the main features for SME credit risk assessment, and nonfinancial indicators are neglected. Yurdakul [9] combined financial and nonfinancial indicators to evaluate SME credit risk in Turkey, including revenue generation capacity, cost control, operational efficiency and profitability, short-term liquidity, capital structure, and other solvency indicators. To consider the factors affecting SME credit risk in SCF more comprehensively, many other indicators have also been included. From the perspective of the supply chain, Rostamzadeh et al. [10] proposed taking operational and major policy risks into consideration. Mou et al. [11] constructed the evaluation system from four aspects: industrial status, operation status, asset status, and credit history. In these systems, the credit risk assessment indicators selected from different views may overlap or contradict each other, an issue that most conventional prediction models find difficult to resolve. Furthermore, most existing credit risk assessment systems focus on selecting single discriminative indicators (e.g., three-class indicators) without considering the overall discrimination of a whole perspective (e.g., one- or two-class indicators). They cannot give quantitative results about the relationships among the different views.

Recently, machine learning approaches have been widely applied as a substitute for traditional qualitative and statistical analysis methods in credit risk assessment [12, 13]. Logistic regression (LR) is the basic method for assessing credit risk because of its simplicity and usability [14]. However, it is insufficient to describe the complex situation of credit risk, and its prediction accuracy needs further improvement. The support vector machine (SVM) [15] is another approach to the credit risk problem. The SVM-based credit risk assessment model has been shown to be more effective than the LR model based on principal component analysis (PCA) [16]. Danenas and Garsva identified risky enterprises in imbalanced datasets based on linear SVM and particle swarm optimization [17]. The authors of [18] applied a nonlinear SVM classifier with genetic optimization for credit risk assessment. The artificial neural network (ANN) has also attracted wide attention in credit risk forecasting. Adnan made a comprehensive investigation of different supervised neural models and learning schemes for credit risk evaluation [19]. The deep belief network (DBN), a special paradigm of ANN, has been found to yield the best performance on the CDS dataset [20]. In addition, tree-structured [21] and graph-structured [22] methods have also been shown to be effective for credit risk assessment.

Various approaches have shown their advantages in credit risk evaluation. However, to assess the credit risk of SMEs in the SCF environment, the model needs to be constructed from the perspective of the supply chain rather than by assessing repayment ability alone [23]. The funds and credit condition of the SME, the core enterprises, and the supply chain relationship all have to be fed into the machine learning algorithm, and this information from different views needs to be organized properly. Most approaches concatenate the indicators to evaluate SME credit risk in supply chain finance. By concatenating the indicators, Zhang et al. [23] and Xu and He [24] applied SVM and the restricted Boltzmann machine (RBM), respectively, for assessment. From the perspective of feature fusion, these early fusion approaches treat all features in the same way, which weakens the internal connections among different views. Other studies ensemble multiple machine learning models to make a comprehensive decision. Zhang et al. [25] considered the leading enterprise’s credit status and the relationships developed in the supply chain by combining SVM and BP. Zhu et al. [26] proposed a two-stage hybrid model that integrates the results of LR and ANN to improve on the accuracy of a single model. Nevertheless, all the methods mentioned above depend heavily on annotated SME data: once the number of annotated samples decreases, their performance declines rapidly. In reality, it seems implausible to accumulate enough credit-risky SMEs. In contrast, graph-based learning can work with a small amount of annotated data by modeling the relationships among samples and has been widely used in many other fields [27–29]. However, it is difficult to construct a single graph that covers the local manifolds of features from different views. Therefore, how to integrate multiview heterogeneous features into graph learning is worth studying.

In this paper, we propose a heterogeneous multiview graph learning method for SME credit risk forecasting. Credit-risky and nonrisky SMEs are recognized by a diffusion process on the fused multiview manifolds. Since different views have different impacts on credit risk recognition, we learn the view weights and credit risk scores simultaneously. The main contributions of our work are summarized as follows:
(1) To tackle the lack of annotated data, we propose a multiview graph-based semisupervised learning approach that models the local manifold between samples for SME credit risk assessment in SCF.
(2) The proposed multiview graph-based learning approach integrates multiview features in a comprehensive way. The SME credit risk scores are obtained by automatically learning the complementarity of different views, which improves interpretability.
(3) Comprehensive experiments are conducted to empirically analyze the proposed method for SME credit risk assessment in SCF. The experimental results on the collected dataset demonstrate its effectiveness.

The remainder of this paper is organized as follows. Sections 2 and 3 discuss single graph learning and multiview graph learning, respectively. Section 4 describes the data and variables. Section 5 presents the experimental results. The last section concludes the paper.

2. Single Graph-Based Learning for Classification

Graph-based learning [30, 31] has attracted great interest in classification tasks because of its effectiveness and its flexibility across application areas. A graph describes the pairwise relationships in the given data: vertices are the labeled and unlabeled samples, and edges indicate the relationships between vertices. Generally, single graph-based learning can be formulated as follows.

Assume that there are $n$ data points $X = \{x_1, x_2, \ldots, x_n\}$ in the dataset. Each sample $x_i$ can be represented by a $d$-dimensional feature vector. Without loss of generality, we assume that the first $l$ samples $\{(x_i, y_i)\}_{i=1}^{l}$ have been labeled by experts in advance, where $y_i \in \{+1, -1\}$ is the class label. Our goal is to assign an optimal label to each of the remaining $n - l$ unlabeled data points through graph learning.

A graph is defined as $G = (V, E, W)$. $V$ is the set of vertices, where each sample is a vertex in $V$, including both labeled and unlabeled samples. $E$ is the set of edges, where $e_{ij}$ links the adjacent vertices $v_i$ and $v_j$. $W$ is termed the affinity matrix, where $W_{ij}$ is the weight of $e_{ij}$, indicating the strength of the connection between vertices $v_i$ and $v_j$. The value of $W_{ij}$ can be either discrete (e.g., $\{0, 1\}$) or continuous (e.g., 0.54).
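
As a minimal sketch of graph construction, the following NumPy code builds an affinity matrix by linking each vertex to its nearest neighbors under Euclidean distance, the setting reported later in Section 4.4. The function name `knn_affinity`, the binary edge weights, the symmetrization rule, and the toy data are assumptions made for illustration; the text does not specify how the edge weights are computed.

```python
import numpy as np

def knn_affinity(X, k=5):
    """Binary, symmetric k-nearest-neighbor affinity matrix (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    np.fill_diagonal(dist, np.inf)  # a vertex is never its own neighbor
    # Column indices of the k closest vertices for every row.
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, nearest.ravel()] = 1.0
    # Symmetrize: keep an edge if either endpoint selected the other.
    return np.maximum(W, W.T)

# Toy data (hypothetical): two well-separated groups of three points.
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
W_toy = knn_affinity(X_toy, k=2)
```

With two neighbors per vertex, the toy affinity matrix only connects points inside the same group. A continuous weight (for example, a Gaussian kernel of the distance) could be substituted for the binary value without changing the rest of the pipeline.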

Generally, the graph-based classification task is formulated in a regularization framework:

$$f^{*} = \arg\min_{f} \left\{ \Omega(f) + \mu R_{emp}(f) \right\}, \tag{1}$$

where $f \in \mathbb{R}^{n}$ is the to-be-learned relevance score vector defined on domain $V$, $R_{emp}(f)$ is an empirical loss function, $\Omega(f)$ is a regularizer term on $G$, and $\mu$ is a nonnegative parameter.

$\Omega(f)$ implements the smoothness assumption on the graph: data points on the same manifold are likely to share the same label.

The regularizer on the graph is defined as follows:

$$\Omega(f) = \frac{1}{2} \sum_{i,j=1}^{n} W_{ij} \left( \frac{f_i}{\sqrt{D_{ii}}} - \frac{f_j}{\sqrt{D_{jj}}} \right)^{2}, \tag{2}$$

where $D$ is a diagonal matrix given by $D_{ii} = \sum_{j=1}^{n} W_{ij}$, that is, $D_{ii}$ is the sum of the $i$-th row of $W$.

Let $S = D^{-1/2} W D^{-1/2}$, and the normalized graph Laplacian is denoted as follows:

$$\Delta = I - S, \tag{3}$$

where $I$ is the identity matrix. Finally, the regularizer can be rewritten as follows:

$$\Omega(f) = f^{T} \Delta f. \tag{4}$$

The empirical loss function enforces consistency on the labeled vertices by keeping the assigned labels close to the initial labels:

$$R_{emp}(f) = \| f - y \|^{2} = \sum_{i=1}^{n} (f_i - y_i)^{2}, \tag{5}$$

where $y$ is the initial label vector. Three possible values can be assigned to $y_i$: $+1$ if vertex $v_i$ is a positive sample, $-1$ if it is a negative sample, and $0$ if it is unlabeled. The closed-form solution of the minimization is found to be

$$f = \left( I + \frac{1}{\mu} \Delta \right)^{-1} y. \tag{6}$$

After obtaining $f$, data point $x_i$ can be classified according to the sign of $f_i$, i.e., positive if $f_i > 0$ and negative otherwise. In addition, the data points in $X$ can also be ranked according to the learned $f$.
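
The closed-form solution can be sketched in a few lines of NumPy. The sketch assumes the common form of the framework in which the regularizer is $f^{T} \Delta f$ with $\Delta = I - D^{-1/2} W D^{-1/2}$ and the loss is $\mu \| f - y \|^{2}$, so that the minimizer is $f = (I + \Delta / \mu)^{-1} y$. The helper names and the toy graph are hypothetical.

```python
import numpy as np

def normalized_laplacian(W):
    """Return Delta = I - D^{-1/2} W D^{-1/2} for a symmetric affinity matrix W."""
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    # Isolated vertices (zero degree) get a zero scaling factor.
    inv_sqrt = np.where(degree > 0, 1.0 / np.sqrt(np.maximum(degree, 1e-12)), 0.0)
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    return np.eye(W.shape[0]) - S

def graph_scores(W, y, mu=0.01):
    """Minimize f^T Delta f + mu * ||f - y||^2 in closed form."""
    delta = normalized_laplacian(W)
    y = np.asarray(y, dtype=float)
    # Solve (I + Delta / mu) f = y instead of forming the inverse explicitly.
    return np.linalg.solve(np.eye(len(y)) + delta / mu, y)

# Toy graph (hypothetical): triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
W_toy = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    W_toy[i, j] = W_toy[j, i] = 1.0
y_toy = np.array([1.0, 0.0, 0.0, 0.0, 0.0, -1.0])  # one labeled vertex per class
f_toy = graph_scores(W_toy, y_toy)
labels_toy = np.where(f_toy > 0, 1, -1)  # classify by the sign of the score
```

In this toy case, the single positive label spreads over the first triangle and the single negative label over the second, which is the diffusion behavior the smoothness assumption is meant to produce.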

3. Multiview Graph-Based Learning for Classification

Single graph learning has proved to be an effective approach to classification, especially when labeled samples are limited. However, a comprehensive evaluation of SME credit risk should take many different aspects into consideration, such as the funds and credit condition of the SME, the core enterprises, and supply chain operation. The complementary knowledge contained in these multiple views comprehensively represents the credit status of an SME. In this situation, features from more than one view can be used to measure the affinity between vertices.

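Suppose we have $K$ sets of features. Accordingly, $K$ graphs can be constructed, denoted as $G_1, G_2, \ldots, G_K$. The regularization framework in equation (2) can be extended to handle multiview features by combining the graphs with a set of weighting coefficients. Then, the framework is written as follows:

$$\Omega(f) = \frac{1}{2} \sum_{k=1}^{K} \alpha_k \sum_{i,j=1}^{n} W^{(k)}_{ij} \left( \frac{f_i}{\sqrt{D^{(k)}_{ii}}} - \frac{f_j}{\sqrt{D^{(k)}_{jj}}} \right)^{2}. \tag{7}$$

The coefficient $\alpha_k$ reflects the importance of the $k$-th view and is constrained by $\sum_{k=1}^{K} \alpha_k = 1$ and $\alpha_k \geq 0$. $W^{(k)}$ and $D^{(k)}$ are the affinity matrix and the diagonal matrix for graph $G_k$, respectively. The value of $\alpha_k$ can be set in advance by domain experts as prior knowledge, according to the importance of the different views. Then, the solution of equation (1) can be derived as follows:

$$f = \left( I + \frac{1}{\mu} \sum_{k=1}^{K} \alpha_k \Delta_k \right)^{-1} y, \tag{8}$$

where $\Delta_k = I - (D^{(k)})^{-1/2} W^{(k)} (D^{(k)})^{-1/2}$ is the normalized graph Laplacian of $G_k$. Equation (8) amounts to learning on a fused graph, where several normalized graph Laplacians are combined through the coefficients $\alpha_k$. However, it is often unclear which view is more important for the classification task, so in most cases $\alpha_k$ cannot easily be set in advance. It is therefore desirable to incorporate the influence of $\alpha_k$ into the above learning framework. To avoid trivial solutions, the weight coefficient in equation (7) is relaxed by changing $\alpha_k$ to $\alpha_k^{r}$:

$$\Omega(f) = \frac{1}{2} \sum_{k=1}^{K} \alpha_k^{r} \sum_{i,j=1}^{n} W^{(k)}_{ij} \left( \frac{f_i}{\sqrt{D^{(k)}_{ii}}} - \frac{f_j}{\sqrt{D^{(k)}_{jj}}} \right)^{2}, \tag{9}$$

where $r$ is a hyperparameter larger than $1$ that prevents the smoothest graph alone from playing the key role ($\alpha_k = 1$) while the graphs from the other views become invalid ($\alpha_k = 0$). Finally, the cost function of multiview graph-based learning is reformulated as follows:

$$\arg\min_{f, \alpha} \left\{ \sum_{k=1}^{K} \alpha_k^{r} f^{T} \Delta_k f + \mu \| f - y \|^{2} \right\}, \quad \text{s.t. } \sum_{k=1}^{K} \alpha_k = 1, \ \alpha_k \geq 0. \tag{10}$$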

Equation (10) assumes that the data lie in a local manifold space for each view. Multiple features can be combined through a weighted union of the graphs generated by the different views. Each graph resembles a weak classifier generated from a single cue, and together they form a stronger classifier.
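
The role of the exponent $r$ can be illustrated numerically. For fixed scores, minimizing $\sum_k \alpha_k^{r} c_k$ subject to $\sum_k \alpha_k = 1$, where $c_k$ stands for the smoothness cost of view $k$, gives weights proportional to $(1 / c_k)^{1/(r-1)}$ by a standard Lagrange-multiplier argument. This closed form is our own working, not a formula quoted from the text, and the cost values below are made up for illustration.

```python
def view_weights(costs, r):
    """Weights minimizing sum_k alpha_k**r * costs[k] subject to sum_k alpha_k = 1."""
    if r <= 1:
        raise ValueError("r must be larger than 1")
    raw = [(1.0 / c) ** (1.0 / (r - 1.0)) for c in costs]
    total = sum(raw)
    return [v / total for v in raw]

# Hypothetical smoothness costs for three views (illustrative values only).
costs = [2.0, 1.0, 4.0]
weights_sharp = view_weights(costs, r=1.1)   # r close to 1
weights_mid = view_weights(costs, r=1.9)     # the value used in Section 4.4
weights_flat = view_weights(costs, r=10.0)   # large r

for name, w in [("r=1.1", weights_sharp), ("r=1.9", weights_mid), ("r=10", weights_flat)]:
    print(name, [round(a, 3) for a in w])
```

When $r$ is close to 1, almost all the weight goes to the view with the smallest cost; as $r$ grows, the weights approach a uniform distribution. An intermediate value keeps every view active while still favoring the smoother graphs.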

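The relevance value $f$ and the weight coefficient $\alpha$ are obtained by minimizing equation (10).

The optimization strategy is to learn $f$ and $\alpha$ alternately, fixing one and optimizing the other in each iteration. We first fix $\alpha$ and optimize $f$.

Denoting the objective function in equation (10) by $Q(f, \alpha)$, its partial derivative with respect to $f$ is

$$\frac{\partial Q}{\partial f} = 2 \sum_{k=1}^{K} \alpha_k^{r} \Delta_k f + 2 \mu (f - y). \tag{11}$$

By setting $\partial Q / \partial f = 0$, we have

$$f = \left( I + \frac{1}{\mu} \sum_{k=1}^{K} \alpha_k^{r} \Delta_k \right)^{-1} y. \tag{12}$$

Then, we fix $f$ to optimize $\alpha$. The partial derivative of the objective function with respect to $\alpha_k$ is

$$\frac{\partial Q}{\partial \alpha_k} = r \alpha_k^{r-1} f^{T} \Delta_k f. \tag{13}$$

Similarly, we can obtain $\alpha_k$ with Lagrange multipliers because $\sum_{k=1}^{K} \alpha_k = 1$.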
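
The alternating scheme can be sketched as follows. The sketch assumes the objective $\sum_k \alpha_k^{r} f^{T} \Delta_k f + \mu \| f - y \|^{2}$, whose $f$-step is a linear solve and whose $\alpha$-step, worked out with a Lagrange multiplier for the constraint $\sum_k \alpha_k = 1$, is $\alpha_k \propto (1 / f^{T} \Delta_k f)^{1/(r-1)}$. The function names, the fixed iteration count, the small constant `eps`, and the toy graphs are our assumptions rather than details given in the text.

```python
import numpy as np

def normalized_laplacian(W):
    """Return I - D^{-1/2} W D^{-1/2} for a symmetric affinity matrix W."""
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    inv_sqrt = np.where(degree > 0, 1.0 / np.sqrt(np.maximum(degree, 1e-12)), 0.0)
    return np.eye(W.shape[0]) - inv_sqrt[:, None] * W * inv_sqrt[None, :]

def multiview_graph_learning(affinities, y, mu=0.01, r=1.9, n_iter=20, eps=1e-12):
    """Alternate between the closed-form f-step and the closed-form alpha-step."""
    laplacians = [normalized_laplacian(W) for W in affinities]
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    alpha = np.full(len(laplacians), 1.0 / len(laplacians))  # equal initial weights
    f = y.copy()
    for _ in range(n_iter):
        # f-step: solve (I + (1 / mu) * sum_k alpha_k^r Delta_k) f = y.
        fused = sum(a ** r * L for a, L in zip(alpha, laplacians))
        f = np.linalg.solve(np.eye(n) + fused / mu, y)
        # alpha-step: weights proportional to (1 / f^T Delta_k f)^(1 / (r - 1)).
        costs = np.array([f @ L @ f for L in laplacians]) + eps
        raw = (1.0 / costs) ** (1.0 / (r - 1.0))
        alpha = raw / raw.sum()
    return f, alpha

# Toy views (hypothetical): view A has two clear clusters, view B is a complete graph.
W_a = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    W_a[i, j] = W_a[j, i] = 1.0
W_b = np.ones((6, 6)) - np.eye(6)
y_mv = np.array([1.0, 0.0, 0.0, 0.0, 0.0, -1.0])
f_mv, alpha_mv = multiview_graph_learning([W_a, W_b], y_mv)
```

In this toy setting the clustered view receives the larger weight, because the learned scores vary less across its edges than across those of the uninformative complete graph. A convergence test on the change in `alpha` could replace the fixed number of iterations.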

4. Data and Variable

4.1. Data

To validate the effectiveness of the multiview graph-based learning method in forecasting SME credit risk in SCF, we construct a database of Chinese SMEs. SCF is a new and developing branch of financing. Because theoretical study is still immature, the development of SCF in China has been less than satisfactory. Only in recent years has SCF business gradually got on the right track, with supporting businesses and data disclosure becoming relatively complete. However, it is still difficult to gather a complete SCF dataset, especially for unlisted companies. Therefore, listed companies are selected as the main data source.

Because factors and characteristics differ greatly between industries, using data from all industries may degrade the prediction model. The manufacturing industry is a key field of SCF and has attracted wide attention from researchers. However, traditional manufactured products (e.g., automobiles) are so complex, with long chains from suppliers to distributors, that thorough collection of SCF data is impossible. In comparison, the scale of mobile phone manufacturing supply chain data is moderate (e.g., a mobile phone has more than 80 parts, while a car has tens of thousands of parts). The division of labor among upstream suppliers, midstream manufacturers, and downstream distributors in the mobile phone manufacturing industry is relatively clear. Furthermore, mobile phones are time sensitive, which requires enterprises on the chain to cooperate closely. Taking all this into account, we select data on 104 listed SMEs from the Wind database. After deleting invalid data, our final dataset consists of 924 samples over the period 2011–2019.

Some studies [26, 32] identified credit risk by whether the enterprise is under special treatment (ST). They held that ST and *ST companies are in financial crisis [33] and may have higher credit risk than ordinary companies. An ST company has suffered losses for two consecutive years, so the ST label cannot reflect the credit risk of a company that has suffered losses only in the first year. We therefore adopt the Z-score model [34], which is widely used to measure enterprise financial health, to define credit risk. Samples with Z-score < 1.8 are considered risky enterprises in the experiments, and the remaining samples are nonrisky. In total, the dataset consists of 265 risky enterprises and 659 nonrisky enterprises. In the experimental settings, 40 samples are treated as known risky enterprises, denoted by –1, and 40 as known nonrisky enterprises, denoted by +1. The remaining unknown samples (denoted by 0) are to be classified by the multiview graph-based learning. The sample distribution is given in Table 1.
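
The labeling rule can be sketched as below. The text cites the Z-score model [34] without stating which variant is used, so the sketch assumes the original five-ratio Altman Z-score for listed manufacturing firms; the random choice of the 40 labeled samples per class, the seed, and all function names are likewise assumptions made for illustration.

```python
import random

def altman_z(wc_ta, re_ta, ebit_ta, mve_tl, sales_ta):
    """Original Altman Z-score from five ratios (assumed variant).

    wc_ta: working capital / total assets, re_ta: retained earnings / total assets,
    ebit_ta: EBIT / total assets, mve_tl: market value of equity / total liabilities,
    sales_ta: sales / total assets.
    """
    return 1.2 * wc_ta + 1.4 * re_ta + 3.3 * ebit_ta + 0.6 * mve_tl + 1.0 * sales_ta

def risk_label(z, threshold=1.8):
    """Return -1 for a risky sample (Z below the threshold) and +1 otherwise."""
    return -1 if z < threshold else 1

def initial_labels(true_labels, n_per_class=40, seed=0):
    """Reveal n_per_class labels per class; every other sample is set to 0 (unlabeled)."""
    rng = random.Random(seed)
    y = [0] * len(true_labels)
    for cls in (-1, 1):
        candidates = [i for i, t in enumerate(true_labels) if t == cls]
        for i in rng.sample(candidates, n_per_class):
            y[i] = cls
    return y

# Class sizes from the text: 265 risky and 659 nonrisky samples.
true_labels = [-1] * 265 + [1] * 659
y_init = initial_labels(true_labels)
```

The resulting vector has 40 entries equal to –1, 40 equal to +1, and 844 equal to 0, matching the split described above.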

4.2. Independent Variables

In compliance with the widely accepted 5C principle, the credit risk assessment system is constructed from three views: supply chain operation, SME financial indicators, and nonfinancial indicators. For ease of reference, the independent variables for supply chain operation, SME financial indicators, and nonfinancial indicators are described in Tables 2–4, respectively.

4.3. Evaluations

To evaluate the effectiveness of the proposed method, average accuracy, Type I error, and Type II error are adopted in the experiments. These evaluation criteria are defined as follows:

$$\text{Average accuracy} = \frac{N_c}{N}, \qquad \text{Type I error} = \frac{E_p}{N_p}, \qquad \text{Type II error} = \frac{E_n}{N_n},$$

where $N$ is the total number of samples in the testing set, $N_p$ and $N_n$ are the numbers of positive and negative samples, respectively, $N_c$ is the number of correctly predicted samples, and $E_p$ and $E_n$ are the numbers of incorrectly predicted positive and negative samples.

Average accuracy evaluates the overall performance of the model; the higher, the better. Because an imbalance between positive and negative samples may make accuracy alone misleading, Type I and Type II errors are adopted as supplementary criteria; the lower, the better. Type I error measures the error rate on positive samples. In SME credit risk assessment, a high Type I error, which means that nonrisky enterprises are incorrectly classified as risky, results in the loss of potential customers. Type II error measures the error rate on negative samples. Incorrectly classifying risky enterprises as nonrisky leads to a high Type II error, which may expose banks, core enterprises, and SMEs in SCF to great risk. For building a stable SCF system, Type II error is therefore more important than the other two criteria.
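
The three criteria can be computed as in the following sketch. It follows the label convention of Section 4.1 (+1 for nonrisky, i.e., positive, samples and –1 for risky, i.e., negative, samples) and reads Type I error as the error rate on positive samples and Type II error as the error rate on negative samples. The function name `evaluate` and the toy predictions are hypothetical.

```python
def evaluate(y_true, y_pred):
    """Average accuracy, Type I error and Type II error for labels in {+1, -1}."""
    pairs = list(zip(y_true, y_pred))
    n_pos = sum(1 for t, _ in pairs if t == 1)
    n_neg = sum(1 for t, _ in pairs if t == -1)
    correct = sum(1 for t, p in pairs if t == p)
    wrong_pos = sum(1 for t, p in pairs if t == 1 and p != 1)    # nonrisky flagged as risky
    wrong_neg = sum(1 for t, p in pairs if t == -1 and p != -1)  # risky passed as nonrisky
    return {
        "accuracy": correct / len(pairs),
        "type_1": wrong_pos / n_pos,
        "type_2": wrong_neg / n_neg,
    }

# Toy example (hypothetical): four nonrisky and two risky test samples.
y_true = [1, 1, 1, 1, -1, -1]
y_pred = [1, 1, 1, -1, 1, -1]
scores = evaluate(y_true, y_pred)
```

In the toy example one of four nonrisky samples and one of two risky samples are misclassified, so the two error rates differ even though each class contributes a single mistake. This is why the per-class errors are reported alongside the overall accuracy on an imbalanced dataset.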

4.4. Experimental Settings

The experiments are performed on a computer with an 8-processor Intel Xeon(R) 2.13 GHz CPU, 8 GB of RAM, and 64-bit Windows 10. The experiments are implemented in Python with NumPy and Scikit-learn. The data are first normalized by

$$\hat{x} = \frac{x - \bar{x}}{\sigma},$$

where $\hat{x}$ is the normalized data, $x$ is the source data, and $\bar{x}$ and $\sigma$ are the mean value and standard deviation of $x$. Then, principal component analysis (PCA) is performed on $\hat{x}$ for dimension reduction. By preserving 90% of the energy, 18-dimensional features are prepared for LR, SVM, and GL. For M-GL, PCA is performed on each view separately. For GL and M-GL, each vertex is linked to its 5 nearest neighbors, with Euclidean distance as the measure. The parameter $\mu$ in equations (6) and (8) is set to 0.01, and $r$ in equation (10) is set to 1.9. The initial weights of the three views are set equally.
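
The preprocessing steps can be sketched with NumPy alone. The sketch standardizes each column and then keeps the smallest number of principal components whose cumulative variance share reaches 90%. The use of the population standard deviation, the SVD-based PCA, the function names, and the synthetic data are assumptions for illustration and are not taken from the authors' code.

```python
import numpy as np

def zscore(X):
    """Column-wise standardization: subtract the mean, divide by the standard deviation."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns become zero instead of dividing by zero
    return (X - X.mean(axis=0)) / std

def pca_reduce(X, energy=0.90):
    """Project onto the fewest principal components whose variance share reaches `energy`."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    n_comp = min(int(np.searchsorted(ratio, energy)) + 1, len(s))
    return Xc @ Vt[:n_comp].T

# Synthetic data (hypothetical): six indicators driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
X_demo = latent @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(50, 6))
Z_demo = zscore(X_demo)
reduced = pca_reduce(Z_demo)
```

For M-GL, the same two functions would be applied to the indicator columns of each view separately, so that each view keeps its own reduced feature set for graph construction.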

5. Results and Discussion

5.1. Experimental Settings

In this part, we compare the proposed multiview graph learning for SME credit risk assessment in SCF with several related methods.
(1) LR. Logistic regression [35] is performed on the processed features from all three views.
(2) SVM. Support vector machine is performed on the processed features from all three views.
(3) GL. Single graph learning is performed using the same features as LR and SVM. Only one graph is constructed from the processed features.
(4) M-GL. The proposed multiview graph-based learning approach integrates the multiview features individually and can adaptively assign different weights to the three views. Three graphs are constructed from the processed features of the three views, one per view.

Figures 1–3 show the results of the four models when 40 negative and 40 positive training samples are used. The proposed M-GL performs best on all three evaluation criteria for SME credit risk assessment in SCF, which demonstrates its superiority. From the perspective of feature fusion, all the methods listed above integrate the features from the three views. However, M-GL treats these features as heterogeneous elements and assigns a different weight to each view for credit risk assessment. The other methods treat the three views as a whole and cannot distinguish which view is more valuable for SME credit risk assessment in SCF. Although some methods (e.g., LR) can identify the most important independent variables, the resulting models depend too much on these features, which makes them highly sensitive to the data. In contrast, graph-based methods (GL and M-GL) exploit the smoothness between samples, so the loss or inaccuracy of one feature has little influence on the assessment model. This matters because, during data collection, not all indicators can be easily obtained for every SME.

Noise resistance is an important characteristic in practical financial applications. In the next subsection, we investigate the impact of different financial views on SME credit risk assessment in SCF.

5.2. The Impact of Different Financial Views

For traditional SME credit risk assessment methods, it is difficult to investigate quantitatively which view is the most informative for risk assessment. They either concatenate the features as a whole or treat each indicator individually. A few works [32] give a discriminative analysis of each indicator. However, the effectiveness of a single indicator does not guarantee the overall effectiveness of the corresponding view. The proposed M-GL constructs three individual subgraphs from the features of supply chain operation, SME financial indicators, and nonfinancial indicators. By means of the novel graph construction and optimization method, M-GL can adaptively distinguish the importance of different views. This property makes the approach suitable for building a secure and stable SCF system. For example, an SME could artificially improve its credit by tampering with certain financial data to deceive the bank or core enterprises, which is a great danger to the whole SCF system. For the proposed M-GL, tampering with a small amount of data does not noticeably affect the risk assessment results, which is one of the strengths of our approach. Table 5 gives the learned weight coefficient for each view, where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the learned weight coefficients for supply chain operation, SME financial indicators, and nonfinancial indicators, respectively. From Table 5, we can see that the SME financial indicator is the main factor in SME credit risk assessment in SCF. However, this does not mean that the supply chain relationship is useless: it is an effective and informative supplement for assessing SME credit risk in SCF. In contrast, the nonfinancial indicator has the lowest weight among the three views. This is partly because we investigate only mobile phone manufacturing supply chain data, so differences in influencing factors between industries were weakened during the data collection phase.

5.3. The Impact on the Number of Labeled Samples

The proposed M-GL is a graph-based transductive classification method. An advantage of transductive methods is that they may make better predictions with fewer labeled samples. In this part, we investigate the performance with different numbers of initial labeled samples. Figure 4 reports the average accuracy, and Figures 5 and 6 report the Type I and Type II errors of SVM and M-GL with varying numbers of positive and negative samples. M-GL can be regarded as a label diffusion process through three subgraphs, and a good graph method needs to propagate labels to the unlabeled samples. M-GL outperforms SVM throughout the experiments. In particular, when the number of labeled samples is 10, the advantage of M-GL over SVM is most apparent on all three evaluation criteria. In practical financial forecasting, labeling samples is always inconvenient, especially for risky SMEs, and insufficient labeled samples strongly affect traditional methods. Nevertheless, a large amount of unlabeled SME information can be obtained. M-GL models the relationships between the few labeled samples and the many unlabeled samples through label diffusion on the smooth manifold. Exploiting the large number of unlabeled samples to improve the accuracy of credit risk forecasting is a further advantage of M-GL. In the next subsection, we investigate the influence of smoothness on the classifier’s performance.

5.4. The Impact on the Number of Nearest Neighbors

In our experiment, the number of nearest neighbors $k$ is set to 5 in graph construction. Label diffusion requires a smooth graph, and a proper value of $k$ leads to stable and excellent performance. A $k$ that is too large or too small degrades the performance. When $k$ is small, the graph fails to capture the relationships among samples. Conversely, when $k$ is large, mismatched vertices may be brought into the graph. This explains the trends of the curves shown in Figures 7–9: after $k$ reaches an appropriate value, the performance begins to decline.

Because of the smoothness of the graph, we can obtain acceptable results through label diffusion. When we need to select partners from some SMEs, M-GL can give an appropriate credit ranking among the candidate enterprises.

6. Conclusions

The primary purpose of this paper is to assess SME credit risk in SCF by means of multiview graph-based learning. The credit risk evaluation system is constructed from three aspects: supply chain operation, SME financial indicators, and nonfinancial indicators. Unlike traditional methods, the proposed method treats the three views individually, aiming to give a quantitative comparison of which view is more important for credit risk assessment. The paper also provides a detailed analysis of the impact of different financial views on SME credit risk assessment in SCF. To sum up, M-GL is effective in assessing SME credit risk in SCF and performs especially well when few labeled samples are available. In addition, the SME financial indicator is the most critical view for SME credit risk assessment in SCF, while supply chain operation is a supplementary view that also helps to improve assessment performance. Furthermore, M-GL not only uses the features from the three views but also exploits the relationships among samples. The proposed M-GL is therefore robust and can help prevent credit fraud carried out by tampering with a small amount of data. When a bank or core enterprise needs to pick a partner, M-GL can provide a relative ranking of the candidate enterprises.

Machine learning methods have shown their effectiveness in credit risk assessment. However, assessing SME credit risk in SCF is still a real challenge. Data collection and annotation are a natural barrier to applying newer machine learning methods, such as deep learning approaches, to SME credit risk assessment in SCF. Moreover, only mobile phone manufacturing supply chain data are used in this paper, so differences between industries are neglected. In the future, we will investigate SME credit risk assessment in SCF with a larger dataset for a more comprehensive analysis.

Data Availability

The data used to support the findings of this study are available from the corresponding authors upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (no. 18cx04007B), Qingdao Social Science Planning Project (no. QDSKL2001044), and Shandong Social Science Planning Research Project (no. 18CJJJ13).