Abstract
Constructing a credit evaluation index system for Chinese family farms and ranches is both a theoretical problem and a matter of great practical significance. Based on depth-weighted Bayesian theory and fuzzy mathematics, this paper proposes an improved depth-weighted fuzzy Bayesian hybrid model that addresses the imbalance among the default states of family farms and ranches and builds an index system able to identify three categories of default. First, based on fuzzy set theory, a fuzzy linguistic assessment set is defined for the different default states, and the characteristic indexes of family farms and ranches are converted into the corresponding triangular fuzzy numbers. The triangular fuzzy numbers are then converted into precise output data by the inner method, which handles the fuzziness and uncertainty of the default state and transforms the fuzzy default state into a definite one. Second, because the ROC curve is insensitive to skewed samples, the characteristic indexes in the nondefault, low default and high-default states are depth-weighted by constructing a multiclassification ROC curve. This solves the practical problem of sample imbalance across the default states of family farms and ranches, and the index system with significant discrimination ability for default states is selected by integrating the default identification ability of the indexes.
1. Introduction
China is a major agricultural country, and agriculture is its basic industry. According to statistics, as of 2020 the number of family farms and family ranches registered in the national directory system exceeded 3 million, and 117,000 demonstration family farms and family ranches at or above the county level had been established. As an important part of the new type of agricultural management, family farms and family ranches have become the main force promoting the development of modern agriculture and an indispensable force in agricultural modernization. However, difficulty in financing has always hindered their sustainable development. The development of family farms and family ranches cannot be separated from the support of the financial system, so the construction of a scientific and reasonable credit evaluation index system plays a fundamental role in solving this problem. If the credit index system is not properly selected, accurate credit evaluation results cannot be obtained, whatever evaluation method is adopted. The construction of a credit evaluation index system for family farms and ranches is therefore not only a theoretical problem but also of great practical significance. However, because of problems within family farms and ranches themselves and their information asymmetry with commercial banks and other financial institutions, a credit index system that matches the operating characteristics of family farms and ranches has not been established. As a result, most existing credit evaluation indexes for family farms and ranches are borrowed from the index systems of companies and enterprises, and no credit evaluation standard suited to family farms and ranches exists.
Consequently, the evaluation process suffers from problems such as indexes that cannot be determined and vague index division. How to establish an effective credit index system for family farms and ranches is an important problem that urgently needs to be solved.
2. Research Status of Family Farm and Pasture Credit Index Screening Models
2.1. Method-Based Index Screening
2.1.1. A Credit Index Screening Model Based on Mathematical Statistics
Zhou and Su [1] selected index combinations by introducing neighborhood component analysis into the credit risk field, and derived the optimal index combination by maximizing the AUC of default prediction [1]. Yan et al. [2] screened out the indicators that can significantly distinguish loss given default through the nonparametric K-nearest neighbor discriminant method, making up for the shortcoming of traditional research that focuses only on indicators that significantly distinguish the default status. They also used R-clustering to group indicators by criterion category, ensuring that the indicators clustered in one category have the same economic meaning and data characteristics [2]. Traczynski [3] used Bayesian model averaging to select default forecast indicators, and proposed that only the ratio of total liabilities to total assets and the volatility of market returns are reliable default forecast indicators in both the overall sample and individual industry groups [3]. Dahiya et al. [4] used two feature selection methods, the chi-square test and principal component analysis, to screen important features, and proposed a feature selection-based hybrid bagging algorithm (FS-HB), which improved the credit risk assessment method [4].
2.1.2. A Credit Index Screening Model Based on Artificial Intelligence
Zhou et al. [5] processed the German credit samples with the XGBoost algorithm, screened the credit indicators according to the resulting importance scores, and then analyzed the credit indicators and classification characteristics of the samples with random forest classification [5]. Wang et al. [6] selected 10 features that affect the credit risk of small and medium-sized enterprises from 53 original features by single norm kernel feature selection, and proposed that the mortgage and pledge status is the most important factor affecting credit risk [6]. Bai et al. [7] studied the characteristics affecting farmers’ creditworthiness through fuzzy rough sets and the fuzzy C-means clustering method. The results show that characteristics related to education and skills can have a significant impact on farmers’ creditworthiness [7]. Li and Yang [8] used an SVM model to conduct the first round of index screening, conducted the second round based on the principle of minimum sum of squared deviations of the cluster variables, and finally constructed a relatively effective credit evaluation index system [8].
2.1.3. Credit Index Screening Model Based on Other Methods
Kou et al. [9] proposed a new allocation rule combined with VEGE to solve the multiobjective decision optimization problem [9]. Fallahpour et al. [10] used the sequential floating forward selection algorithm as a wrapper technique to determine the optimal subset of features, ensuring that the selected index combination has strong default identification capability [10]. Based on information theory, Hu et al. [11] proposed a feature selection method, dynamic relevance and joint mutual information maximization (DRJMIM), to screen the features carrying the largest amount of information. Comparison with five competing feature selection methods shows that the method obtains the highest classification accuracy while selecting a small number of features [11]. Hai et al. [12] deleted indicators carrying redundant information through correlation analysis, and then used significance discrimination to screen out indicators that can significantly distinguish farmers’ default status [12]. Nikolic et al. [13] obtained all possible combinations of model indicators by running stepwise forward or backward logistic regression over the indicators, and selected the model with the highest predictive power, containing 8 financial ratios, according to its GINI performance on the validation data set [13].
2.2. Studies of Family Farm and Pasture Credit
Ni and Zhang [14] divided China’s new agricultural operating entities into three types, namely family operation, cooperative operation and enterprise operation, according to their different characteristics. Based on the principles of comprehensiveness and systematicness, a credit evaluation index system was constructed for each of these three types of entity [14]. Shen and Lu [15] preliminarily constructed evaluation indexes based on the 5C criteria and on the marketization, specialization, moderate scale and intensification characteristics of new agricultural operating entities. Using the partial correlation coefficient method and a Probit regression model, they deleted the indexes whose discriminating ability was not significant, ensuring that the 18 indexes finally retained have significant default identification ability and are consistent with the credit characteristics of new agricultural operating entities [15]. Studying the financing difficulties of agricultural operators in Howla District of West Bengal, Basak [16] found that family size, bank deposits and the relationship between banks and farmers were important factors affecting the availability of loans [16].
There are three problems in the existing credit index screening models. First, although previous models can distinguish default from nondefault samples, the fuzziness of the characteristics of high default and low default means that they do not explain these two states, ignoring the fact that high-default loss and low default loss are credit characteristics of different natures. Second, the existing credit index selection models all assume balanced samples. In reality, the default samples of family farms and ranches are not balanced: the number of default samples is much smaller than the number of nondefault samples. How to handle this sample imbalance is an extremely important problem for credit index selection research. Third, most existing studies of family farms and ranches are qualitative and lack quantitative analysis of their credit status. As a result, the credit evaluation of family farms and ranches has no choice but to use the index system of companies and enterprises, which leads to indexes that cannot be determined and fuzzy index division in the evaluation process. Therefore, how to construct a credit index system with default judgment ability for family farms and ranches is an important problem that urgently needs to be solved.
To address these problems, this paper constructs a credit index screening model for family farms and family ranches based on the depth-weighted fuzzy Bayesian approach, and carries out an empirical analysis on data from the Inner Mongolia family farm and family ranch survey, a commercial bank credit database, and the city statistical yearbook. On this basis, the paper puts forward a new improved depth-weighted fuzzy Bayesian algorithm for unbalanced samples, which can define and distinguish default states of different degrees and improves the existing multicategory credit index screening model.
The article is organized as follows. Section 2 reviews the research status of credit index screening models. Section 3 discusses the principle of building the family farm and pasture credit index screening model, Section 4 describes the method of building the model, Section 5 explains the specific process of building the model and tests it empirically, and Section 6 draws the conclusions and summarizes the innovations of the research.
3. Depth-Weighted Fuzzy Bayesian Model Principle
3.1. The Difficulty of the Problem
Difficulty 1. How to solve the fuzziness problem corresponding to the default state caused by the fuzziness phenomenon and realize the deterministic transformation of multiclass uncertain variables.
Difficulty 2. Which method should be adopted to solve the unbalanced problem of default state of household farm samples?
3.2. The Method to Solve the Difficulty
The solution to the first difficulty is to standardize the sample data of family farms and ranches, convert the uncertain variables into definite triangular fuzzy numbers based on fuzzy theory, and then convert the fuzzy numbers into precise data by defuzzification with the inner method. This realizes the definite conversion of the fuzzy default status, after which the prior probability and conditional probability are calculated to judge the sample category. The solution to the first difficulty is shown in Figure 1.

The solution to the second difficulty rests on two facts: the index characteristics of family farms and ranches differ in importance, and the ROC curve is not sensitive to unbalanced data. The weights corresponding to the different indexes are obtained through the analysis of experimental samples, the important index characteristics of family farms and ranches are selected and weighted, and the problem of sample skewness is thereby solved. The solution to the second difficulty is shown in Figure 2.

3.3. Credit Index Screening Model Construction Principle
The construction principle of the depth-weighted fuzzy Bayesian credit index screening model for family farms and family ranches is shown in Figure 3.

4. Depth-Weighted Fuzzy Bayesian Model
4.1. Fuzzy Bayesian Model
4.1.1. Naive Bayes
Let P(c|X) be the posterior probability that the family farm or family ranch X belongs to class c, let X = {x1, x2, ..., xV} be the family farm and family ranch data set, let P(xi|c) be the conditional probability, let P(c) be the prior probability, let V be the total number of indicators, and let C(x) be the classification decision function. Then
$$C(x) = \arg\max_{c} P(c \mid X) = \arg\max_{c} P(c) \prod_{i=1}^{V} P(x_i \mid c) \tag{1}$$
Meaning of model (1): it represents the classification of family farm and family ranch samples under classical conditions, that is, the classification result of the Naive Bayes classifier without considering the fuzziness of the variables or the sample imbalance problem.
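The classical decision rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation: the function names `fit_naive_bayes` and `classify`, the toy samples, and the use of Laplace smoothing for the conditional probabilities are all assumptions introduced here.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    """Estimate P(c) and P(x_i | c) from discrete samples.

    Laplace smoothing is an assumption of this sketch, not taken from the paper.
    """
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / n for c in class_counts}
    # value_counts[(i, c)][v]: class-c samples whose i-th indicator equals v
    value_counts = defaultdict(Counter)
    values = defaultdict(set)  # distinct values observed for each indicator
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, c)][v] += 1
            values[i].add(v)

    def cond_prob(i, v, c):
        return (value_counts[(i, c)][v] + 1) / (class_counts[c] + len(values[i]))

    return priors, cond_prob


def classify(x, priors, cond_prob):
    """Return the class maximizing P(c) * prod_i P(x_i | c)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(x):
            score *= cond_prob(i, v, c)
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: 0 = nondefault, 1 = low default, 2 = high-default
toy_rows = [("low", "yes"), ("low", "yes"), ("low", "no"),
            ("mid", "no"), ("high", "no"), ("high", "no")]
toy_labels = [0, 0, 0, 1, 2, 2]
toy_priors, toy_cond = fit_naive_bayes(toy_rows, toy_labels)
print(classify(("low", "yes"), toy_priors, toy_cond))
```

Because the rule multiplies unweighted conditional probabilities, the majority class dominates on skewed data, which is the limitation the weighting in Section 4.2 is meant to address.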
4.1.2. The Fuzziness of Family Farms and Ranches
Step 1. Triangular fuzzy numbers. As a mathematical model representing the membership degree distribution of each sample attribute in the data set, the triangular fuzzy number can be applied to credit evaluation to reflect the uncertainty and fuzziness of default risk. It is a simple and efficient mathematical tool for explaining fuzzy phenomena and describing fuzzy sets. Let A = (α, β, γ) be a triangular fuzzy number, where α is the lower bound of the fuzzy number, β is the most probable value, γ is the upper bound of the triangular fuzzy number, and μ(x) is the corresponding membership function:
$$\mu(x) = \begin{cases} \dfrac{x - \alpha}{\beta - \alpha}, & \alpha \le x \le \beta \\ \dfrac{\gamma - x}{\gamma - \beta}, & \beta < x \le \gamma \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
Meaning of formula (2): it gives the triangular fuzzy number and membership function corresponding to the sample data of family farms and family ranches, and is used to fuzzify those data.
Step 2. Construction of the fuzzy language evaluation set for default risk. First, a language assessment system is established for the possibility that a family farm or family ranch defaults. Because the sample data of family farms and family ranches are fuzzy, a fuzzy language assessment set is defined with the linguistic terms nondefault, low default and high-default. The fuzzy assessment set of the relevant attribute indicators is A = {nondefault, low default, high-default}, and the fuzzy language assessment set is converted into triangular fuzzy numbers for classification. The construction of the fuzzy language assessment set and the corresponding triangular fuzzy numbers is shown in Table 1.
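As an illustration of the fuzzification step, the sketch below evaluates the standard triangular membership function for a lower bound, most probable value and upper bound. The function name `triangular_membership` and the triples in `FUZZY_SET` are hypothetical placeholders chosen for this example; they are not the values of Table 1.

```python
def triangular_membership(x, lower, peak, upper):
    """Membership degree of x in the triangular fuzzy number (lower, peak, upper)."""
    if not lower <= peak <= upper:
        raise ValueError("expected lower <= peak <= upper")
    if x == peak:
        return 1.0  # also covers degenerate triangles where peak equals a bound
    if lower < x < peak:
        return (x - lower) / (peak - lower)
    if peak < x < upper:
        return (upper - x) / (upper - peak)
    return 0.0


# Hypothetical linguistic terms on a [0, 1] scale (NOT the paper's Table 1)
FUZZY_SET = {
    "nondefault": (0.0, 0.0, 0.4),
    "low default": (0.2, 0.5, 0.8),
    "high-default": (0.6, 1.0, 1.0),
}

# Membership of a standardized value in each linguistic term
degrees = {term: triangular_membership(0.35, *abc) for term, abc in FUZZY_SET.items()}
print(degrees)
```

Overlapping triples let one standardized value belong partly to two neighboring default states, which is how the fuzziness between low default and high-default can be represented.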
4.1.3. Defuzzification of Family Farms and Ranches
Let A = (α, β, γ) be a triangular fuzzy number with parameters α, β, γ, let Ix be the definite value corresponding to the fuzzy number, and let Iy be the corresponding membership degree. The defuzzification process is given by model (3).
Meaning of model (3): the triangular fuzzy numbers of the uncertain variables of family farms and ranches are defuzzified into precise variables, and the value range is divided into intervals to discretize the continuous variables, so that the samples can be classified with the naive Bayesian classifier and the fuzzy default state is transformed into a definite one.
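The defuzzification and discretization steps can be sketched as follows. The exact form of model (3) is not recoverable from the text, so this sketch makes a loud assumption: it reads the "inner method" as taking the incenter of the membership triangle with vertices (α, 0), (β, 1) and (γ, 0), whose coordinates play the roles of Ix and Iy. The function names and the equal-width binning are likewise hypothetical.

```python
import math


def incenter_defuzzify(alpha, beta, gamma):
    """Incenter (ix, iy) of the triangle (alpha, 0), (beta, 1), (gamma, 0).

    ASSUMPTION: the paper's "inner method" is interpreted here as the incenter;
    the source does not confirm this.
    """
    a = math.hypot(gamma - beta, 1.0)  # side opposite the vertex (alpha, 0)
    b = gamma - alpha                  # side opposite the vertex (beta, 1)
    c = math.hypot(beta - alpha, 1.0)  # side opposite the vertex (gamma, 0)
    perimeter = a + b + c
    ix = (a * alpha + b * beta + c * gamma) / perimeter
    iy = b / perimeter  # only the apex has a nonzero height (equal to 1)
    return ix, iy


def discretize(value, low, high, bins):
    """Map a crisp value in [low, high] to one of `bins` equal-width intervals."""
    if value <= low:
        return 0
    if value >= high:
        return bins - 1
    return int((value - low) / (high - low) * bins)


ix, iy = incenter_defuzzify(0.0, 0.5, 1.0)
print(ix, iy, discretize(ix, 0.0, 1.0, 4))
```

The discrete bin index, rather than the raw crisp value, is what a categorical Naive Bayes classifier would consume.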
4.2. Depth-Weighted Bayesian Model
This paper exploits the insensitivity of the ROC curve to unbalanced data: it constructs a multiclass average ROC curve from the sample characteristics of nondefault, low default and high-default family farms and ranches, and weights the index characteristics of family farms and ranches based on AUC(i) [17–20].
Step 1: Drawing the multiclassification ROC curve. For each of the nondefault, low default and high-default categories, the probability that a test sample belongs to that category can be obtained. For each category, the false positive rate (FPR) and true positive rate (TPR) at each threshold are calculated to draw one ROC curve, giving three ROC curves in total. Finally, the three ROC curves are averaged to obtain the final ROC curve.
Step 2: Depth-weighted Bayesian model.
Let c(x) be the depth-weighted Bayesian classification decision function, P(xi|c) the conditional probability, n(xi|c) the number of occurrences of xi in class c, n(c) the number of all samples in class c, ni the number of different values taken by the i-th feature index, and Wi the depth weight of the i-th feature.
Meaning of model (4): by weighting the indicators of the three types of family farms and ranches under the different default states, and by exploiting the insensitivity of the ROC curve to skewed data, the problem of unbalanced family farm and family ranch samples is solved and the samples can be classified under unbalanced conditions.
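A hedged sketch of a feature-weighted Naive Bayes classifier built from the quantities defined above is given below. Since model (4) itself is not reproduced in the text, the sketch assumes one common form: the weight Wi acts as an exponent on a Laplace-smoothed estimate (n(xi|c) + 1) / (n(c) + ni). The function name, the toy data and this specific formula are assumptions, not the authors' stated method.

```python
import math
from collections import Counter


def weighted_nb_classify(x, rows, labels, weights):
    """Return argmax_c  log P(c) + sum_i W_i * log P(x_i | c).

    ASSUMED form: P(x_i | c) = (n(x_i|c) + 1) / (n(c) + n_i), with the weight
    used as an exponent. The paper's model (4) may differ.
    """
    class_counts = Counter(labels)
    # n_i: number of distinct values observed for the i-th indicator
    n_values = [len({row[i] for row in rows}) for i in range(len(x))]
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / len(labels))
        for i, v in enumerate(x):
            n_vc = sum(1 for row, y in zip(rows, labels) if y == c and row[i] == v)
            p = (n_vc + 1) / (n_c + n_values[i])
            score += weights[i] * math.log(p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy data: indicator 0 favors class "A", indicator 1 favors "B"
demo_rows = [("a", "p"), ("a", "q"), ("b", "q"), ("b", "q")]
demo_labels = ["A", "A", "B", "B"]
print(weighted_nb_classify(("a", "q"), demo_rows, demo_labels, weights=(1.0, 0.0)))
print(weighted_nb_classify(("a", "q"), demo_rows, demo_labels, weights=(0.0, 1.0)))
```

The two calls show the intended effect of the weights: shifting weight from one indicator to the other changes which class wins for the same sample.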
4.3. Index Screening
4.3.1. The Credit Index Identification Ability
Let A denote the default judgment ability of all indicators, that is, the percentage of high-default, low default and nondefault family farms and ranches that are judged correctly, which expresses the credit judgment ability of all selected indicators for family farms and ranches. A0 is the percentage of all nondefault samples judged correctly, A1 is the percentage of all low default samples judged correctly, and A2 is the percentage of all high-default samples judged correctly. The credit identification ability of all indicators is given by formula (5).
Meaning of formula (5): it describes the credit identification ability, over all samples of family farms and ranches, of all initially selected indicators under the different default states.
4.3.2. The Identification Ability after Deleting Item j
Let Aj denote the default judgment ability of the indicators other than the j-th indicator, Aj0 the percentage of nondefault samples judged correctly by the indicators other than the j-th indicator, Aj1 the corresponding percentage of low default samples judged correctly, and Aj2 the corresponding percentage of high-default samples judged correctly. The credit identification ability after deleting item j is given by formula (6).
Meaning of formula (6): credit identification ability of all remaining indicators after deleting the j-th indicator.
4.3.3. The Influence of Credit Indicators
Let dj denote the degree of impact of the j-th indicator on the credit evaluation results, Aj the credit identification ability, for all family farms and ranches, of the indicators remaining after the j-th indicator is removed, and A the credit identification ability of all indicators for all household farms. The influence of the indicator is
$$d_j = A - A_j \tag{7}$$
Use of formula (7): it reflects the degree of influence of the j-th indicator on the credit evaluation results. If dj is a real number greater than 0, the remaining indicators have weaker default identification ability after the j-th indicator is deleted, so the indicator should be retained; otherwise, the indicator should be deleted.
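The screening rule can be sketched as a leave-one-out loop. Formulas (5) and (6) are not reproduced in the text, so the aggregation of A0, A1 and A2 by a simple mean in `identification_ability` is an assumption, and `ability_of` is a hypothetical callback that would rerun the classifier on a subset of indicators and return its identification ability.

```python
def identification_ability(y_true, y_pred, classes=(0, 1, 2)):
    """Mean of the per-class correct rates A0, A1, A2.

    ASSUMPTION: a simple mean stands in for the paper's formula (5).
    """
    rates = []
    for c in classes:
        idx = [k for k, y in enumerate(y_true) if y == c]
        if idx:
            rates.append(sum(y_pred[k] == c for k in idx) / len(idx))
    return sum(rates) / len(rates)


def screen_indicators(n_indicators, ability_of):
    """Keep indicator j only if d_j = A - A_j is greater than 0."""
    all_idx = tuple(range(n_indicators))
    a_all = ability_of(all_idx)
    kept, influence = [], {}
    for j in all_idx:
        a_j = ability_of(tuple(i for i in all_idx if i != j))
        influence[j] = a_all - a_j
        if influence[j] > 0:
            kept.append(j)
    return kept, influence


def toy_ability(idx):
    # Hypothetical stand-in for a full classification run on the subset idx:
    # indicator 0 helps, indicator 1 is neutral, indicator 2 hurts.
    return (50 + 10 * (0 in idx) - 5 * (2 in idx)) / 100


kept, influence = screen_indicators(3, toy_ability)
print(kept, influence)
```

In a real run each call to `ability_of` retrains and re-evaluates the classifier, so the cost grows linearly with the number of candidate indicators.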
5. Empirical Study
5.1. Indicator System and Sample Data
This paper selects credit data on 1814 household farms from Inner Mongolia, China, and sets four criteria levels, as shown in column (2) of Table 2. A total of 44 household farm credit evaluation indicators are obtained from the initial mass selection. Of the 1814 samples, 1586 are nondefaulting household farms, 190 are low-defaulting household farms, and 38 are high-defaulting household farms. Row 45 of Table 2 indicates the real credit default status of the household farm, where 0 indicates that the household farm credit status is nondefault, 1 indicates low default, and 2 indicates high-default. The mass-selected indicators are derived from the literature [21, 22], and the sample data of family farms and ranches are shown in Table 2.
5.2. Sample Fuzzification and Defuzzification
5.2.1. Data Fuzzification
Because the family farm and family ranch data are fuzzy, the definitions of nondefault, low default and high-default are uncertain. Therefore, the fuzzy language evaluation set is converted into triangular fuzzy numbers for sample classification. The fuzzy language evaluation set and triangular fuzzy numbers of family farms and ranches are shown in Table 3.
5.2.2. Data Defuzzification
The fuzzy language evaluation set and the corresponding triangular fuzzy numbers are defuzzified by the inner method to obtain the corresponding precise values and to transform the sample data from fuzzy to definite. The fuzzy data and corresponding precise values of the household farm samples are shown in Table 1.
5.3. Depth-Weighted Fuzzy Bayesian Classification
5.3.1. Calculation of AUC(i) for Three Categories
For the three-category problem of family farms and family ranches considered in this paper, the depth weights of the sample indexes are calculated with SPSS. Following the procedure for drawing a binary ROC curve, one ROC curve is drawn for each category. Three ROC curves are therefore drawn, and their average is taken as the three-category ROC curve corresponding to each index. The ROC curve is shown in Figure 4, and the AUC(i) corresponding to each index is obtained as the weight value and applied to the calculation of the conditional probability. The calculation results of the index AUC(i) are shown in Table 4, column (4).
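The one-curve-per-category procedure described above can be approximated without SPSS by the sketch below. It assumes that the three curves are averaged pointwise over a common FPR grid, in which case the area under the averaged curve equals the mean of the three one-vs-rest AUC values. The function names and the per-class scores in `demo_scores` are hypothetical.

```python
def binary_auc(scores, is_positive):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def macro_average_auc(class_scores, y_true):
    """Mean of the one-vs-rest AUCs, one per default category.

    class_scores[c][k] is the (hypothetical) score of sample k for category c.
    """
    aucs = [
        binary_auc(scores, [y == c for y in y_true])
        for c, scores in class_scores.items()
    ]
    return sum(aucs) / len(aucs)


# Hypothetical scores for four samples: 0 = nondefault, 1 = low, 2 = high
demo_y = [0, 0, 1, 2]
demo_scores = {
    0: [0.9, 0.8, 0.1, 0.2],
    1: [0.1, 0.3, 0.8, 0.2],
    2: [0.0, 0.1, 0.1, 0.6],
}
print(macro_average_auc(demo_scores, demo_y))
```

Each category contributes equally to the average regardless of its sample count, which is why this kind of weight is less affected by the 1586 / 190 / 38 class split than plain accuracy would be.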

5.3.2. Depth-Weighted Fuzzy Bayesian Classification
The new attribute values obtained after fuzzification, defuzzification and discretization of the family farm and family ranch samples are fed to the depth-weighted Naive Bayes algorithm. The categories of the test samples are obtained by the maximum posterior rule after calculating the conditional probability and the prior probability of the model. The calculation results are shown in row 46 of Table 4.
5.4. Index Screening
5.4.1. Credit Identification Ability of All Indicators
The classification result of the family farms and family ranches is obtained through the depth-weighted fuzzy Bayesian classifier and is shown in row 46 of Table 4. Substituting this result into equation (5) gives the credit identification ability A of all the mass-selected indicators for the family farms and family ranches: A = 0.85.
5.4.2. Identification Ability after Deleting Item j of the Index
After the j-th index is deleted, all the remaining index data are passed through the depth-weighted fuzzy Bayesian hybrid classifier to obtain the three-category classification results, which are substituted into formula (6) to obtain the credit identification ability Aj of all the indexes except the j-th index for the family farms and family ranches. The results of the Aj calculation are shown in column (4) of Table 4.
5.4.3. Index Influence dj Calculation
dj is the difference between the credit identification ability of all indicators and that of the indicators remaining after the j-th indicator is deleted, and expresses the impact of the j-th indicator on credit evaluation. If dj is a real number greater than 0, the default identification ability of the remaining indicators is weaker after the j-th indicator is deleted, so the indicator should be retained; otherwise, the indicator should be deleted. Substituting the values of A and Aj calculated above into equation (7) gives the influence of each index. The calculation results of dj are shown in column (5) of Table 4.
5.5. Construction of Credit Index System and Effectiveness Test
After fuzzy processing and index selection of the household farm credit indicators, a household farm credit index system with 16 credit indicators was finally constructed, as shown in Table 5, column (3).
The effectiveness of the constructed index system is analyzed with SPSS on the samples in the high and low default states. The ROC curve is shown in Figure 5. The blue curve in the figure represents the identification of high and low loss given default by the 16 indexes finally screened out with the depth-weighted fuzzy Bayesian model. The area under the blue curve is AUC = 0.586. Since AUC > 0.5, the depth-weighted fuzzy Bayesian model distinguishes high from low loss given default better than random guessing, which supports the effectiveness of the constructed credit evaluation index system for family farms and ranches in identifying default.

6. Conclusion
6.1. Main Conclusions
In this paper, the depth-weighted fuzzy Bayesian hybrid three-category model is used to obtain 16 credit evaluation indexes for family farms and family ranches with significant discrimination ability for the nondefault, low default and high-default states. The indexes fall under five levels, for example: educational background (basic situation), skill status of lenders and their families (repayment ability), the amount of bank loans that lenders have not repaid (repayment willingness), whether there is a guarantee (guarantee status), and Engel coefficient (macroenvironmental factors).
6.2. Main Contributions
(1) Based on the theory of fuzzy mathematics, the uncertain variables are fuzzified with triangular fuzzy numbers, the triangular fuzzy numbers are defuzzified by the inner method, and the different types of default situations are defined and interpreted, thus transforming the default status of family farms and ranches from fuzzy to definite.
(2) Based on the insensitivity of the ROC curve to skewed samples, the depth weighting of the characteristic indicators under the nondefault, low default and high-default conditions is completed by constructing a multiclassification ROC curve, which solves the practical problem of unbalanced samples of family farms and ranches. The index system with significant discriminant ability for default status is selected based on default discriminant ability.
6.3. Policy Suggestions
To ensure the sustainable development of family farms and family ranches and to effectively alleviate their difficulty in obtaining loans and financing, commercial banks at all levels need to identify, evaluate and monitor the credit risk of family farms and family ranches, abandon the past mistake of borrowing the index system of companies, and construct a loan credit rating index system that reflects the characteristics of China's family farms and family ranches. In addition, to promote national policies that benefit agriculture and to help more family farms and family ranches obtain loans, the government or regulatory authorities should create conditions and guide banks at all levels to increase the proportion of loans to family farms and family ranches. From the perspective of banks, a credit decision-making model that guarantees the bank’s target profit while maximizing lending to family farms and family ranches may be one effective way to ease their financing difficulties. This will be the focus of our future work [23, 24].
Data Availability
The empirical sample of this paper is the credit data of 1814 family farms and ranches in Inner Mongolia, which comes from the database of a commercial bank in China.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The research was supported by the National Natural Science Foundation of China (72161033), Natural Science Foundation of Inner Mongolia Autonomous Region of China (2020MS07009), Inner Mongolia Science and Technology Project of China (201605053), and Inner Mongolia Autonomous Region Graduate Scientific Research Innovation Project of China (S20210213Z).