Abstract

Credit-risk prediction is one of the most challenging tasks in the banking industry. In this study, a hybrid convolutional neural network–support vector machine/random forest/decision tree (CNN–SVM/RF/DT) model is proposed for efficient credit-risk prediction. Four classifiers were examined to develop the model. The first classifier is a fully connected layer with soft-max trained in an end-to-end process; for the other three classifiers, the final fully connected and soft-max layer is removed and an SVM, RF, or DT classifier is stacked after the flattening layer. Different parameter values were considered and fine-tuned throughout testing to select appropriate parameters. According to the experimental findings, the fully connected CNN and the hybrid CNN with SVM, DT, and RF achieved a prediction performance of 86.70%, 98.60%, 96.90%, and 95.50%, respectively. The results show that our suggested hybrid method exceeds the fully connected CNN in its ability to predict credit risk.

1. Introduction

Banks are financial institutions created in accordance with the rules and regulations of each country to lend, borrow, issue, exchange, receive deposits, safeguard, or handle money [1]. The banking sector's capacity to maintain a healthy influx of capital is essential to its ability to properly support a country's economic development. Credit risk, which threatens this positive inflow of resources through a high default rate in credit repayment, is one of the numerous elements that affect the health of the banking sector. Credit risk is essentially the risk that borrowers will not be able to meet their contractual commitments and repay bank loans on time or in whole [2, 3]. The result can be a reduction in the bank's capacity to achieve its business goals or a direct loss of capital.

Banks will have difficulty recovering disbursed credits if there is no reliable method to assess credit risk. A country's overall economic development, the banking industry, and the financial sector are all at risk from a declining rate of credit repayment. A nation's banking sector must successfully and consistently mobilize its financial resources for economic growth.

The ability to develop and operate an efficient credit prediction mechanism that ensures the probability of credit repayment and reduces the degree of default is thus one of the key difficulties in the banking industry. It is a difficult task for the financial sector and the banking industry to develop and implement such a mechanism because it depends on the subjective expertise of bank professionals.

The bank's credit analysts face a practical challenge when attempting to differentiate between a high-risk borrower and a credit-worthy customer who pays back the debt accurately and on time. The ability of banking institutions to accurately differentiate between a credit-worthy borrower and a delinquent one has a significant impact both on their ability to operate economically and on their ability to sustain a positive inflow of resources in the banking system of the national economy. The purpose of the credit evaluation procedure is to determine whether or not a given credit applicant will be able to repay the borrowed funds. Such a prediction is produced by following a few guidelines, but more importantly, by utilizing the bank's credit policies and the expertise-based knowledge of sector experts.

In recent years, artificial intelligence has tremendously benefited the financial industry in model development and analysis and prediction based on the historical information of the customers.

In this study, we use a hybrid machine learning (ML) approach to predict credit risk. Previous studies have carried out extensive research in the area of credit prediction, but to the authors' knowledge, no work has been done utilizing hybrid convolutional neural network–support vector machine/random forest/decision tree (CNN–SVM/RF/DT) algorithms. In our analysis, we exploit the convolutional layer, which extracts features from the historical credit customer data across the various borrower characteristics, and the pooling layers, which minimize the size of the feature maps. The pooling layer consequently reduces the amount of computation carried out within the network as well as the number of parameters that must be derived from the credit data. The feature maps produced by applying a number of convolution and pooling layers to the credit data are flattened into a 1D array and used as inputs for the SVM, RF, and DT.

In predicting a bank's credit risk, we suggest a CNN hybrid with the SVM, RF, and DT for the following reasons: (i) CNNs are well known for their capacity to learn meaningful characteristics from the input automatically. By adding CNNs into the hybrid models, we can use their feature extraction capability to detect important patterns in credit data, potentially leading to more accurate credit prediction. (ii) The nature of SVMs makes them effective on datasets with multiple features and efficient in situations where the number of features exceeds the number of data points. SVMs use a subset of training points in the decision function, called support vectors, which reduces memory usage. (iii) With RF, nonlinear patterns can be easily captured, variable selection can be performed more effectively, missing values can be predicted very effectively, no distributional assumptions are made, and column normalization is not required; for these reasons, we chose RF to hybridize with the CNN. (iv) The DT's characteristics include its capacity to handle missing data and to predict effectively by considering a subset of features at each node's splitting point.

This paper is organized into the following sections. Section 2 reviews the findings of pertinent studies that used ML and deep learning algorithms to predict credit risk. In Section 3, the preprocessing of the credit data is shown, and the techniques used, including CNN, SVM, RF, DT, the hybrid CNN–SVM/RF/DT, and evaluation metrics, are thoroughly discussed. Further information about the experimental findings and a comparison of the CNN hybrid with SVM, RF, and DT algorithms are covered in Section 4. In Section 5, the conclusion is delivered.

2. Related Works

Credit is one of the most crucial elements of banks and other financial institutions. Credit can also be defined in terms of unanticipated events that typically take the shape of either assets or liabilities [4]. The ability of the borrower to repay the credit loan on time is a key factor in determining credit risk, the primary risk that commercial banks currently face; whether the borrower can pay back the credit loan thus determines much of the problem commercial banks are currently facing [2].

Better predictive performance is linked to the use of new ML algorithms for credit default prediction; however, new model risks are also created, particularly with regard to the regulatory evaluation process. The confusion about how managers might evaluate these risks is frequently mentioned in recent industry surveys as a potential obstacle to innovation. The authors of [5] put forward a new framework to quantify model-risk adjustments and contrast the effectiveness of various ML techniques. To handle this difficulty, they first use the internal ratings-based technique to determine up to 13 risk factors, which they then divide into three major categories: statistics, technology, and market conduct. Second, using natural language processing and risk terms based on expert knowledge, they compile a number of rules and regulations pertaining to three possible application cases (regulatory capital, credit scoring, or provisioning) and compute the weight of each group according to the frequency of their mentions. Finally, they put their approach to the test by quantifying certain proxies for a selection of risk components they believe to be represented, using well-known ML models for credit risk and a publicly accessible database. The quantity of hyperparameters and the consistency of the forecasts are used to calculate statistical risk. The technological risk is evaluated using the algorithm's transparency and the ML training method's latency, while the market conduct risk is measured using the time it takes to run a post hoc methodology (SHapley Additive exPlanations) to interpret the results. They discover that statistical risks are more significant for regulatory capital, whereas risks associated with market conduct and technology are more significant for credit scoring. Five of the most popular ML algorithms (RF, XGBoost, CART, Lasso, and multilayer perceptron) were examined using their framework to assess model risk. By contrasting each model's model risk with its corresponding AUC–ROC prediction performance, they can determine which of the ML models has a superior risk-adjusted performance.

Commercial banks are very important for the growth of society and the economy. Therefore, it is crucial from both a theoretical and practical standpoint to appropriately assess their credit risk and set up a credit-risk prevention mechanism. Using a combination of a BP neural network with a mutation genetic algorithm, [6] focuses on the credit-risk assessment of commercial banks, uses the neural network as the primary modeling tool of the credit-risk assessment, and uses the mutation genetic algorithm to optimize the main parameter combination of the neural network in order to enhance the neural network's efficiency. Following the validation of several assessment models, the accuracy of the model created in their work is greater than 65%, and the evaluation outcomes improved by the mutation genetic algorithm are more than 85% acceptable. The accuracy of the credit-risk assessment utilizing neural network technology has increased by more than 10% when compared to the accuracy of the conventional credit scoring approach, which is only approximately 50% accurate. Thus, they established that the optimized method performs better than the conventional approach. They conclude that it has significant theoretical and practical implications for the development of the commercial banks' credit-risk prevention system.

ML algorithms are applied worldwide to carry out default-risk prediction in the big-data era. Repetitive features and unbalanced datasets are the two key issues that can hinder the effectiveness of ML models. From the standpoint of these challenges, [7] examines the feature selection order as well as alternative balancing ratios. They first obtain 32 derived datasets with different balance ratios and feature combinations for each dataset using data rebalancing and feature selection. Second, in order to choose the best derived dataset with the ideal balance ratio and feature combination, they offer a comprehensive metric model based on multiple ML algorithms. Their study makes two contributions. First, the ideal balance ratio is identified through classification accuracy in order to address the issue in prior research that samples are imbalanced or the balance ratio is fixed at 1 : 1; this also guarantees the classification model's accuracy. Second, they suggest a complete metric model built on a ML algorithm, which can concurrently select the best features and determine the ideal balance ratio. Their experimental outcomes show that their strategy can greatly improve CNN's performance, and that CNN outperforms the other four popular ML models on four benchmark datasets for predicting default risk.

Financial crises are very likely to be brought on by poor decision-making in financial institutions. Numerous studies conducted recently have shown that artificial intelligence systems can be employed as alternatives to traditional credit rating methodologies. The authors of [8] developed a prediction model for credit approval that combines feature selection, instance selection, and classifiers. The first step is feature selection using the gain-ratio measure. A clustering algorithm (EM) is then used to cluster the training dataset into k clusters. Finally, k DT classifiers, one for each cluster of examples, are constructed using the C4.5 technique. The EM clustering approach is used to decide which cluster-based DT should be employed in order to forecast the class labels of previously unknown records. According to the experimental findings produced using the survey data, their proposed CBDT strategy is superior to the other five methods (DT, MLP, NB, RF, and SVM) in the measures (F1, Accuracy, and CostEffect). Additionally, they suggested creating three new methods (CBMLP, CBNB, and CBSVM) by combining the two hybrid approaches (feature selection and instance selection) with three existing methods (MLP, NB, and SVM). From their experimental findings, they recommend that managers of the banking and auditing sectors might take hybrid ideas, feature selection, and instance selection into account when establishing their information systems for credit approval. Such technology can assess client dependability with greater precision, significantly lowering the cost of bad debts.

The aforementioned studies have carried out extensive research in the area of credit prediction, but to the authors' knowledge, no work has been done utilizing hybrid CNN–SVM, CNN–RF, and CNN–DT algorithms. For the purpose of predicting credit risk, we create a hybrid ML algorithm and compare it against a fully connected CNN. We also discuss how various learning parameters affect each model.

3. Materials and Methods

3.1. Dataset

We implement our proposed model for predicting credit risk using data from a local bank in Ethiopia. The credit data comprise 7,631 examples of creditworthy (nondefaulter) applicants and 6,823 defaulter applicants.

The input features used to predict credit risk are summarized in Table 1 below.

Table 2 summarizes the numerical input features used in this model. The distribution of the numerical inputs is not symmetrical, which shows that the mean and the median of the numerical input features differ. A second quartile (median) value of 39 for age implies that 50% of customers are 39 years old or younger, while the remaining 50% are older than 39. The age distribution is positively skewed, indicating that most customers are younger than the average and that a small number of older customers stretch the tail toward the right side of the distribution. In addition, age has positive excess kurtosis, which shows the age distribution is leptokurtic (heavy tails on either side), indicating the presence of outliers. Total income is negatively skewed, which shows that most customers' total income is above the mean value.

Missing values are frequently observed during data collection. Missing values reduce the amount of data that can be processed, which lowers the study's statistical power and, eventually, its ability to draw trustworthy findings. They also make the data less useful and can significantly bias the outcomes. To overcome these difficulties, we employ several methods for addressing missing values depending on the variables: we remove rows with missing values when only a few records are affected, fill in missing values for continuous variables using the median, and fill in missing values for categorical variables using the mode. A minimal sketch of these rules is shown below.
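The following pandas sketch illustrates these three rules on a toy frame; the column names, values, and drop threshold are illustrative placeholders, not the actual fields of the bank dataset.

```python
import pandas as pd

# Toy frame with hypothetical columns; values are illustrative only.
df = pd.DataFrame({
    "age": [25, 40, None, 39],
    "total_income": [3000.0, None, 5200.0, 4100.0],
    "marital_status": ["single", None, "married", "married"],
})

# Drop a row only when it has very few usable values (assumed threshold).
df = df.dropna(thresh=2)

# Continuous variables: fill missing entries with the median.
for col in ["age", "total_income"]:
    df[col] = df[col].fillna(df[col].median())

# Categorical variables: fill missing entries with the mode.
df["marital_status"] = df["marital_status"].fillna(df["marital_status"].mode()[0])
```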

The input data, which are the numerical values of the attributes, are separately normalized to the range from "0" to "1". This is achieved by finding the maximum value of each attribute across all cases in the dataset and dividing every value of that attribute by this maximum, as sketched below.
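A minimal NumPy sketch of this per-attribute scaling, on illustrative values:

```python
import numpy as np

# Each column (attribute) is divided by its own maximum so that all
# values fall between 0 and 1; the numbers below are illustrative.
X = np.array([[25.0, 3000.0],
              [40.0, 5200.0],
              [39.0, 4100.0]])

X_scaled = X / X.max(axis=0)   # column-wise maximum
print(X_scaled)
```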

In our study, we divided the dataset into two parts: the training dataset and the testing dataset. We tested a range of learning techniques and training-to-testing data ratios in order to identify the ideal training-to-testing ratio for the credit-risk prediction model. In our proposed model, we used a grid-search approach in which different parameter values were considered and fine-tuned throughout testing, exhaustively searching through all possible combinations to select appropriate parameters. A sketch of this split-and-search procedure is given below.
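As a rough illustration of the procedure (not the exact code used in the study), the sketch below loops over candidate testing fractions on synthetic stand-in data; `X` and `y` are placeholders for the preprocessed credit features and labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the preprocessed credit features and labels.
rng = np.random.default_rng(0)
X = rng.random((1000, 10))
y = rng.integers(0, 2, 1000)

for test_size in (0.10, 0.20, 0.30):   # candidate testing fractions
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=42)
    # Fit each candidate parameter combination on (X_train, y_train),
    # score it on (X_test, y_test), and keep the best combination/ratio.
```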

3.2. Convolutional Neural Network

The CNN, also known as ConvNet, is a special type of neural network that has a deep feed-forward architecture and strong generalization ability compared to other networks with FC layers [9]. It can learn highly abstracted features of objects, particularly spatial data, and can identify them more effectively. It is a special kind of feed-forward neural network [10], which utilizes convolution, ReLU, and pooling layers [11]. Every convolution layer comprises many kernels of the same size, and they are all used to extract features from the input [12]. The pooling layer comes next, performing (average or max) pooling and sending the output to the prediction or fully connected layer [12]. There are multiple layers (multiple building blocks) in the CNN architecture. Below, we go into further detail about the function of each layer in the CNN architecture [13].

3.2.1. Convolutional Layer

The most crucial component of the CNN architecture is the convolutional layer. It is made up of a set of convolutional filters (kernels), a feature map, and input data. Convolution requires a number of spatial variables, such as the size of the kernels (N), stride (S), and padding (P), to produce activation maps of a particular size. The kernel runs a convolution operation on a portion of the input that corresponds to the size of its window in order to produce results in its activation map [11]. The stride, or the distance between two subsequent kernel positions, serves as the basis for the convolution operation [14]. It controls and adjusts the amount of movement across the input data. Zero padding is the practice of encircling a matrix with zeroes (adding zeroes to the input boundaries), which helps to preserve features at the margins of the original matrix and to regulate the size of the output feature map. A 1D convolutional layer applies sliding convolutional filters to 1D input: it convolves the input by moving the filters along the input, computing the dot product of the weights and the input, and then adding a bias term [15]. Figure 1 shows the functionality of a 1D convolutional layer. In the example, the kernel size is set to 3, so the same three weights are applied at every stride of the input, and the weighted input values in each kernel window are summed. A small numerical sketch of this operation follows.
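The following NumPy sketch reproduces this operation on illustrative numbers (kernel of size 3, stride 1, and a bias term; the values are not taken from the paper):

```python
import numpy as np

# 1D convolution: slide a size-3 kernel over the input with stride 1,
# take the dot product at each position, and add a bias term.
x = np.array([1.0, 2.0, 0.5, 3.0, 1.5, 2.5])   # 1D input
w = np.array([0.2, -0.1, 0.4])                  # kernel weights (size 3)
b = 0.1                                         # bias

out = np.array([x[i:i + 3] @ w + b for i in range(len(x) - 2)])
print(out)   # feature map of length len(x) - kernel_size + 1 = 4
```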

3.2.2. Pooling Layer

A pooling layer is frequently interspersed between subsequent convolutional layers in a CNN architecture [10, 16]. The pooling layer can minimize the dimension of the feature map by sliding a filter of a particular size with a certain stride and computing the maximum or average of the input, which speeds up computation by lowering the number of trainable parameters [17]. The 1D max-pooling block computes the maximum in each distinct window by moving a pool (window) of a defined size over the incoming data with a set stride. To provide the inner convolutional layers access to more of the original vector's information, max-pooling layers are added after one or more convolutional layers. If convolutional layers are viewed as feature detectors, max pooling only keeps the "strongest" value of a feature inside the pooling window. A 1D average-pooling layer separates the input into 1D pooling regions and then computes the average of each region to perform downsampling; the layer pools the input by moving the pooling regions along the input. Figure 2 illustrates the computation of the maximum and average pooling layers. Pooling layers reduce the size of the feature maps, as shown in Figure 2, thereby decreasing the amount of network computation and the number of parameters that need to be learned. Additionally, pooling provides a summary of the characteristics found in a particular area of a feature map created by a convolution layer. In the example, the feature map size before pooling is nine and the pooling size is three; the output is reduced to three values, and the maximum values are passed to the following network layer. Max pooling assists in lowering overfitting by providing an abstracted form of representation. A small numerical sketch follows.
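A brief NumPy sketch of non-overlapping max and average pooling on a nine-value feature map (values illustrative only):

```python
import numpy as np

# Pool size 3 with stride 3 (non-overlapping windows), as in the
# nine-value example above that is reduced to three outputs.
feature_map = np.array([2.0, 5.0, 1.0, 0.5, 3.0, 4.0, 7.0, 2.0, 6.0])
pool, stride = 3, 3

windows = [feature_map[i:i + pool]
           for i in range(0, len(feature_map) - pool + 1, stride)]
max_pooled = np.array([win.max() for win in windows])    # [5.0, 4.0, 7.0]
avg_pooled = np.array([win.mean() for win in windows])   # [2.67, 2.5, 5.0]
print(max_pooled, avg_pooled)
```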

3.2.3. Flatten Layer

After all of the pooled feature maps are flattened into a single vector, this vector can be used as the input to a dense or fully connected layer. The flatten operation reduces the multidimensional input tensors to a single dimension so that the data can be fed to every neuron of the subsequent layer of the neural network model.

3.2.4. Fully Connected Layer

CNNs, which have had great success in the financial sector, feature an important component called the fully connected layer. Convolution and pooling are the first steps in the CNN method, which break the input down into a vector of features and analyze each feature individually. This process results in a final decision that integrates all extracted features. The output of the network's preceding layers is reshaped (flattened) into a single vector whose elements each represent the likelihood that the input belongs to a specific class label. The feature map is used as an input, and weights are used to determine the proper label. The final probabilities for each label are provided by the FC output layer.

3.3. Support Vector Machine

SVM, one of the most effective supervised ML algorithms [18], is used for classification, regression, and outlier detection. An SVM performs classification by constructing an N-dimensional hyperplane that optimally separates the data into two categories [19, 20]. The data points closest to the hyperplane are the support vectors, and the margin is determined as the perpendicular distance between the hyperplane and these nearest data points [21, 22].

Consider the credit customer dataset $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where the target variable $y_i \in \{+1, -1\}$ takes two classes: nondefaulter and defaulter. Assume that the nondefaulter and defaulter customer classes are linearly separable, which means that a hyperplane exists such that the nondefaulters belong to one half-space and the defaulters belong to the other [23].

To be more specific, there exists a weight vector $\mathbf{w}$ and a scalar $b$ such that

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) > 0, \quad i = 1, \ldots, n. \quad (1)$$

We need to find the hyperplane with the maximum margin, defined by the weight vector $\mathbf{w}$ and offset scalar $b$, for the prediction of credit customers as nondefaulter or defaulter, such that

$$\min_{\mathbf{w}, b} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1, \quad i = 1, \ldots, n. \quad (2)$$

Due to the complexity of the constraints, it is challenging to solve this problem directly. The Lagrangian duality theory is the preferred mathematical technique for deriving a solution to this problem [24, 25]. The goal is to identify a maximum-margin hyperplane that aids in appropriately predicting credit risk depending on the classes, either defaulter or nondefaulter. Lagrange multipliers $\alpha_i \geq 0$ can be used in these circumstances to enforce the constraints, as shown below:

$$L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\lVert \mathbf{w} \rVert^{2} - \sum_{i=1}^{n} \alpha_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right]. \quad (3)$$

In order to determine the stationary (saddle) point of Equation (3), the following conditions must be satisfied:

$$\frac{\partial L}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i, \quad (4)$$

$$\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0. \quad (5)$$

It should be emphasized that $\alpha_i$ will not be equal to zero unless the accompanying input data point, $\mathbf{x}_i$, is a support vector [26, 27]. The general equations of the SVM are constructed by substituting Equations (4) and (5) into Equation (3). This results in the maximum-margin hyperplane required to predict the two classes, either nondefaulter or defaulter, using linearly separable credit data.

We now assume that the nondefaulter and defaulter classes of our target variable in the credit dataset are not linearly separable. In other words, because of the homogeneity of a few features in the credit dataset, it may be difficult to separate the data linearly using Equation (1). To solve this problem, we utilize the slack variables $\xi_i \geq 0$, and the solution of Equation (3) becomes

$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - \xi_i, \quad (6)$$

where $C$ is the penalty parameter of the error term. It handles the tradeoff between accurately classifying the training points and smooth decision boundaries.

3.4. Random Forest

The RF algorithm creates a forest in the shape of a collection of DTs, increasing randomization as the trees grow. When splitting a node, the technique searches for the best feature within a random subset of features, adding more diversity and improving the model. RF is an ensemble of T DTs, typically created by building many DTs in order to improve performance [28]. Through the use of tree ensembles and random selection of input variables, RF can minimize overfitting issues and increase diversity [29].

The RF classification method is demonstrated in Figure 3 in the following ways [30, 31]: (i) using the initial data, create bootstrap samples; (ii) grow an unpruned classification tree for each of the bootstrap samples with the following modification: at each node, select the best split among a random subset of the predictors instead of the best split among all predictors; (iii) predict new data by aggregating the trees' predictions (i.e., majority votes for classification). A minimal sketch of this procedure is given after this paragraph. Because RF handles missing data, predicts effectively without extensive hyperparameter tuning, mitigates the overfitting problems of single DTs, and considers only a subset of features at the splitting point of each node in every tree, in this study we chose RF to hybridize with the 1D-CNN.
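The sketch below illustrates the three steps on synthetic data; the tree count `T`, the feature-subset size `m_try`, and the data are illustrative assumptions, not the settings used in the study.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data for illustration only.
rng = np.random.default_rng(0)
X = rng.random((500, 8))
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)

T, m_try = 25, 3                                   # assumed ensemble settings
trees = []
for t in range(T):
    idx = rng.integers(0, len(X), len(X))          # (i) bootstrap sample
    tree = DecisionTreeClassifier(max_features=m_try, random_state=t)
    trees.append(tree.fit(X[idx], y[idx]))         # (ii) unpruned tree, random
                                                   #      feature subset per split
votes = np.array([tree.predict(X) for tree in trees])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)   # (iii) majority vote
print("training accuracy:", (y_pred == y).mean())
```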

3.5. Decision Tree

One of the most widely used supervised learning algorithms is the DT, which works by building a training model that can predict the class or value of the target variable by learning straightforward decision rules inferred from historical (training) data [32, 33]. The primary objective in DTs is choosing the best attribute from the dataset's complete feature list for both the root node and the subnodes. The two methods for evaluating attribute selection are information gain and the Gini index.

Entropy, which is used to train DTs, is essentially a measure of the impurity (disorder) of the data. The information entropy for a credit dataset with two classes, defaulter and nondefaulter, is given by

$$E = -\sum_{i=1}^{2} p_i \log_2 p_i,$$

where $p_i$ is the probability of randomly picking an element of class $i$.

Information gain is a metric used to describe the change in entropy following the splitting or segmentation of the dataset based on an attribute. It explains how much information a feature or characteristic gives us, and based on the information gained, node splitting and DT building are carried out. A node or attribute with the highest value of information gain is split first in the DT, which always aims to maximize the information gain.

The Gini index is used as a metric of impurity or purity when developing a DT with a classification algorithm [34]. A low Gini index value is preferred over a high one. The Gini index produces only binary splits, and it is used in the classification process to produce those binary splits. Small helper functions for these attribute-selection measures are sketched below.
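The following Python helpers compute entropy, the Gini index, and the information gain of a split for a two-class (defaulter/nondefaulter) label array; the sample labels are illustrative only.

```python
import numpy as np

def entropy(y):
    """Information entropy of a binary label array."""
    p = np.bincount(y, minlength=2) / len(y)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def gini(y):
    """Gini impurity of a binary label array."""
    p = np.bincount(y, minlength=2) / len(y)
    return 1.0 - (p ** 2).sum()

def information_gain(parent, left, right):
    """Entropy of the parent minus the weighted entropy of the children."""
    w_l, w_r = len(left) / len(parent), len(right) / len(parent)
    return entropy(parent) - (w_l * entropy(left) + w_r * entropy(right))

y = np.array([0, 0, 1, 1, 1, 0, 1, 1])      # illustrative labels
print(entropy(y), gini(y), information_gain(y, y[:4], y[4:]))
```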

3.6. Hybrid CNN-Machine Learning Algorithms

1D convolution layers, pooling layers, dropout layers, and activation functions are used by the CNN to handle the 1D data. A CNN's convolutional layer, which forms its central component, is also where most processing takes place. It requires input data and a filter and produces a feature map. The filtering implemented by the convolution layer is applied to the input; as the filtering procedure is applied repeatedly, a feature map is produced that displays the specific properties associated with the data points. Convolution is a linear operation in which the input is multiplied by a set of weights. The weights, often referred to as kernels, of the 1D array are multiplied by the inputs. This process produces one value at each position of the filter, and together these values form the feature map.

In this study, a hybrid CNN–SVM/RF/DT model is proposed to predict credit risk. The important characteristics of both types of classifiers are combined in the suggested hybrid model: the CNN functions automatically as a feature extractor, whereas the ML algorithms function as classifiers [35]. Due to this automatic feature extraction capability, the CNN is used as a shared front end for both the ML algorithms and the fully connected network. Therefore, no additional feature extraction or selection stages are required prior to the SVM/RF/DT-based classification [29].

The proposed approach considers four classifiers, as shown in Figure 4. A fully connected layer with soft-max, trained in an end-to-end scenario, makes up the first classifier. Soft-max converts a vector of numbers into a vector of probabilities, where the probability of each value is proportional to the relative scale of each value in the vector. For an input vector $\mathbf{z}$, the output vector of probabilities over the defaulter and nondefaulter classes produced by the soft-max function at the end of the model architecture is given by

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}},$$

where $i$ is the index of the output and $K$ is the number of outputs.
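A minimal numerical sketch of this mapping, on illustrative logits:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([1.2, -0.4])     # illustrative outputs of the last layer
print(softmax(logits))             # two class probabilities that sum to 1
```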

Based on the characteristics of the CNN and ML methods such as SVM, RF, and DT, we suggest two stages to construct a credit-risk prediction. In the first stage, features from the original credit customer’s data are extracted using the CNN’s impressive feature extraction characteristics to create a new feature matrix. Second, SVM/RF/DT algorithms are utilized to build the prediction model utilizing the updated feature matrix as input data.

The convolutional layer extracts features from the historical credit customer data across the various borrower attributes, and the pooling layers minimize the feature maps' size. The pooling layer thereby reduces both the number of parameters that must be calculated from the historical credit data and the amount of computation required within the network. The feature maps formed by iteratively applying multiple convolution and pooling layers to the historical credit data are flattened into a 1D array and serve as the inputs for the SVM, RF, and DT. Hence, in the architecture of our suggested models, the SVM/RF/DT takes the position of the fully connected and soft-max layer of the CNN [16, 35–39]. A minimal sketch of this two-stage pipeline is given below.
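As a rough sketch of the two-stage pipeline (not the exact architecture, layer sizes, or training schedule used in the paper), the code below trains a small 1D CNN with a soft-max head on synthetic stand-in data, then reuses its flattened feature maps as inputs to SVM, RF, and DT classifiers. The SVM and RF/DT hyperparameters follow values reported in the experiments where stated (gamma = 0.8, C = 100, 100 estimators, maximum depth 5, Gini criterion); everything else is an assumption.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed credit data (16 numeric attributes).
rng = np.random.default_rng(0)
X = rng.random((1000, 16, 1))
y = (X[:, 0, 0] + X[:, 5, 0] > 1.0).astype(int)

# Stage 1: CNN trained end-to-end with a soft-max head (first classifier).
inputs = keras.Input(shape=(16, 1))
h = layers.Conv1D(32, 3, activation="relu")(inputs)   # assumed filter count
h = layers.MaxPooling1D(2)(h)
flat = layers.Flatten()(h)
outputs = layers.Dense(2, activation="softmax")(flat)
cnn = keras.Model(inputs, outputs)
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
cnn.fit(X, y, epochs=5, batch_size=32, verbose=0)     # epoch count illustrative

# Stage 2: drop the soft-max head and reuse the flattened maps as features.
extractor = keras.Model(inputs, flat)
features = extractor.predict(X, verbose=0)

classifiers = [
    SVC(kernel="rbf", C=100, gamma=0.8),
    RandomForestClassifier(n_estimators=100, max_depth=5, criterion="gini"),
    DecisionTreeClassifier(max_depth=5, criterion="gini"),
]
for clf in classifiers:
    clf.fit(features, y)          # SVM / RF / DT trained on the CNN features
```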

3.7. Performance Evaluation Metrics

The accuracy, precision, sensitivity, and specificity values are all included in a classification analysis along with the F1 score [40]. The accuracy of a model measures the proportion of correctly predicted data points out of all data points [28, 41].

Precision is the proportion of accurately predicted positive outcomes out of all predicted positive outcomes. It can be expressed as the proportion of true positives (TP) to the total of true and false positives (TP + FP) [42].

Recall, also called sensitivity, is the proportion of accurately predicted positive outcomes among all actual positive outcomes. It is the ratio of true positives (TP) to the sum of true positives and false negatives (TP + FN), which helps to identify the proportion of actual positives that are correctly predicted.

The F1 score, a ML evaluation metric, rates a model's accuracy by combining the model's precision and recall values (their harmonic mean). The accuracy statistic shows how often a model predicts correctly over the entire dataset [43]. These metrics are sketched below on illustrative predictions.
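The following scikit-learn sketch evaluates the four metrics on illustrative labels, where 1 marks the defaulter (positive) class and 0 the nondefaulter class:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Illustrative ground-truth labels and predictions (1 = defaulter, 0 = nondefaulter).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))    # (TP + TN) / all
print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("f1-score :", f1_score(y_true, y_pred))          # harmonic mean of the two
```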

4. Results and Discussion

Different parameter values were taken into consideration and fine-tuned throughout testing in order to choose appropriate parameters for both the fully connected CNN and the SVM, RF, and DT models.

We ran four experiments. In the first experiment, the CNN served as a feature extractor and a fully connected network with soft-max served as the classifier. We used the Adam optimizer, the ReLU activation function, and max pooling during training, and after 25 epochs the network converged to the desired results. SVM, RF, and DT classifiers were used in the other three experiments. The input features created by applying several convolutional and pooling layers to the credit data are flattened into a 1D array and utilized as inputs for the SVM, RF, and DT. We used a radial basis function kernel with a regularization parameter C and gamma = 0.8 to train the hybrid CNN–SVM. We applied n-estimators = 100, max-depth = 5, and the Gini criterion to train the hybrid CNN–RF and CNN–DT models.

The accuracy, precision, and recall of the suggested model, calculated from the proportion of correctly identified defaulter and nondefaulter credit applicants, are used to assess its performance. According to the results shown below in Table 2, our proposed hybrid CNN-ML algorithms exceed the fully connected network in their ability to classify credit applicants as defaulter or nondefaulter, with overall accuracies of 98.60%, 95.50%, and 96.90% when employing SVM, RF, and DT, respectively. Based on precision, that is, the proportion of accurately predicted positive outcomes out of all predicted positive outcomes, CNN–RF exceeds CNN–SVM and the CNN.

Figures 5(a) and 5(b), 6(a) and 6(b), 7(a) and 7(b), and 8(a) and 8(b) display the performance and classification error of the CNN model as well as of the hybrid CNN–SVM, CNN–RF, and CNN–DT models. These models were evaluated using an 80 : 20 learning scheme with varied parameter settings for each algorithm.

The CNN architecture is combined with a SVM in the hybrid model, which uses a radial basis function kernel. The SVM parameters include a regularization parameter (C) set at 100.

The CNN architecture is integrated with the RF and DT algorithms in the hybrid model. The RF and DT are configured with the entropy criterion for splitting, a maximum depth of 5 and 100 estimators.

Table 3 shows the results of a credit-risk prediction experiment employing a hybrid CNN–SVM model. The analysis included a variety of kernels, including linear, polynomial, and radial basis functions (RBF). The experimental results show that both RBF and linear kernels are effective at predicting credit risk. Notably, the hybrid CNN–SVM obtained 96.5% accuracy. This result was obtained by using the radial basis function with specified parameters, specifically a regularization parameter (C) of 70, a gamma value of 0.7, and an 80 : 20 learning scheme.

The performance and classification error of the hybrid CNN–SVM with an 80 : 20 learning scheme, RBF, linear, and polynomial kernels, regularization parameter C = 70, and gamma = 0.7 are shown in Figure 6.

Table 4 provides insights into the performance evaluation of the hybrid CNN–RF prediction model using various maximum depths and criteria. The reported accuracy of the hybrid CNN–RF model is 92.17%, achieved with a learning scheme of 80 : 20, 80 estimators, the Gini criterion, and a maximum depth of 5.

The data show that increasing the maximum depth from 4 to 5 and changing the criterion from entropy to Gini improves all performance measures, resulting in improved outcomes.

The increased maximum depth enables the RF component's DTs to grow deeper, capturing more complicated relationships and potentially boosting the model's capacity to discern credit instances. With a deeper tree structure, the model can capture finer patterns and make more sophisticated predictions.

Changing the criterion from entropy to Gini also improves performance. Both entropy and Gini are impurity metrics used to assess the quality of splits in a RF. The model emphasizes impurity minimization based on Gini impurity, which may correlate better with the characteristics of the credit prediction task when employing the Gini criterion. This criterion update enables the model to generate more relevant splits while improving overall accuracy.

According to the stated accuracy of 92.17%, the hybrid CNN–RF model works well in credit prediction. The model performs considerably better when the maximum depth is increased from 4 to 5 and the Gini criterion is used. These findings emphasize the significance of parameter adjustment and the impact it might have on the hybrid CNN-RF model’s predictive capabilities.

The performance of the hybrid CNN–DT with an 80 : 20 learning scheme, 80 estimators, and various criteria is shown in Table 5 below. The accuracy, precision, recall, F1-score, and AUC performance indicators for the hybrid CNN–DT improve when the Gini criterion is used and the maximum depth is increased from 4 to 5. As a result, increasing the depth enhances the performance of the model.

Tables 6 and 7 illustrate the performance and classification error of the hybrid CNN–RF and CNN–DT with a learning scheme of 80 : 20, the Gini and entropy criteria, max-depth = 5 (and 4), and n-estimators = 80. When we use the Gini criterion and increase the maximum depth from 4 to 5, the classification error is reduced. As a result, applying the Gini criterion and increasing the depth help the model perform better.

Increasing the DT's maximum depth allows for more complicated and comprehensive decision boundaries. With a deeper tree, the model can capture more information and tailor the classification process to the individual properties of the credit data. This additional complexity allows the model to make finer distinctions and potentially improve its effectiveness in classifying credit instances.

5. Conclusions

Despite the rise in credit needs and the level of competition in the banking sector, the majority of banks have been reluctant to use machine and deep learning algorithms to minimize credit risk. Therefore, to overcome this problem, in this study we developed a hybrid CNN–SVM/RF/DT model to predict credit risk. Four classification approaches were examined in order to build the model. The first classifier is a fully connected layer with soft-max that is trained using an end-to-end process, whereas the other three classifiers are binary SVM/RF/DT classifiers that are stacked after the flattening layer by removing the final fully connected and soft-max layer.

The pooling layer consequently reduces the amount of computation carried out within the network as well as the number of parameters that must be derived from the credit data. The feature maps produced by applying a number of convolution and pooling layers to the credit data are flattened into a 1D array and used as inputs for the SVM, RF, and DT. Different parameter values were taken into consideration and fine-tuned throughout training and testing of the model in order to choose appropriate parameters. In accordance with the experimental findings, the fully connected CNN and the hybrid CNN with SVM, DT, and RF achieved a prediction performance of 86.70%, 98.60%, 96.90%, and 95.50%, respectively. According to the results, our proposed hybrid method exceeds the fully connected CNN in its ability to classify credit applicants as defaulter or nondefaulter.

Data Availability

Data supporting this research article are available on request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.