Abstract
Credit card fraud has become a common occurrence because credit cards are used ever more frequently as a form of payment. Advances in technology and the growth of online transactions have given rise to fraud that causes significant financial losses, so efficient and effective detection approaches are needed. In this study, we developed a hybrid CNN-SVM model for detecting fraud in credit card transactions and tested its effectiveness using real-world public credit card transaction data. The architecture of the hybrid CNN-SVM model was developed by replacing the final output layer of the CNN model with an SVM classifier. The first classifier is a fully connected layer with softmax that is trained end to end, whereas the second classifier is a support vector machine that is stacked on top of the network after the final fully connected and softmax layer is removed. According to experimental results, our hybrid CNN-SVM model achieved accuracy, precision, recall, F1-score, and AUC of 91.08%, 90.50%, 90.34%, 90.41%, and 91.05%, respectively.
1. Introduction
A credit card is a tiny, thin plastic or fiber card with personal information such as a picture or signature that allows the person whose name is on it to charge goods and services to the connected account, with regular debits being made from that account [1]. It is a financial product provided by banks with a predetermined credit limit that enables you to conduct purchases without using cash. Your credit score, credit history, and income are used by the card issuers to establish your credit limit. Customers who borrow money using credit cards are required to pay it back in full by the billing date or over time, together with any relevant interest and other costs.
In recent years, as technology has advanced, the majority of people have come to purchase what they need using a credit card [2, 3]. Services such as e-commerce, tap-and-pay systems, and online bill payment systems have proliferated and become more widely used [2, 4].
Despite the many advantages that credit cards offer users, they are also linked to issues such as fraud and security [5]. For any bank in the world, the security of card payments and the confidence of customers using cards to make purchases are serious concerns [6]. According to a number of reports, attempts to detect credit card fraud have increased over time [7, 8]. This is why credit card fraud is reportedly a problem for banks and other financial institutions.
Banks and other financial organizations use machine learning mostly to improve their ability to detect fraudulent transactions. However, machine learning may have trouble detecting fraud for several reasons [9–11]: the distribution of the data is highly imbalanced because fraudulent transactions are rare, the data change frequently over time, and real-world datasets are scarce because of privacy issues.
Numerous approaches have been proposed in the literature in an effort to overcome these issues, but, to the best of the authors' knowledge, no hybrid convolutional neural network-support vector machine has been applied to credit card fraud detection.
Building on the existing literature, this study integrates deep learning with machine learning algorithms to detect credit card fraud. We propose a CNN hybridized with a support vector machine for three reasons: the support vector machine is effective on datasets with many features, including cases where the number of features is greater than the number of data points; it is memory efficient, because its decision function uses only a subset of the training points, called support vectors; and different kernel functions can be specified for the decision function.
2. Related Works
Because of the potential economic value of recognizing and classifying credit card fraud, substantial research effort has been devoted to it in the literature. This section reviews a number of significant studies.
Fraudulent credit card transactions can cause significant financial losses, especially when they have a high value. To prevent fraudulent transactions from being approved by card issuers, it is essential to spot them. The majority of conventional techniques for detecting fraud are built on machine learning models. Numerous studies have looked into how deep learning models could be used to accurately spot fraudulent transactions, yet these investigations take into account only one deep learning model. Reference [12] presents a variety of deep learning and ensemble algorithms for identifying fraudulent credit card transactions. The primary aim of that work is to reduce missed frauds and false alarms, and its main contribution is merging the results of three different deep learning models, namely, convolutional neural networks, autoencoders, and recurrent neural networks. For the CNN, the authors employed the Adam optimizer with the following parameters: ReLU and sigmoid activation functions, validation split = 0.2, dropout = 0.2, learning rate = 0.001, and epochs = 50. The second model, the autoencoder, was trained for 100 epochs with a batch size of 256 and a learning rate of 0.01. The third model, an RNN, used 100 training iterations, a batch size of 20000, a learning rate of 0.01, and ReLU and sigmoid activation functions. Experiments on a publicly available credit card dataset showed that, among the single deep learning models, the autoencoder had the best validation accuracy (93.4%), compared with the CNN (91.4%) and the RNN (91.8%). The ensemble reached a validation accuracy of 94.9%, outperforming all three individual deep learning models.
Both the owners of credit cards and financial institutions suffer large financial losses as a result of credit card theft. Reference [13] developed a credit card fraud detection model using state-of-the-art machine learning and deep learning algorithms. Their primary goal is to detect fraud with state-of-the-art deep learning algorithms while addressing challenges that include the availability of public data, data with large class imbalances, changes in the form of fraud, and a high rate of false alarms. They considered several machine learning-based strategies for credit card fraud detection, including the extreme learning method, decision tree, random forest, support vector machine, logistic regression, XGBoost, and modern deep learning algorithms. However, because of the low accuracy of these strategies, they applied state-of-the-art deep learning algorithms to reduce fraud losses, and the recent advancement of deep learning algorithms was the main area of their focus. The dataset was first subjected to a machine learning technique, which somewhat increased the accuracy of fraud detection. Subsequently, three convolutional neural network-based designs were used to boost the effectiveness of fraud detection. The model they suggest comprises 14 layers, starting with a convolutional layer with a kernel size of 32 × 2 and a ReLU activation function, then a batch normalization layer, and finally a dropout layer with a dropout rate of 0.2. Then, a batch normalization layer, a dropout layer, and a convolutional layer with a kernel size of 64 × 2 and a ReLU activation function are added. After that, they add a flattening layer with a kernel size of 64 × 2 and a ReLU activation function and then three dense layers, a dense layer, and a dropout layer with a dropout rate of 0.5. By adding more layers, they improved the precision of detection, and they conducted a thorough empirical investigation by varying the number of hidden layers and epochs and by applying the newest models.
The evaluation of their research demonstrates improved outcomes, with optimum values for accuracy, F1-score, precision, and AUC of 99.9%, 85.71%, 93%, and 98%, respectively. For credit card fraud detection problems, their suggested model performs better than state-of-the-art machine learning and deep learning algorithms.
Taha and Malebary [14] provide an intelligent method for identifying fraud in credit card transactions utilizing an optimized light gradient boosting machine (OLightGBM). The suggested method integrates a Bayesian-based hyperparameter optimization algorithm to tune the parameters of a light gradient boosting machine (LightGBM). Five-fold cross-validation is used to test the effectiveness of the proposed method for identifying credit card fraud on two real-world datasets. Several metrics are taken into account to evaluate the performance of the suggested technique, including precision, recall, accuracy, AUC, and F1-score. According to their experimental results, the suggested approach performed better than other methods on the two datasets in terms of accuracy (98.40%), the area under the receiver operating characteristic curve (AUC) (92.88%), precision (97.34%), and F1-score (56.95%).
Some other authors trained four prediction models, including the random forest (RF), the gradient boosting machine, and the artificial neural network (ANN) [15]. The class imbalance in the data was a significant difficulty for them when they developed their fraud detection systems, because legitimate transactions far outnumber fraudulent ones, which frequently make up less than 1% of all transactions. To overcome this challenge, they used the synthetic minority oversampling technique (SMOTE), random undersampling (RUS), the density-based synthetic minority oversampling technique (DBSMOTE), and SMOTE combined with edited nearest neighbor (SMOTEENN) for all models. In their experiment, the ANN architecture was configured with 30 input neurons, 200 hidden neurons in a single layer, and two layers of output neurons, with the ReLU activation function. Iterating over the training dataset, the model reached its best performance after 61 epochs. Dropout in the hidden layer was set at 40%, so that neurons there were randomly dropped during training, and a learning rate of 0.005 was chosen. The distributed random forest (DRF) had 43 trees with a maximum tree depth of 20; limiting the tree depth reduces model complexity and helps to prevent overfitting. The min rows option was set to 5, so each leaf must contain at least 5 observations. The gradient boosting machine contained 116 trees with a maximum depth of 15, which again limits model complexity to prevent overfitting. The minimum number of rows was set at 100, and the column sampling rate was set at 0.8, so each tree uses 80% of the columns. The findings of this experiment indicate that SMOTE-based sampling techniques perform well. The best recall score, 0.81, was achieved by the SMOTE sampling method with the DRF classifier. The precision score for this classifier was 0.86.
The stacked ensemble algorithm, which had the best average performance at 0.78, was trained using every sampled dataset. The authors conclude that the stacked ensemble model shows promise in the detection of fraudulent transactions across the majority of sampling techniques.
Special attention has also been given to developing a model for detecting credit card fraud using hybrid AdaBoost and majority voting techniques [16]. The empirical evaluation made use of a number of common models, including NB, SVM, and DL. An openly accessible credit card dataset was used to evaluate both individual (standard) models and hybrid models that combined the AdaBoost and majority voting approaches. The MCC metric was used as the performance indicator because it considers true and false positives as well as true and false negatives. The best MCC score, obtained through majority voting, is 0.823. The authors also employed the same individual and hybrid models with AdaBoost and majority voting techniques to attain a perfect MCC score of 1. To further analyze the hybrid models, noise ranging from 10% to 30% was introduced into the data samples. With 30% added noise, the majority voting technique produced the best MCC score of 0.942. This demonstrates that the majority voting method performs well even when there is noise.
Another group of authors [4] developed a machine learning-based method for identifying credit card fraud that uses the genetic algorithm (GA) for feature selection. After selecting the optimal attributes, the suggested detection engine employs the following machine learning classifiers: decision tree (DT), random forest (RF), logistic regression (LR), artificial neural network (ANN), and Naive Bayes (NB). A dataset generated from European cardholders is used to assess the performance of the proposed credit card fraud detection engine. The outcomes showed that their suggested method outperformed existing ones. The GA-DT attained a 100% accuracy and an AUC of 1, followed by the GA-ANN, which had a 100% accuracy and an AUC of 0.94. Their findings demonstrated that GA-RF attained the highest overall accuracy.
To the best of the authors' knowledge, there is no work on detecting credit card fraud using a hybrid CNN-SVM algorithm. Therefore, in this study, we develop a credit card fraud detection model based on a hybrid CNN-SVM.
3. Methods
The CNN and the SVM classifiers were integrated into our suggested model design. In Section 3.1, we discuss the nature of the dataset; in Section 3.2, we give a quick overview of the 1D-CNN theory; and in Section 3.3, we discuss the SVM structure. The hybrid CNN-SVM trainable feature extractor model is then introduced in Section 3.4, and finally the model performance evaluation metrics are discussed in Section 3.5.
3.1. Dataset
We employ a secondary dataset made up of transaction information from European credit card users (https://www.kaggle.com) for the purposes of this study. This dataset has a total of 284,809 transactions. Its 30 features comprise the columns V1–V28 together with time and amount. Personal information and other elements that may include sensitive data are covered by the obfuscated columns V1–V28. The target variable, which consists of two classes, is given in the final column: a fraudulent transaction has a value of 1, while a nonfraudulent transaction has a value of 0. Only 0.2432 percent of the transactions in the dataset were fraudulent; hence, a model trained on the raw data tends to predict legitimate transactions reliably while failing to detect fraudulent ones. Thus, when training the models, it is crucial to maintain a balance between the classes in the dataset, and a sampling technique is needed to close this gap.
Resampling the data is one of the approaches most frequently used to rectify an imbalanced dataset. Undersampling and oversampling are the two main ways of doing this. Oversampling approaches are typically favored over undersampling ones because, when we undersample data, we frequently leave out cases that could contain crucial information. In this study, we use the synthetic minority oversampling technique (SMOTE).
SMOTE selects a point at random from the minority class and calculates its K-nearest neighbors [17, 18]. Artificial points are then inserted between the selected point and its neighbors. The SMOTE oversampling process consists of the following steps [19, 20]:
(i) Perform the following for each pattern $x$ in the minority class, that is, in the fraudulent class of credit card transactions:
(1) Choose any of its K-nearest neighbors $x_{nn}$ (also belonging to the fraudulent class)
(2) Create a new pattern $x_{new}$ on the line segment joining the pattern and the selected neighbor:
\[ x_{new} = x + \delta \, (x_{nn} - x), \]
where $\delta$ is a uniform random variable in the range [0, 1]
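The interpolation step described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the implementation used in this study: the function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy `fraud` points are all hypothetical, and the sketch draws minority patterns at random rather than looping over every pattern.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    Hypothetical helper for illustration; not the code used in the study.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a minority-class pattern at random.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority patterns by Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        # Choose one of the k nearest neighbours.
        neighbour = rng.choice(others[:k])
        # Interpolate: x_new = x + delta * (neighbour - x), delta ~ U[0, 1).
        delta = rng.random()
        synthetic.append([a + delta * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority (fraud) class with two features per pattern.
fraud = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(fraud, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority points, the new points stay inside the region already occupied by the minority class.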
3.2. Convolutional Neural Network
A convolutional neural network (CNN) is a unique type of feedforward neural network that makes use of pooling, convolution, and ReLU layers [21, 22]. A CNN is a multilayer artificial neural network with a deep supervised learning architecture that primarily contains four layers. The components of 1D-CNN are convolution layers, pooling layers, dropout layers, and activation functions for processing the one-dimensional data.
3.2.1. Convolutional Layer
A convolutional layer is an essential building block of a convolutional neural network [21]. Its role is to apply a filter to an input. Applying the filter repeatedly across the input produces a feature map, which shows the specific properties connected to the data points. Convolution is a linear operation in which a set of weights is multiplied with the inputs. In the one-dimensional case, the array of weights, referred to as the kernel, is multiplied by a window of the inputs. This procedure produces one value at each position, and the collection of these values forms the feature map.
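As a small illustration of how a one-dimensional kernel produces a feature map, the sketch below slides a kernel over an input with stride 1 and no padding. The function name `conv1d` and the toy values are assumptions made for this example; as in most deep learning libraries, the kernel is not flipped.

```python
def conv1d(inputs, kernel):
    """Stride-1, unpadded 1D convolution (cross-correlation form)."""
    k = len(kernel)
    return [
        # Multiply the kernel weights by one window of inputs and sum.
        sum(w * x for w, x in zip(kernel, inputs[i:i + k]))
        for i in range(len(inputs) - k + 1)
    ]


signal = [1.0, 2.0, 4.0, 7.0, 11.0]
# A kernel of size 2 that responds to the difference between neighbours.
feature_map = conv1d(signal, kernel=[-1.0, 1.0])
print(feature_map)  # [1.0, 2.0, 3.0, 4.0]
```

An input of length 5 and a kernel of size 2 give a feature map of length 4, one value per kernel position.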
3.2.2. Pooling Layer
Once a feature has been identified, its exact placement is less important [23]. Consequently, the pooling or subsampling layer comes after the convolution layer [24]. Pooling techniques can be thought of as down-sampling operations that strive to reduce the number of parameters while maintaining the most important qualities in order to speed up the subsequent computing phase. The pooling step will also address the overfitting problem. Even though CNNs can use a variety of pooling approaches, max-pooling is the most used one. Utilizing the pooling strategy has the dual benefits of significantly reducing the number of trainable parameters and introducing translation invariance [25]. A window is chosen, and the input items included in that window are sent through a pooling function to carry out a pooling operation [26].
Input is split into 1D pooling regions, as shown in Figure 1, and a 1D max-pooling layer downsamples by finding the maximum of each region.
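The max-pooling operation can be sketched as follows. The function name `max_pool1d` and the default of non-overlapping windows (stride equal to the pool size) are assumptions for illustration.

```python
def max_pool1d(inputs, pool_size=2, stride=None):
    """Downsample a 1D sequence by taking the maximum of each pooling region."""
    # Assumption: non-overlapping regions unless a stride is given.
    stride = stride or pool_size
    return [
        max(inputs[i:i + pool_size])
        for i in range(0, len(inputs) - pool_size + 1, stride)
    ]


pooled = max_pool1d([1.0, 3.0, 2.0, 5.0, 4.0, 0.0], pool_size=2)
print(pooled)  # [3.0, 5.0, 4.0]
```

The output is half the length of the input, while the largest activation in each region is kept.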

3.2.3. Dropout
Overfitting, which is loosely defined as memorizing the inputs instead of learning their general characteristics, is mitigated by using dropout. Dropout introduces randomness into the training process by temporarily deactivating a random subset of units, so that the weights are learned for the problem as a whole rather than being tuned to noise in the data. If not used correctly, for example with too high a rate, it could lead to slow training or a failure to recognize trends.
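A minimal sketch of dropout is given below, assuming the common "inverted dropout" formulation in which surviving activations are rescaled during training so that no rescaling is needed at inference time. The function name and its parameters are hypothetical.

```python
import random


def dropout(activations, rate=0.5, training=True, seed=None):
    """Inverted dropout: zero each unit with probability `rate` while training."""
    if not training or rate == 0.0:
        # At inference time the activations pass through unchanged.
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - rate
    # Surviving units are scaled by 1/keep to preserve the expected activation.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


acts = [1.0] * 10
dropped = dropout(acts, rate=0.5, seed=42)
unchanged = dropout(acts, rate=0.5, training=False)
```

With a rate of 0.5, each value in `dropped` is either 0.0 (unit disabled) or 2.0 (unit kept and rescaled), whereas `unchanged` equals the input.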
3.2.4. Activation Function
ReLU is a piecewise linear activation function that outputs its input unchanged when the input is positive and outputs zero when the input is negative [27]. The ReLU activation function mitigates the vanishing gradient problem, improves the model performance, and speeds up learning from the training data.
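For reference, ReLU is commonly written as $f(x) = \max(0, x)$; a one-line sketch with illustrative inputs follows.

```python
def relu(x):
    """Rectified linear unit: returns x for positive x and 0 otherwise."""
    return max(0.0, x)


outputs = [relu(v) for v in [-2.0, -0.5, 0.0, 1.5, 3.0]]
print(outputs)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```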
3.3. Support Vector Machine
The support vector machine (SVM) is one of the relatively recent and promising techniques for learning separating functions in pattern recognition (classification) tasks or for performing function estimation in regression problems [28]. Rather than a regression model and algorithm, support vector machines offer a classification learning model and algorithm [29]. The goal of employing an SVM is to identify a classification criterion (i.e., a decision function) that, at the testing stage, can accurately classify unseen data with good generalization [30]. A training set is said to be linearly separable if a linear discriminant function exists whose sign corresponds to the class of each training example. If a training set can be linearly separated, there are typically an infinite number of separating hyperplanes; the SVM chooses the separating hyperplane that maximizes the margin, that is, the one that leaves the most distance between itself and the nearest example [31].
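We consider a set of data points made up of vectors $x_i$, each of which is linked to a value $y_i$ that indicates whether the element belongs to the fraud class (+1) or not-fraud class (−1).
A linear hyperplane for a set of training data, $(x_i, y_i)$, for $i = 1, \ldots, m$, is defined as
\[ w^{T} x + b = 0, \tag{1} \]
where $w$ is an n-dimensional vector and $b$ is a bias term.
The optimal hyperplane is required to satisfy the following constrained minimization as shown [32]:
\[ \min_{w,\, b} \ \frac{1}{2} \lVert w \rVert^{2} \quad \text{subject to} \quad y_i \left( w^{T} x_i + b \right) \ge 1, \quad i = 1, \ldots, m. \tag{2} \]
In these cases, the constraint can be enforced using a Lagrange multiplier $\alpha_i \ge 0$, shown as follows:
\[ L(w, b, \alpha) = \frac{1}{2} \lVert w \rVert^{2} - \sum_{i=1}^{m} \alpha_i \left[ y_i \left( w^{T} x_i + b \right) - 1 \right]. \tag{3} \]
To find the minimum of (3) over $w$ and $b$ (while fixing all $\alpha_i$), we set the gradient vector to zero as follows:
\[ \frac{\partial L}{\partial w} = w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0, \qquad \frac{\partial L}{\partial b} = - \sum_{i=1}^{m} \alpha_i y_i = 0. \tag{4} \]
The cost function’s solution, which yields the maximum-margin hyperplane utilized to categorize the two classes of the credit card transactions as nonfraudulent and fraudulent, is as follows:
\[ w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad f(x) = \mathrm{sign}\left( \sum_{i=1}^{m} \alpha_i y_i \, x_i^{T} x + b \right). \tag{5} \]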
In the absence of linear separability in the training set, the optimal hyperplane cannot classify credit card transactions without error. To counter this problem, a soft margin is introduced [33]. Slack variables are introduced to permit some constraints to be violated. In other words, some training points are allowed to fall inside the margin. We want their margin penetration to be as slight as is practical, and the margin should contain as few points as possible:
\[ \min_{w,\, b,\, \xi} \ \frac{1}{2} \lVert w \rVert^{2} + C \sum_{i=1}^{m} \xi_i \quad \text{subject to} \quad y_i \left( w^{T} x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \tag{6} \]
where $\xi_i$ is a slack variable and $C$ is the penalty parameter of the error term.
The solution of (6) is
\[ w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad 0 \le \alpha_i \le C, \tag{7} \]
where $\alpha_i$ are the Lagrange multipliers.
Higher values of $C$ produce lower biases and higher variances, while lower values of $C$ have the opposite effect: higher biases and lower variances. It is necessary to determine the ideal $C$ value for the trade-off between bias and variance.
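To make the role of the penalty parameter concrete, the following sketch evaluates the standard primal soft-margin objective, $\frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i$, with each slack computed as the hinge loss $\xi_i = \max(0, 1 - y_i(w^T x_i + b))$. The function name, the toy data, and the fixed hyperplane are assumptions chosen for illustration; no training is performed.

```python
def soft_margin_objective(w, b, X, y, C):
    """Return the primal soft-margin SVM objective and the per-sample slacks."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    slacks = [
        # Hinge loss: positive only when a point violates the margin.
        max(0.0, 1.0 - yi * (sum(wi * xi for wi, xi in zip(w, x)) + b))
        for x, yi in zip(X, y)
    ]
    return margin_term + C * sum(slacks), slacks


# Toy data: the second point lies inside the margin of the hyperplane x1 = 0.
X = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]]
y = [1, 1, -1]
w, b = [1.0, 0.0], 0.0

low_C, slacks = soft_margin_objective(w, b, X, y, C=1.0)
high_C, _ = soft_margin_objective(w, b, X, y, C=100.0)
print(slacks, low_C, high_C)  # [0.0, 0.5, 0.0] 1.0 50.5
```

The same margin violation costs 0.5 when C = 1 but 50 when C = 100, which is why a large C pushes the optimizer toward fitting every training point (lower bias, higher variance).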
3.4. Hybrid CNN-SVM
Convolutional neural networks can serve as feature extractors for both support vector machines and fully connected networks because of their capacity to extract features automatically. The main characteristics of both classifiers are combined in the suggested hybrid model. Support vector machines function as classifiers [34], while the hierarchical structure of the CNN, a successful deep learning model, allows high-quality features to be extracted and learned at each layer [34, 35]. SVMs are capable of outperforming convolutional neural networks in terms of classification performance [36]. Because the CNN supplies the features, no additional feature extraction or selection stage is required before the support vector machine-based classification [37].
Our hybrid CNN-SVM model’s architecture was created by substituting an SVM classifier for the last output layer of the CNN model [38, 39]. The features obtained from the CNN layers are sent to the SVM classifier after being reorganized and going through the first fully connected layer. As a result, support vector machines were utilized after fully connected networks with a softmax function.
The proposed architecture has two classifiers, as shown in Figure 2. The first classifier, trained end to end, consists of a fully connected layer with softmax. After the final fully connected and softmax layer has been removed, a binary SVM classifier is added on top of the network. The convolutional layers extract features from the attributes of the transaction data, and the pooling layers reduce the size of the feature maps. As a result, the pooling layers lower the amount of computation done within the network and the number of parameters that need to be learned from the credit card fraud data. The feature maps of the transaction data, which are created by repeatedly applying a number of convolutional and pooling layers, are flattened into a one-dimensional array and used as inputs for the support vector machine.

The architecture consists of three convolutional layers (Conv1D) with 256, 128, and 64 filters, respectively, each with a kernel size of 2 and a ReLU activation function, together with a dropout layer, pooling layers, and three dense layers. To prevent the model from overfitting, we applied dropout with a rate of 0.5, which disables 50% of the neurons during training, after the three successive convolution layers. The max-pooling layer lowers the computational cost of the model by reducing the number of parameters that must be learned. A flattening layer is then applied to the results, reducing the three-dimensional output to a single dimension. Next, three dense layers with ReLU and softmax activation functions were used to predict the target variable.
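To give a feel for the size of the feature vector that reaches the flatten layer, and hence the SVM, the sketch below traces sequence lengths through the convolutional stack. Several details are not stated in the text and are assumed here purely for illustration: the 30 features are treated as a sequence of length 30, the convolutions use stride 1 with no padding, and a single max-pooling layer with pool size 2 follows the third convolution. Different settings would give a different dimension.

```python
def conv1d_out_len(length, kernel_size, stride=1):
    """Output length of an unpadded 1D convolution (assumed 'valid' padding)."""
    return (length - kernel_size) // stride + 1


def pool1d_out_len(length, pool_size):
    """Output length of non-overlapping 1D max pooling."""
    return length // pool_size


length = 30    # one step per input feature (assumption)
channels = 1
for filters in (256, 128, 64):          # filter counts given in the text
    length = conv1d_out_len(length, kernel_size=2)
    channels = filters
# Dropout does not change the shape, so it is omitted from the trace.
length = pool1d_out_len(length, pool_size=2)   # pool size is an assumption

svm_feature_dim = length * channels     # size of the flattened vector
print(length, channels, svm_feature_dim)  # 13 64 832
```

Under these assumptions, each transaction is represented by an 832-dimensional vector at the flatten layer.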
3.5. Model Performance Evaluation Techniques
Depending on the aim of the experiment, a variety of statistical metrics can be used to evaluate binary classifiers [40]. The most important measures for binary classification problems are accuracy and F1-score. Accuracy measures the number of correctly predicted data points out of all the data points [41–44]. Accuracy is effective when the target classes are well balanced, but it is not a suitable choice when the classes are imbalanced.
Four types of outcomes are possible [45] when evaluating the performance of a credit card fraud detection and prediction model, with fraud taken as the positive class:
(i) True positives (TP): The transaction is fraudulent and the model also predicts it as fraudulent.
(ii) True negatives (TN): The transaction is nonfraudulent and the model predicts it as nonfraudulent; in other words, the model says nothing has happened and nothing has really occurred.
(iii) False positives (FP): The model claims that something has happened when it has not; that is, the model predicts fraud although the transaction is nonfraudulent. Errors of this kind are classified as Type I errors.
(iv) False negatives (FN): The event has occurred but the model misses it, such as when a customer is the target of fraud but the model did not foresee it. This is a very serious error, and it is called a Type II error.
Precision is the proportion of correctly predicted positive outcomes out of all predicted positive outcomes. It can be expressed as the ratio of true positives (TP) to the total of true and false positives (TP + FP):
\[ \text{Precision} = \frac{TP}{TP + FP}. \]
Precision is helpful when false positives are more problematic than false negatives.
Recall, also called sensitivity, is the proportion of correctly predicted positive outcomes among all actual positive outcomes. It is the ratio of true positives (TP) to the sum of true positives and false negatives (TP + FN):
\[ \text{Recall} = \frac{TP}{TP + FN}. \]
Recall and precision are combined into one performance indicator called the F1-score, which is the harmonic mean of precision and recall (sensitivity) [46]:
\[ F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \]
Therefore, both false positives and false negatives are considered when calculating this score. The F1-score is typically more informative than accuracy, especially when there is an uneven distribution of classes.
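These metrics follow directly from the four confusion-matrix counts. The sketch below computes them for two hypothetical confusion matrices (the counts are invented for illustration and are not results from this study); the second corresponds to a model that labels every transaction as nonfraudulent.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, with fraud as the positive class.
m = classification_metrics(tp=80, tn=880, fp=20, fn=20)
# A model that never predicts fraud on the same 100 fraud / 900 legitimate split.
naive = classification_metrics(tp=0, tn=900, fp=0, fn=100)
```

The naive model still reaches 90% accuracy while its recall and F1-score are zero, which illustrates why accuracy alone is not a reliable criterion on imbalanced data.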
The receiver operating characteristic (ROC) curve is a probability curve that plots the true positive rate (TPR) against the false positive rate (FPR) at different threshold values and shows how well the classifier separates the fraudulent class from the nonfraudulent class. The area under the ROC curve (AUC), also written as AUC-ROC, measures a classifier’s ability to distinguish between fraudulent and nonfraudulent classes. The ability of the classifier to distinguish between the target classes improves with increasing AUC values [47].
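One way to compute the AUC without drawing the curve uses its rank interpretation: the AUC equals the probability that a randomly chosen positive (fraudulent) example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the toy labels and scores below are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count pairs where the positive outranks the negative (ties count 0.5).
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

Here three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75. This pairwise form is quadratic in the number of examples, so it is meant for explanation rather than for large datasets.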
4. Results and Discussion
We carried out two experiments. In the first experiment, a convolutional neural network served as the feature extractor while a fully connected network with softmax served as the classifier. The network was trained for 15 epochs using the Adam optimizer, the ReLU activation function, max-pooling, and dropout = 0.5. We used the synthetic minority oversampling technique (SMOTE) because of the highly imbalanced structure of the dataset. The second experiment used a support vector machine classifier: the output of the last CNN layers from the first experimental setup was used as the feature set to train the support vector machine.
The support vector machine with C = 100, gamma = 0.9, and a radial basis function kernel, stacked on top of the flatten layer, achieved an accuracy of 91.08%. On this basis alone, one might conclude that our model is quite effective in detecting fraud. However, accuracy is a poor criterion for evaluating predictive performance when the dataset is imbalanced. We must thus consider additional performance criteria that offer better guidance for model evaluation.
Therefore, precision, recall, F1-score, and AUC are helpful measures to assess how well the fraud detection model is working.
Table 1 shows the prediction performance of our proposed method. The proposed method achieved a precision of 90.50%, which measures how effectively fraudulent transactions are distinguished from nonfraudulent transactions. Our hybrid model achieved a recall of 90.34%, which is the proportion of positive samples that were correctly identified as positive out of all positive samples; recall increases as more positive samples are detected. The F1-score, which ranges from 0 to 1 with 1 being the best, can be used to evaluate the overall performance of a model. More specifically, the F1-score reflects the model’s capacity to balance capturing positive cases (recall) with being accurate about the cases it does capture (precision), so both false positives and false negatives are taken into account; our model achieved 90.41%. The AUC of 91.05% shows that the suggested approach is capable of telling valid credit card transactions apart from fraudulent ones.
The prediction capability of our proposed method with various kernels is displayed in Table 2. The kernels considered are the radial basis function (RBF), sigmoid, polynomial, and linear kernels. Based on the findings of the experiments, the polynomial and sigmoid kernels perform poorly when it comes to predicting credit card fraud. The maximum accuracies achieved with the radial basis function and linear kernels are 91.08% and 90.62%, respectively, with C = 100 and gamma = 0.9. Therefore, we conclude that our proposed technique performs well in predicting credit card fraud when a radial basis function or linear kernel is used.
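For readers unfamiliar with the four kernels, the sketch below writes them out as plain functions. The parameterization (a shared `gamma`, plus `coef0` and `degree` for the polynomial and sigmoid kernels) follows a widely used convention and is an assumption here, as are the default values of `coef0` and `degree`, which the text does not report; only gamma = 0.9 comes from the experiments above.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=0.9, coef0=0.0, degree=3):
    # coef0 and degree are assumed values, not taken from the paper.
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=0.9):
    # Similarity decays with the squared Euclidean distance between u and v.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid_kernel(u, v, gamma=0.9, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


u, v = [1.0, 0.0], [0.0, 1.0]
same = rbf_kernel(u, u)      # identical points give the maximum similarity, 1.0
apart = rbf_kernel(u, v)     # exp(-0.9 * 2) for points at squared distance 2
```

The RBF kernel depends only on the distance between two feature vectors, whereas the other three depend on their inner product.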
Figures 3(a) and 3(b) show the performance of the hybrid CNN-SVM model and compare it with the performance of the CNN model.

According to the experimental results, our proposed CNN-SVM algorithm exceeds the fully connected convolutional neural network in its ability to classify transactions as fraudulent or nonfraudulent, with precision, recall, F1-score, and area under the curve (AUC) of 90.50%, 90.34%, 90.41%, and 91.05%, respectively.
For better visualization of the performance of the proposed method, Figures 4(a) and 4(b) show the AUC-ROC curve, a performance indicator for classification problems at different threshold settings, and the precision-recall curve, which summarizes performance over a number of threshold values.

5. Conclusions
Credit card fraud is a common occurrence in the modern world because the majority of us use credit cards as a form of payment regularly. This is the result of technological advancements and a growth in online transactions, which have led to frauds that have resulted in significant losses. To find fraud in credit card transactions, efficient and effective methods are needed.
In this study, we create a hybrid CNN-SVM model to identify fraudulent credit card transactions. We assessed the effectiveness of our proposed hybrid CNN-SVM model for detecting credit card fraud using real-world, openly accessible credit card transaction data.
The dataset from https://www.kaggle.com has 284,809 transactions in total. The dataset has V1, …, V28, time, and amount among its 30 features. Personal data and other elements that can include sensitive data are covered in the obfuscated columns V1–V28. The target variable, which consists of fraudulent and nonfraudulent transactions, is highly imbalanced. As a result, a model trained on the raw data tends to predict valid transactions correctly while failing to detect fraudulent ones. When training the models, it is therefore crucial to maintain a balance between the classes in the dataset, and we applied the synthetic minority oversampling technique (SMOTE) to overcome this challenge. SMOTE randomly chooses a point from the minority class and determines its K-nearest neighbors. Artificial points are then placed between the chosen point and its neighbors.
Our hybrid CNN-SVM model was designed by replacing the final output layer of the CNN model with an SVM classifier. The first classifier employs a fully connected layer with softmax and is trained end to end. The second classifier, a support vector machine, is stacked on top of the network after the last fully connected and softmax layer is removed. The convolutional layers extract features from the attributes of the transaction data, and the pooling layers reduce the size of the feature maps. As a result, the pooling layers lower the amount of computation done within the network and the number of parameters that need to be learned from the credit card fraud data. The feature maps of the transaction data, which are created by repeatedly applying a number of convolutional and pooling layers, are flattened into a one-dimensional array and used as inputs for the support vector machine.
The experimental results show that our proposed method outperforms the fully connected convolutional neural network in terms of precision, recall, F1-score, and AUC in predicting whether a credit card transaction is fraudulent or not.
In general, we conclude that a CNN hybridized with an SVM is preferable to a fully connected convolutional neural network with softmax for detecting credit card fraud.
In our future work, we will develop a spatio-temporal model with hybrid machine learning algorithms to detect credit card fraud and compare various hybrid machine learning techniques.
Data Availability
The data are obtained from open source.
Conflicts of Interest
The authors declare that they have no conflicts of interest.