Abstract

Cybersecurity of information technology (IT) infrastructures is one of the most significant and complex issues of the digital era. Technological breakthroughs in the Internet and communication areas have directly driven growth in network size and the data associated with it. Malware attacks are becoming increasingly sophisticated and hazardous as technology advances, making intrusions difficult to detect. Detecting and mitigating these threats is a significant challenge for standard analytic methods. Furthermore, attackers use complex processes to remain undetected for extended periods. The changing nature and growing volume of cyberattacks demand a fast, adaptable, and scalable defense system. Traditional machine learning-based intrusion detection largely relies on a single algorithm to identify intrusions, which yields a low detection rate and cannot handle large amounts of data. To enhance the performance of intrusion detection systems, a new deep multilayer classification approach is developed. This approach comprises five modules: preprocessing, autoencoding, database, classification, and feedback. An autoencoder is used to reduce the dimensionality of the reconstructed feature representation before classification. Our method was tested on a benchmark dataset, NSL-KDD. Compared with other state-of-the-art intrusion detection systems, our methodology achieves 96.7% accuracy.

1. Introduction

Internet-enabled services have grown exponentially in recent years. According to current estimates, more than 60 billion Internet-connected devices will be available by 2023 [1]. Despite this, computer networks are continually at risk of attack from threat actors via the Internet. The concept of an intrusion detection system (IDS) was first proposed in [2]. Since then, a number of IDS products have been developed and refined to meet the needs of network security. However, because of the rapid advancement of technology over the previous decade, the size of networks and the number of applications handled by network nodes have increased significantly. As a result, a massive amount of critical data is being generated and shared across various network nodes. Securing these data and network nodes has grown increasingly difficult due to the many threats generated either through the mutation of an existing attack or through the development of an entirely new one. Security concerns can affect almost every node in a network [3]. For example, a data node may be highly critical for a company; the company's reputation and finances could be severely impacted if the node's information is compromised. Existing IDSs have proven ineffective at detecting various attacks, including zero-day attacks, and at minimizing false alarm rates (FAR). As a result, there is a growing demand for a network intrusion detection system that is efficient, accurate, and cost-effective to ensure robust network security [4]. Figure 1 shows the cyberattacks on the McAfee network in 2021.

With the help of firewalls and IDSs, various security threats can be effectively countered in a single system. Misuse and anomaly detection are the two basic types of IDS schemes, and both can be implemented using various machine learning approaches. Misuse detection systems rely primarily on the signatures of security threats and malicious activity, which allows multiclass classification and multilevel detection. Such an IDS, however, is unable to identify new attacks for which no signature exists. These systems are therefore better at detecting known harmful behavior and its variants. Anomaly detection-based IDS techniques, by contrast, rely on the usual behavior of users to detect new threats and support only binary classification [5]. It is important to keep user profiles up to date in dynamic organizations where roles occasionally shift [6]. As a result, some anomaly detection techniques may suffer from false positives. Machine learning techniques are used in both anomaly detection and misuse detection [7]. Because of the absence of labelled training datasets and the heavy reliance on features extracted by humans, conventional machine learning approaches cannot be deployed on big platforms [8]. Deep learning is a new paradigm within machine learning that uses artificial neural networks (ANNs) and outperforms existing methods.

Researchers have developed several ML- and DL-based methods to improve NIDSs' ability to detect malicious attacks over the past decade. Although network traffic has grown, the increased number of security threats has limited NIDSs' ability to identify malicious intrusions. To better detect network intrusions, researchers are just beginning to explore the potential of applying deep learning (DL) algorithms in NIDSs. Traditional security methods cannot be directly applied to IoT devices because of their limited computational and basic resources. Rule-based detection approaches, on the other hand, were found to be effective [9]. As a result, anomaly-based detection procedures are essential as IoT environments and technology keep growing.

Deep neural networks (DNNs), including convolutional neural networks (CNNs) [10], deep reinforcement learning (DRL) [11], and hybrid DNN structures (HDNNs) [12–19], are being studied for their intrusion detection capabilities. DNNs are a subset of ANNs and the primary focus of deep learning research. Distinct from more traditional shallow neural networks (SNNs), a DNN's hierarchy of layers allows it to model more complex functions because of its better modeling and abstract representation capabilities. As a result, DNNs have a great deal of potential for creating helpful techniques by exploiting good data representations.

1.1. Problem Statement

Traditional ML-based intrusion detection commonly relies on a single algorithm, resulting in low detection rates, rigid techniques, and difficulty handling high-dimensional data. An intrusion detection framework for the modern Internet must react quickly and adapt easily to a constantly changing environment. This article presents a wide-ranging intrusion detection framework that can enhance the effectiveness of IDSs in many different ways. Traditional supervised machine learning techniques can benefit from a DNN's ability to produce more accurate data representations. However, the time complexity of some deep learning-based approaches limits their effectiveness.

The autoencoder (AE) model has inspired us to perform experiments with the AE in real-world IDS applications. First, high-dimensional redundant features are converted into a hyperspace representation linked to the input data, which lessens the training complexity and the impact of feature redundancy. We then use the AE together with a deep multilayer classifier to improve the classification task.

The following is a list of the important contributions of this work:
(i) Innovation in IDSs based on data analytics and deep multilayer classification techniques;
(ii) Design and development of an IDS capable of efficiently distinguishing between distinct cyberattack classes in the NSL-KDD dataset with high accuracy;
(iii) Development of an IDS with significant industrial application potential.

The rest of the article is structured as follows: Section 2 briefly discusses the essential related works. Section 3 presents the preliminaries in detail. Section 4 presents the proposed deep multilayer-based approach and the autoencoder. Section 5 describes the NSL-KDD dataset and the algorithm, and presents the results and discussion. Finally, Section 6 provides the conclusion and future scope.

2. Literature Survey

The KDD99 and NSL-KDD datasets have been used in the literature to assess various IDSs. Attack classes in the NSL-KDD dataset were detected using a three-layer MLP created by Yong et al. [20]; the system's accuracy on the test set was 79.9% for multiclass classification and 81.2% for binary classification. Chawla et al. [21] achieved a binary classification accuracy of 75.49% using self-organizing maps (SOMs) on the NSL-KDD dataset. Sadiq et al. [22] used an MLP and other classical learning methods to obtain a binary classification accuracy of 95.7%; the authors applied k = 10-fold cross-validation to the dataset. Ishaque et al.'s [23] semisupervised learning approach is based on fuzzy and ensemble learning theories; it achieved an accuracy of 84% on the KDD test set of the NSL-KDD dataset. Deep belief networks (DBNs) for multiclass classification were created by Mighan et al. [24] using a restricted Boltzmann machine (RBM) architecture with a Softmax output layer; the proposed approach was quite accurate, with a false alarm rate of only 2.47%, even though just 10% of the KDD99 test samples were employed. In [25], SDN was used to create a DNN for anomaly detection; a neural network with three hidden layers was trained on the NSL-KDD dataset. Only six features and a binary discrimination procedure (normal vs. abnormal) were used, and the experiments achieved 75% accuracy. Deep neural networks trained on the KDD99 dataset have been proposed by Liu et al. [26]. A gradient boosting machine (GBM) makes it simpler to detect intrusions; the GBM parameters were fine-tuned using a grid search. The UNSW-NB15, NSL-KDD, and GPRS datasets were all used in that investigation. In terms of accuracy and specificity, this approach outperforms GAR forest, tree-based ensembles, and fuzzy classifiers. The false alarm rate and accuracy of a random forest-based IDS were also assessed in [27], again using the GPRS, NSL-KDD, and UNSW-NB15 datasets. That classifier was compared against multilayer perceptrons [28], NBTrees [29], a random tree ensemble [30], and Naive Bayes [31]. The study indicated that random forest-based IDSs beat the other classifiers in terms of performance. Farahnakian et al. [31] analyzed scan attacks, DoS attacks, and MITM subsets of ordinary traffic; the combined DoS, scan, Mirai, and MITM attacks included in our analysis were not investigated for intrusion activities. Another study used a multistage classification technique based on clustering and oversampling [13–20] to forecast whether or not an intrusion would occur.

2.1. Deep Learning-Based Intrusion Detection System

Commercial NIDSs use statistical measures or calculated thresholds to represent packet length, interarrival time, flow size, and other network traffic metrics [32]. False positive and false negative alarms are frequent occurrences. False negative notifications suggest that the NIDS is less likely to detect attacks, whereas many false positive alerts show that the NIDS is likely to warn even when no attack has occurred. Commercial solutions are ineffective against today's threats [33–38].

Self-learning is a powerful tool for confronting today's threats. Unsupervised and semisupervised machine learning techniques are used to analyze different normal and malicious processes using a vast corpus of regular and attack events at the network and host levels. Commercial viability for machine learning-based solutions is still in its infancy, but the literature on the topic is beginning to emerge. Current machine learning approaches have a high percentage of false positives and a high computational cost [39]. Machine learning classifiers can learn basic TCP/IP features because of the localization of these features. In deep learning, TCP/IP information is passed through numerous hidden layers to create hierarchical feature representations and hidden sequential links. Deep learning has dramatically improved AI tasks such as image processing, audio identification, and natural language processing [40]. As a result of its capability to learn new, previously unknown patterns from raw data, deep learning is often used in cybersecurity. To discover more complex traits, it employs a sequence of transformations. Classification, image identification, self-driving cars, and speech recognition are just some of the problems that deep learning and large datasets are being used to solve. Hidden layers are used to automatically select features or mine properties and then perform training and testing on the given dataset to obtain classification results. In contrast to conventional machine learning, deep learning does not require feature extraction as an initial step. Various deep learning methods are available, for example the autoencoder. In the STL-IDS architecture introduced in [41, 42], a support vector machine is used to learn features based on stacked autoencoders rather than a Softmax layer; the SVM outperformed Naive Bayes, random forest, and J48 on the NSL-KDD dataset with respect to classification accuracy and training and testing durations. Recurrent neural networks (RNNs) were employed by H. Luo et al. [43] to detect intrusions; they achieved an accuracy of 83.28%. The active deep learning system proposed by O. Ludtke et al. [44] is a self-taught learning (STL) technique for learning features and dimensions. The sparse autoencoder can be used to reshape a unique feature representation in an unsupervised manner, and an SVM is used to increase the classification accuracy and speed. The two- and five-category classifications likewise show sound results; J48, Naive Bayes, and RF have a lower precision rate in five-category classification than the SVM technique. M. Ahmed et al. [45] created a deep learning architecture using feature extraction to build an IDS deep learning model. GRUs, MLPs, and Softmax modules were all part of the neural system demonstrated for detecting intrusions. The investigation used both the KDD99 and NSL-KDD datasets; according to that study, the KDD99 and NSL-KDD datasets were better served by using BGRU and MLP together. Convolutional neural networks and autoencoders have been extensively investigated by Bansod et al. [46]. Keras with a Theano backend was used to train the model on a GPU-based test platform. Several evaluation measures were used in that study, including the receiver operating characteristic (ROC) curve, the area under the curve (AUC), the precision-recall curve, the mean average precision, and the classification accuracy.

3. Preliminaries

3.1. Autoencoder

Multilayer neural networks known as “autoencoders” reproduce their inputs at the output with minimal reconstruction error; the output is similar to the input up to a small, minimized variance. The autoencoder uses unsupervised learning to encode the input and then decode, or reassemble, the encoded representation. An autoencoder can be used for dimensionality reduction, feature extraction, image compression, and noise reduction. To keep things simple, we describe the general construction of an autoencoder without diving into specifics. Figure 2 gives the block scheme of the autoencoder.

The four major components of a general autoencoder are the encoder, bottleneck, decoder, and reconstruction loss. The encoder compresses the input data, which reduces the number of features the model must deal with. The bottleneck is the layer holding the most compressed representation of the input, with the fewest features. The decoder reconstructs the input from the encoded representation so that the output closely matches the input. Finally, the term “reconstruction loss” refers to the difference between the decoder's output and the original input and is used to evaluate performance. In addition, backpropagation is used during training to further minimize the reconstruction loss; the purpose of the AE is to achieve this minimum loss. The encoder function $E$ compresses the input $x$ into a latent code $z = E(x)$. The decoder $D$ then attempts to recreate the input as $\hat{x} = D(z)$. The difference between the input and the decoded vector is the reconstruction loss. Reconstruction loss can be measured using the mean squared error (MSE):

$$L_{\mathrm{MSE}}(x, \hat{x}) = \frac{1}{n}\sum_{i=1}^{n}\left(x_{i} - \hat{x}_{i}\right)^{2}.$$

In addition to the reconstruction loss, variational autoencoders (VAEs) use the Kullback–Leibler (KL) divergence. The KL divergence measures how the probability distribution of the data projected into the latent space differs from the assumed distribution of the latent space; this nonnegative number indicates the degree to which the two distributions differ.
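To make the two loss measures concrete, the following minimal NumPy sketch (our own illustration, not part of the original implementation) computes the MSE reconstruction loss for a reconstructed sample and the closed-form KL divergence between a diagonal Gaussian and a standard normal prior, as used in VAEs; the sample values are arbitrary.

import numpy as np

def mse_reconstruction_loss(x, x_hat):
    # Mean squared error between an input vector and its reconstruction.
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return np.mean((x - x_hat) ** 2)

def kl_divergence_gaussian(mu, log_var):
    # KL divergence between N(mu, exp(log_var)) and the standard normal prior,
    # the closed form commonly used in variational autoencoders.
    mu, log_var = np.asarray(mu, dtype=float), np.asarray(log_var, dtype=float)
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

# Example: a well-reconstructed (likely "normal") packet has a small MSE.
x = np.array([0.2, 0.8, 0.1])
x_hat = np.array([0.25, 0.75, 0.12])
print(mse_reconstruction_loss(x, x_hat))            # small loss
print(kl_divergence_gaussian([0.1, -0.2], [0.0, 0.1]))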

There are a variety of autoencoders, such as denoising, variational, convolution, and sparse autoencoders.

3.2. Deep Neural Network

We propose an MLP model because it is inspired by the features of biological neural networks. An MLP is a feedforward neural network in which inputs are passed from one layer to the next, without loops in the system. In mathematical terminology, each layer of the MLP model contains a number of neurons or units. The model consists of three or more layers: an input layer, one or more hidden layers, and an output layer. The number of hidden layers may be determined using a hyperparameter selection strategy. Connections between layers allow information to move from one layer to the next. In mathematical terms, the MLP is defined as a function $O: \mathbb{R}^{m} \rightarrow \mathbb{R}^{N}$, where $m$ is the size of the input vector $x$ and $N$ is the size of the output vector $O(x)$, which is a function of $x$. Each of the layers can be computed as follows:

$$h_{i} = f\left(W_{i} h_{i-1} + b_{i}\right), \quad h_{0} = x,$$

where $W_{i}$ and $b_{i}$ are the weight matrix and bias vector of layer $i$, the size of the input is denoted by $m$, and the nonlinear activation function is denoted by $f$, which can be either a sigmoid (with values in the range [0, 1]) or a hyperbolic tangent (with values in the range [−1, 1]). Figure 3 shows the deep neural network architecture.
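As an illustration of the layer equation above (our own sketch, not the authors' implementation), the following NumPy snippet computes a single forward pass through an MLP with two hidden layers; the layer sizes and random weights are placeholders.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights, biases, activation=sigmoid):
    # Forward pass h_i = f(W_i h_{i-1} + b_i) through successive layers.
    h = x
    for W, b in zip(weights, biases):
        h = activation(W @ h + b)
    return h

rng = np.random.default_rng(0)
x = rng.random(4)                                   # input vector, m = 4
weights = [rng.random((8, 4)), rng.random((8, 8)), rng.random((3, 8))]
biases = [np.zeros(8), np.zeros(8), np.zeros(3)]
print(mlp_forward(x, weights, biases))              # output vector, N = 3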

4. Proposed Framework

This research proposes a multilayer classification strategy for detecting both the presence of an intrusion and the type of intrusion in Internet of Things (IoT) networks under the assumption of imbalanced data. The training and testing datasets are separated, and the proposed method is implemented. The core of the proposed intrusion detection framework consists of preprocessing, autoencoding, database, classification, and feedback modules. These diverse functional modules are combined to construct a practical intrusion detection framework with high accuracy and low training complexity. The colored lines in Figure 4 show these functions: the black line is for detection, orange is for retraining, and green is for restoration. Blue two-way lines depict processes that cross with other functions. Figure 4 presents the architecture of the proposed framework.

The Softmax function is the nonlinear activation function used in our MLP model for the multiclass classification problem. The Softmax function outputs a probability for each class and selects the largest of these probabilities to provide a more accurate result for each class. The mathematical formulas of all three activation functions are given below:

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}, \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad \mathrm{softmax}(x)_{i} = \frac{e^{x_{i}}}{\sum_{j} e^{x_{j}}},$$

where the input is defined as $x$.

Multiclass logistic regression is equivalent to a three-layer MLP with a Softmax function in the output layer. In broad terms, an MLP with a large number of hidden layers is formulated as follows:

$$O(x) = \mathrm{softmax}\left(W_{L+1}\, f\left(W_{L} \cdots f\left(W_{1} x + b_{1}\right) \cdots + b_{L}\right) + b_{L+1}\right),$$

where $L$ is the number of hidden layers.
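For clarity, a small self-contained example of the Softmax computation (ours, not taken from the paper's code) is shown below; the five logits stand in for the five output-layer scores, one per NSL-KDD class.

import numpy as np

def softmax(x):
    # Numerically stable softmax: subtract the maximum before exponentiating.
    z = np.asarray(x, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5])       # one score per class
probs = softmax(logits)
print(probs, probs.argmax())                         # probabilities sum to 1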

To enhance deep learning efficiency, our method is distinguished by its choice of loss function and ReLU activation, which are discussed in detail below.

4.1. Preprocessing

Because the training and testing datasets contain both numerical and nominal values, they are normalized. Every feature should be scaled to the same range during normalization. Our method takes all of the dataset's features into account; as a result, each feature is treated as essential.
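A minimal preprocessing sketch along these lines, assuming pandas/scikit-learn and the standard nominal NSL-KDD column names ('protocol_type', 'service', 'flag' are assumptions here, not names from the paper), is given below; it one-hot encodes the nominal columns and MinMax-scales every feature to [0, 1], as described in Section 5.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess(df, nominal_cols=("protocol_type", "service", "flag")):
    # One-hot encode nominal columns, then scale every feature to [0, 1].
    df = pd.get_dummies(df, columns=list(nominal_cols))
    scaler = MinMaxScaler()
    scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
    return scaled, scaler

# Usage (assuming a loaded NSL-KDD feature frame without the label column):
# X_scaled, scaler = preprocess(features_df)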

4.2. Loss Functions

In order to get the best performance out of an MLP model, it is critical to choose optimal parameters. The first of these is the loss function, which calculates the difference between the desired value $t$ and the predicted value $p$. Using $p_{d}$ as the predicted probability distribution, multiclass classification uses the negative log probability of the target class $t$:

$$L(t, p_{d}) = -\log p_{d}(t).$$

To speed up the learning process, researchers have found that the rectified linear unit (ReLU) is highly effective. ReLU significantly reduces the vanishing and exploding gradient problems that have marked the history of neural networks. Compared with standard nonlinear activation functions such as the sigmoid and hyperbolic tangent [47], it has proven to be the most efficient way to train on large datasets in terms of time and cost. Neurons using this nonlinearity are referred to as rectified linear units [34]. ReLU is expressed as follows:

$$\mathrm{ReLU}(x) = \max(0, x),$$

where the input is defined as $x$.
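The short sketch below (our own illustration) combines the two pieces above: ReLU as an elementwise maximum, and the negative log probability of the target class as the multiclass loss; the probability vector and class index are made up for the example.

import numpy as np

def relu(x):
    # ReLU(x) = max(0, x), applied elementwise.
    return np.maximum(0.0, x)

def negative_log_likelihood(probs, target_class):
    # Multiclass loss: -log of the predicted probability of the target class.
    return -np.log(probs[target_class] + 1e-12)      # epsilon avoids log(0)

print(relu(np.array([-2.0, 0.0, 3.5])))              # -> [0.  0.  3.5]
print(negative_log_likelihood(np.array([0.7, 0.1, 0.2]), target_class=0))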

4.3. Autoencoder Training

The autoencoder is trained only on normal data packets (Figure 5). This method has various advantages. The class imbalance of the NSL-KDD dataset can be overcome by training the AE exclusively on typical traffic. As a secondary benefit, it enables the model to distinguish between legitimate and malicious data transmission. Real-time applications such as fog devices can therefore be better served, because we can immediately decide whether a data transmission is normal or under attack. Figure 5 shows that only normal data are used to train the autoencoder.

To develop the autoencoder, the dataset $D$ was separated into normal and attack subsets based on the label or class of each data packet sample:

$$D = D_{\mathrm{normal}} \cup D_{\mathrm{attack}},$$

where $D_{\mathrm{normal}}$ is the “normal” dataset and $D_{\mathrm{attack}}$ is the “attack” dataset. The AE is trained on $D_{\mathrm{normal}}$. The number of outputs generated by the AE is the same as the number of inputs; however, there is a reconstruction loss for each sample $x_{i}$. Attack data have a substantially larger reconstruction loss because the AE is trained only on “normal” data. Experimentation led us to a threshold on the reconstruction loss: a data point whose reconstruction loss is greater than the threshold is labelled an “attack”; otherwise, the data point is considered “normal.”
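A minimal sketch of this thresholding step (an illustration under our own assumptions; the trained autoencoder model and the 95th-percentile threshold are placeholders, not values from the paper) could look like the following:

import numpy as np

def reconstruction_errors(autoencoder, X):
    # Per-sample mean squared reconstruction error for a trained autoencoder.
    X_hat = autoencoder.predict(X)
    return np.mean((X - X_hat) ** 2, axis=1)

def detect(autoencoder, X, threshold):
    # Label each sample: 1 = "attack" (error above threshold), 0 = "normal".
    errors = reconstruction_errors(autoencoder, X)
    return (errors > threshold).astype(int)

# Usage, assuming `ae` was trained on normal traffic only:
# threshold = np.percentile(reconstruction_errors(ae, X_normal_val), 95)
# labels = detect(ae, X_test, threshold)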

5. Results and Discussion

Experiments were carried out on the NSL-KDD intrusion data, a condensed form of the KDDCup 99 data. Redundant connection records in the KDDCup 99 test data can be removed by applying filters. The outcomes were obtained after implementing the multilayer technique. The studies were carried out on a personal computer with an Intel Core i7-1065G7 processor running at 1.30 GHz and 16 GB of RAM, using the imbalanced-learn, scikit-learn [48], and Keras [49] Python libraries to test the proposed concept. The NSL-KDD dataset consists of 41 distinct features, divided into nominal, binary, and numeric subclasses. Nominal data cannot be used directly by an autoencoder.

All input data must be numeric. We used the deep multilayer classification approach to preprocess the nominal or categorical information, and the remaining features are preprocessed using the MinMaxScaler function. As a result of this operation, the number of features roughly doubled from the original 41. These features are then fed to the autoencoder, whose parameters were kept to a minimum. For the first detection step, we use the autoencoder. A dropout layer was added to the autoencoder's input to prevent overfitting; this layer acts as a regularization constraint and prevents the autoencoder from simply copying its input to the output. The dropout layer randomly drops a fraction of the input units during training. The autoencoder has a single hidden layer, and we found that the number of neurons in this layer has a significant impact: more neurons reduce the reconstruction error, which in turn lowers the precision, and the model's accuracy is likewise affected by the neuron count. According to our findings, 4 to 10 neurons in the hidden layer produce the best results. An “attack” is defined using a threshold value: instances whose reconstruction error exceeds the threshold are treated as attacks, and the rest as normal instances. We used the model loss on the training data rather than the validation data to arrive at this threshold. Figure 6 shows that the reconstruction error and the neuron count are correlated. Figure 7 shows the loss versus epoch during training and testing with the AE. Figure 8 presents the overall accuracy of the system using the AE. Figure 9 shows the loss versus epoch during training and testing with the deep MLP. Figure 10 shows the overall accuracy of the system using the deep multilayer network.

Inputs: X - input dataset, subsampling size
Output: Reconstruction loss for anomaly test data

from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Step 1: initialize the data container
data = {}

# Step 2: initialize a MinMax scaler
scaler = MinMaxScaler()

# Step 3: instantiate the autoencoder (user-defined AE model) and compile it
model = Autoencoder()
# early stopping on the validation loss
early_stopping = EarlyStopping(monitor='val_loss', patience=2, mode='min')
# compile the model with mean absolute error as the reconstruction loss
model.compile(optimizer='adam', loss='mae')

# Step 4: initialize the deep multilayer (MLP) classifier
mlp = Sequential()
# input layer and first hidden layer with 50 neurons
mlp.add(Dense(units=50, input_dim=X_train.shape[1], activation='relu'))
# output layer with softmax activation (5 classes)
mlp.add(Dense(units=5, activation='softmax'))
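The Autoencoder class instantiated in Step 3 is not defined in the listing above; a minimal Keras sketch consistent with the description in this section (dropout on the input, a single small hidden layer, and an output the size of the input) might look like the following. The feature count and layer sizes are our own placeholders (the text reports good results with 4 to 10 hidden neurons), not values taken from the authors' implementation.

from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.models import Model

def Autoencoder(n_features=82, hidden_units=8, dropout_rate=0.1):
    # Single-hidden-layer autoencoder with an input dropout layer.
    inputs = Input(shape=(n_features,))
    x = Dropout(dropout_rate)(inputs)               # regularization on the input
    bottleneck = Dense(hidden_units, activation='relu')(x)
    outputs = Dense(n_features, activation='sigmoid')(bottleneck)
    return Model(inputs, outputs)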
5.1. Comparison with Recent State-of-the-Art Techniques

An extensive amount of study has been done on intrusion detection due to its importance in today's cyber environment, and intrusions have been detected with machine learning in several ways. On NSL-KDD, our method scores among the top in accuracy when compared with standard machine learning and deep learning techniques for identifying intrusions. Table 1 reveals that autoencoder-based approaches outperformed the competition. The NSL-KDDTrain+ and NSL-KDDTest+ datasets were used to test the procedures in Table 1.

6. Conclusions

A deep multilayer classification, autoencoder-driven intelligent intrusion detection system was proposed in this article. The NSL-KDD dataset was used as the benchmark for the proposed IDS. The AE architecture, which comprises a single hidden layer with 50 units (AE50), was fed the most important properties discovered through data-driven deep learning. The suggested AE50 classifier was compared with deep and classical methods from the recent state of the art (Tables 1 and 2). According to the comparative results, the deep multilayer classifier outperformed all other approaches, with an accuracy of 96.70%.

In the future, a more accurate deep architecture will be built to detect malicious attacks as they occur on data similar to the NSL-KDD instances. For real-time analysis of big data, we want to examine how the methodologies from [15, 16] can be combined with the work presented here, enabling long-term learning, faster decision criteria, and lower computational complexity [50].

Data Availability

The datasets used to support the findings of this study are available from the authors upon reasonable request.

Ethical Approval

This article does not contain any studies with human participants. No animal studies were involved in this review.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

All authors contributed equally to this work. In addition, all authors have read and approved the final manuscript and gave their consent to publish the article.