Abstract
To address network security in industrial Internet systems for power system cable trench control, we study an intrusion detection method for embedded industrial Internet of Things gateways. The method extracts rules from a DBN-DNN deep neural network to obtain intrusion detection models that are easy to integrate into embedded systems. We first use the DBN to reduce the dimensionality of the data, then train a classification model with the DNN, and finally extract rules from the DNN's neurons to form a rule tree for intrusion detection. The KDD CUP99 dataset is used to verify the feasibility of the method, and the method is tested on an embedded gateway. The results show that the rule-extraction-based detection method maintains detection efficiency and accuracy relative to traditional detection methods while using fewer computing resources, which makes it better suited to integration in embedded gateway systems.
1. Introduction
With the advent of the information age, the application of the industrial Internet has become increasingly popular, and increased security issues have followed [1]. The wide application of network interconnections in production and manufacturing not only brings convenience and efficiency to production, but also many hidden dangers to security. At present, the security of the Industrial Internet mainly involves six major security issues: equipment security, control security, network security, identification resolution security, platform security, and data security [2]. Among them, the security of platforms and data is a new issue that needs to be urgently addressed for industrial Internet security. Once a platform and data are compromised and attacked, it can cause serious damage and loss to the entire production system. Intrusion detection is an effective network attack detection method, which is widely used in the field of network security detection [3]. The traditional intrusion detection technology is divided into misuse-based intrusion detection technology (MIDS) and anomaly-based intrusion detection technology (AIDS). Among them, intrusion detection technologies related to machine learning include an intrusion detection technology based on autoencoder [4], intrusion detection technology based on deep learning [5–7], intrusion detection technology based on reinforcement learning [8], and an intrusion detection technology based on visual analysis [9]. Detection methods based on deep learning can process large-scale network traffic data more effectively, and have a higher detection efficiency and accuracy [10]. 
The performance of DBN-based detection technology in network intrusion detection is widely recognized by researchers, but deep neural networks have a complex training process, poor model interpretability, and high requirements for equipment computing power [11]. Industrial Internet security also differs from traditional Internet security: the industrial Internet has real-time requirements that are closely tied to production, and a network problem that is not solved in time can cause heavy losses [12]. Because of its breadth, the industrial Internet cannot, for both economic and practical reasons, be equipped everywhere with gateways of high computing power, which makes it difficult to apply traditional network detection methods directly to complex industrial Internet systems [13]. There is also research on security detection methods for embedded systems. Some studies optimize the embedded system architecture and try to improve security detection performance by adding complex detection devices, which considerably increases the deployment cost of embedded systems. Other studies optimize energy consumption and memory utilization when implementing security detection methods for embedded systems, aiming for better detection results by saving computational energy and using the limited memory of embedded systems efficiently. Despite the impact of these studies, most of them are still at the theoretical stage, and some issues remain to be overcome in actual deployment on embedded systems. Reference [14] detects malicious network traffic by extracting learned rules from neural networks. This method reduces the computational requirements of the detection process and may be better suited to devices with limited computational resources.
In current research on industrial network security, it is necessary to study security detection methods that are better suited to deployment on embedded systems. Building on existing research, this paper studies a security detection method that is easier to implement in embedded systems. The contributions of this research are as follows:
(i) Traditional security detection technology is difficult to deploy on the embedded gateways of the industrial Internet and detects poorly there because hardware computing resources are limited. This paper proposes a solution that saves computing resources while preserving the detection effect.
(ii) To make the security detection technology easier to deploy on the embedded gateway, this paper proposes a detection method that first extracts rules from the DBN-DNN network and then deploys the rules to the embedded gateway, and designs an experimental scheme to verify it.
(iii) The detection technology based on deep network rule extraction is tested on the KDD CUP99 dataset, which demonstrates the feasibility of the method. The results show that the method retains an excellent detection effect while converting complex neural networks into logical calculations that are easier to implement in embedded systems.
2. Related Work
Machine learning methods have been widely used in research on security detection as machine learning techniques have matured. This section introduces work related to the method studied in this paper in two parts: security detection methods based on machine learning, and security detection methods for resource-limited embedded IoT gateways.
2.1. Security Detection Methods Based on Machine Learning
Machine learning-based security detection methods include artificial neural networks, association rules and fuzzy association rules, Bayesian networks, clustering, decision trees, ensemble learning, evolutionary computing, hidden Markov models, inductive learning, naive Bayes, sequential pattern mining, and support vector machines. Reference [15] proposes a network intrusion detection method based on deep learning. A deep belief network is used to extract the features of the network monitoring data, and a BP neural network is used as the top-level classifier to classify the intrusion types. The verification results show that this method improves significantly on traditional machine learning methods. Reference [16] proposes an ensemble method that improves detection performance by simultaneously constructing numerous independent decision trees on different subsets of the training samples. Combining many decision trees for the final judgment improves accuracy: because every decision tree in the random forest is different, the variance is reduced, which gives the method strong generalization and helps it avoid problems such as dataset imbalance and overfitting. Reference [17] proposes a hybrid detection framework that relies on data mining classification and clustering technology. The random forest classification algorithm is used to automatically build intrusion patterns from the training dataset, and the K-Means clustering algorithm is then used to detect new intrusions. This method detects intrusions in one or more clusters by clustering the detection data. Reference [18] proposes a hybrid network-based high-efficiency model (HNIDS) using the enhanced genetic algorithm and particle swarm optimization (EGA-PSO) and the improved random forest (IRF) method. This method uses the hybrid EGA-PSO method to enhance the minority samples. By adding multiobjective functions to select the best features and to achieve improved fitness results, the decision tree list is merged in each iteration, thereby effectively preventing the overfitting problem.
2.2. Security Detection Methods for Embedded Systems
Research on security detection techniques for industrial Internet embedded systems has also yielded various results in recent years. Reference [19] proposes a multicore-based detection architecture for real-time embedded systems built on a novel monitoring technology. The security of the real-time embedded system is improved by analyzing and observing the inherent properties of the real-time system and detecting malicious activities through statistical analysis of its execution. Reference [20] analyzes in detail the energy consumption of the feature extraction engine and of three machine learning classifiers, decision tree (DT), naive Bayes (NB), and k-nearest neighbors (KNN), implemented for embedded system security detection, with the aim of proposing more energy-efficient detection methods. Reference [21] proposes a detection technology optimized for embedded system memory, which maximizes the coverage of security attributes relative to available memory and can be applied to various embedded devices with different memory capacities. It provides a strong detection rate even when memory is limited. Reference [22] proposes an ensemble security detection method for Internet of Things (IoT) devices, which combines multiple classifiers to find an accurate classifier. It proposes an ensemble classification model with automatic model selection: the model uses a large number of classifiers with different configurations, which are evaluated and selected by an ensemble metric. Reference [23] proposes a machine learning-based intrusion detection system for detecting IoT network attacks. This layered intrusion detection system targets the backbone network of the IoT and uses a two-layer dimension reduction engine and a two-layer classification engine. After dimensionality reduction by the component analysis and linear discriminant analysis units, a naive Bayes classifier and the k-nearest neighbor algorithm are used to classify the attack records. Experiments show that the proposed method has excellent detection performance on hard-to-capture attacks. While these methods improve detection to a certain extent by making it more suitable for embedded systems, they still have shortcomings in detection accuracy and efficiency.
3. Method Principle
Embedded networks account for a large proportion of current industrial Internet applications. As an essential information forwarding unit in embedded networks, the embedded gateway is the main equipment for ensuring embedded network security. However, for cost reasons, most embedded gateways are not equipped with high computing power, which makes it difficult for traditional security detection technology to perform well on them. Finding a detection technology that maintains detection efficiency and is easier to integrate into the embedded gateway is therefore of great significance for the development of the industrial Internet. Among traditional detection technologies, intrusion detection based on deep neural networks has better detection performance, but the complex structure of the deep network, the poor interpretability of the detection process, and the high computing requirements make it difficult to adapt to embedded gateways. To improve the real-time performance of the detection technology and to make it easier to port to embedded gateways, a detection technology based on DBN-DNN network rule extraction is proposed. First, a high-precision detection model is trained with the DBN-DNN network. Then, following the idea of decision trees, the detection rules are extracted from the DBN-DNN network to obtain a rule model that is easier to integrate into the embedded gateway. The specific process of the method is shown in Figure 1.

4. Detection Model Based on DBN-DNN
4.1. DBN-DNN Neural Network
First, we build a deep neural network based on the DBN-DNN structure to train the intrusion detection model. A DBN is a generative neural network composed of multiple stacked restricted Boltzmann machines (RBMs), each trained by the contrastive divergence (CD) algorithm [24]. The RBM has excellent learning ability for feature extraction and dimensionality reduction. The RBM training process is shown in Figure 2.

In the figure, h is the value vector of the neurons in the hidden layer, v is the value vector of the neurons in the visible layer, b is the bias vector of the hidden layer, a is the bias vector of the visible layer, and W is the weight matrix connecting the visible and hidden layers. The probability distributions of v and h are shown in formulas (1) and (2), respectively.
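For reference, the distributions referred to as formulas (1) and (2) are commonly written as below in the standard RBM formulation. This is a sketch under that assumption, not a transcription of the authors' equations. The symbols are the conventional ones and are assumed here: v and h for the visible and hidden vectors, a and b for their bias vectors, W for the weight matrix, E for the energy function, and Z for the normalization coefficient.

```latex
\begin{align}
E(\mathbf{v},\mathbf{h}) &= -\mathbf{a}^{\top}\mathbf{v} - \mathbf{b}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h}, \\
P(\mathbf{v}) &= \frac{1}{Z}\sum_{\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})}, \qquad
P(\mathbf{h}) = \frac{1}{Z}\sum_{\mathbf{v}} e^{-E(\mathbf{v},\mathbf{h})}, \\
Z &= \sum_{\mathbf{v},\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})}.
\end{align}
```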
In the formulas, Z is the normalization coefficient. A DNN layer is added after the last RBM layer. The DNN layer takes the dimensionality-reduced features output by the RBM as its input vector and uses the backpropagation (BP) algorithm for fine-tuning and for supervised training of the entity classifier. This network structure is called the DBN-DNN network structure [25] and is shown in Figure 3.

4.2. Feature Dimensionality Reduction Based on DBN
Consider a training dataset D = ((X1, Y1), (X2, Y2), …, (Xn, Yn)), where n is the number of samples in the dataset and each sample has h input features, indexed from the first to the h-th. We build an m-layer RBM network to reduce the dimensionality of the input sample features. Let the number of features after RBM dimensionality reduction be h0. We use equation (3) to initialize the number of features of the RBM output layer and then update the number of output features of the (m − 1)-th RBM layer according to equation (4).
Equation (3) involves the feature dimensionality reduction ratio and equation (4) a rounding function; the number of output features of each remaining RBM layer is updated in the same way. To obtain a better feature dimensionality reduction effect, the particle swarm algorithm is used to optimize the RBM layer structure, with a fitness evaluation function built on the average reconstruction error of the dimensionality reduction model to optimize the number of RBM layers.
4.3. DNN-Based Entity Classifier
After optimization, the number of output features of the m-th RBM layer is h0, and the output of the m-layer RBM is used as input to build a DNN entity classifier. The input layer of the classifier has h0 neurons, and the output layer corresponds to the detection types. The classifier uses the ReLU function as the activation function and the backpropagation algorithm to adjust the weights. The activation function is shown in equation (5):
In the formula, when the input is less than 0, the output is 0, and when the input is greater than 0, the output is the same as the input. The loss function is shown in equation (6), where the two vectors being compared have the same feature dimension as the output and the loss is based on the L2 norm. After selecting the loss function, the gradient descent method is used to iteratively obtain the weights of each layer. The structure of the DNN entity classifier is shown in Figure 4.
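The activation and loss described above can be sketched in a few lines of Python. The ReLU definition follows the text directly. The loss is an assumption: it is written as half the squared L2 norm of the difference between the network output and the target, a common choice that matches the description but may differ from the authors' exact equation (6). The function names are illustrative.

```python
from typing import Sequence


def relu(x: float) -> float:
    """ReLU activation: 0 for non-positive input, the input itself otherwise."""
    return x if x > 0 else 0.0


def half_squared_l2_loss(output: Sequence[float], target: Sequence[float]) -> float:
    """Half of the squared L2 norm of (output - target); an assumed form of the loss."""
    if len(output) != len(target):
        raise ValueError("output and target must have the same dimension")
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))


print(relu(-2.0), relu(3.5))
print(half_squared_l2_loss([1.0, 0.0], [0.0, 0.0]))
```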

5. DBN-DNN Network Rule Extraction
Extracting rules from the DBN-DNN is divided into three steps. First, we obtain the neuron outputs of each hidden layer. Second, we generate an input rule tree between the input layer and each hidden layer, and an output rule tree between the hidden layers and the output layer. Finally, since the output of the input rule trees is the input of the output rule tree, the two are chained to construct a complete rule tree detection model.
5.1. Decision Tree-Based Rule Extraction
The rule extraction method is based on the output of each hidden layer neuron. Suppose the DNN takes the output features of the RBM layer as its input and produces the detection result as its output, and that the DNN has j hidden layers, each containing k neurons, so that a neuron is identified as the k-th neuron in the j-th hidden layer. Each sample of the dataset has a corresponding detection type. Assuming that a sample has n feature values, the output of the k-th neuron in the j-th hidden layer is given by equation (7), which combines the connection weight and the input of each m-th neuron of the preceding layer with a threshold. We then calculate the output mean of the neurons in the j-th hidden layer as shown in equation (8), which averages over the number of neurons in that layer. This yields, for every sample, the average neuron output of each hidden layer, and these average values can be used to establish decision rules.
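The two quantities described above, a neuron's output and the mean output of a hidden layer, can be illustrated with a minimal sketch. It assumes that a neuron applies ReLU to the weighted sum of its inputs minus the threshold; the sign convention for the threshold, the function names, and the toy numbers are assumptions made for illustration.

```python
from typing import Sequence


def neuron_output(inputs: Sequence[float], weights: Sequence[float], threshold: float) -> float:
    """ReLU of the weighted input sum minus the threshold (assumed form)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return max(0.0, weighted_sum - threshold)


def hidden_layer_mean(
    inputs: Sequence[float],
    weight_rows: Sequence[Sequence[float]],
    thresholds: Sequence[float],
) -> float:
    """Average output over all neurons of one hidden layer for one sample."""
    outputs = [neuron_output(inputs, w, t) for w, t in zip(weight_rows, thresholds)]
    return sum(outputs) / len(outputs)


# Toy layer with two neurons and a two-feature sample (hypothetical values).
sample = [1.0, 2.0]
weights = [[0.5, 0.5], [1.0, -1.0]]
thresholds = [0.5, 0.0]
layer_mean = hidden_layer_mean(sample, weights, thresholds)
print(layer_mean)
```

The per-sample layer mean is the value that the input rule tree later learns to predict and that the output rule tree takes as its input.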
5.2. Input Rule Tree Model
The function of the input rule tree is to extract and describe the rules between the input features and the hidden layers of the neural network. The input rule trees take the output features of the m-th RBM layer as input, and their number is equal to the number of hidden layers of the trained DNN. Let the input be an m-dimensional feature vector whose corresponding output mean in the j-th hidden layer is the regression target. Taking the k-th variable of the feature vector as the segmentation variable, together with a segmentation point, two regions are defined as shown in equations (9) and (10):
Then, the optimal segmentation variable and segmentation point are obtained, and the optimal value is calculated as shown in equation (11):
In the formula, each of the two regions has an output value after division, namely the value with the smallest squared error in that region. The calculation process is shown in equations (12) and (13):
In the formulas, the two sample counts are the numbers of samples assigned to the first and second regions, respectively. After finding the optimal segmentation point, the input space is divided into two regions, and the same division is then repeated for each region. The process is repeated until the stopping condition is met, and finally a least squares regression tree is generated.
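The split search described above can be sketched as a brute-force search over every feature and every candidate split point. The sketch assumes the standard least squares criterion, in which each region predicts the mean of its targets and the best split minimizes the summed squared error of the two regions. The function names and the toy data are illustrative only.

```python
from typing import List, Optional, Sequence, Tuple


def squared_error(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean, the best constant prediction."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(
    X: Sequence[Sequence[float]], y: Sequence[float]
) -> Optional[Tuple[int, float, float]]:
    """Return (feature index, split point, total squared error) of the best split."""
    best: Optional[Tuple[int, float, float]] = None
    for k in range(len(X[0])):
        for s in sorted({row[k] for row in X}):
            left: List[float] = [t for row, t in zip(X, y) if row[k] <= s]
            right: List[float] = [t for row, t in zip(X, y) if row[k] > s]
            if not left or not right:
                continue  # a split must leave samples on both sides
            err = squared_error(left) + squared_error(right)
            if best is None or err < best[2]:
                best = (k, s, err)
    return best


# Toy data: the target (a hidden layer mean) depends only on feature 0.
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = [0.0, 0.0, 1.0, 1.0]
print(best_split(X, y))
```

A full regression tree applies the same search recursively to each of the two resulting regions until a stopping condition, such as a maximum depth, is reached.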
5.3. Output Rule Tree Model
The function of the output rule tree is to extract and describe the rules between the hidden layers of the neural network and the output of the neural network. After obtaining the output mean of the hidden layer neurons, a decision tree is used to establish the rules between the hidden layer means and the output detection type. For each sample, the output means of the hidden layers form a j-dimensional vector, where j is the configured number of hidden layers. First, the empirical entropy of the output detection types over the samples is calculated as shown in equation (14):
In the formula, n represents the total number of types of detection results, each term uses the proportion of the i-th type of detection result, and log is the logarithm to base 2 or e. We build the rule tree by binary splitting: with N total samples, the output means of the j-th hidden layer of all samples are divided into two subsets by a boundary value, and the entropy of the resulting partition is calculated as shown in equation (15):
Then, the information gain of each candidate boundary value with respect to the detection results of the dataset is calculated, and the boundary value with the largest information gain is selected as the optimal split point. Repeating this selection establishes the output rule tree model.
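The entropy-based choice of a boundary value can be sketched as follows. The sketch assumes base-2 logarithms and treats every distinct hidden layer mean except the largest as a candidate boundary; the function names and toy data are illustrative and not taken from the paper.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Empirical entropy (base 2) of a list of detection types."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(
    values: Sequence[float], labels: Sequence[str]
) -> Tuple[Optional[float], float]:
    """Return the boundary value with the largest information gain and that gain."""
    base = entropy(labels)
    n = len(labels)
    best_t: Optional[float] = None
    best_gain = 0.0
    for t in sorted(set(values))[:-1]:  # the largest value would leave one side empty
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        conditional = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = base - conditional
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Toy hidden layer means with the detection type of each sample.
means = [0.1, 0.2, 0.7, 0.9]
types = ["normal", "normal", "dos", "dos"]
threshold, gain = best_threshold(means, types)
print(threshold, gain)
```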
5.4. Whole Rule Tree Model
After the input rule trees and the output rule tree are obtained through training, they are combined into a complete rule tree model. Each input rule tree describes the learning process between the input features and one hidden layer of the neural network, so the number of input rule trees depends on the number of hidden layers of the trained neural network. The output rule tree describes the rules between the hidden layers and the output of the neural network. To build the model, the outputs of all the input rule trees are used as the input of the output rule tree. For fresh data, the input rule trees first produce the description between the input features and all hidden layers, and the output rule tree then produces the final detection result.
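Once extracted, the chained rule trees reduce detection to a series of threshold comparisons, which is what makes them attractive for an embedded gateway. The sketch below shows one possible representation: each tree is a nested tuple of (feature index, threshold, left subtree, right subtree) with plain values as leaves. The representation, the thresholds, and the labels are hypothetical and are not the rules obtained in the paper.

```python
from typing import Sequence, Union

# A tree is either a leaf value or a tuple
# (feature_index, threshold, left_subtree, right_subtree).
Tree = Union[float, str, tuple]


def evaluate(tree: Tree, features: Sequence[float]):
    """Walk a threshold tree: go left when features[index] <= threshold."""
    while isinstance(tree, tuple):
        index, threshold, left, right = tree
        tree = left if features[index] <= threshold else right
    return tree


def detect(input_trees: Sequence[Tree], output_tree: Tree, sample: Sequence[float]):
    """Chain the trees: input trees predict hidden layer means, the output tree classifies them."""
    hidden_means = [evaluate(tree, sample) for tree in input_trees]
    return evaluate(output_tree, hidden_means)


# Hypothetical rules for a network with one hidden layer.
input_tree = (1, 0.5, 0.1, (3, 0.2, 0.6, 0.9))
output_tree = (0, 0.3, "normal", (0, 0.7, "probe", "dos"))

print(detect([input_tree], output_tree, [0.0, 0.8, 0.0, 0.1]))
```

Because evaluation needs only comparisons and no matrix arithmetic, the same structure could be ported to a language suited to the gateway firmware.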
6. Experimental Design
To verify the feasibility of the proposed method, experiments are designed on the KDD CUP99 dataset to analyze the detection effect of the rule tree detection model. The original data of the KDD CUP99 dataset comes from the DARPA Intrusion Detection Evaluation Project in 1998. The dataset contains 5 million training records and 2 million test records.
6.1. Experimental Environment and Implementation
The experimental environment is a 64-bit Windows 10 system with a 3.6 GHz CPU, 16 GB of memory, and a GTX 1050 Ti graphics card with 4 GB of graphics memory. The software environment includes the open-source machine learning platform TensorFlow and the free machine learning library scikit-learn. The experiment uses Python as the programming language. In the experiment, the data is first preprocessed, and a deep neural network model is then built in TensorFlow to obtain the training process data. Next, rule trees are built with scikit-learn to extract, from the process data of the deep neural network, rule descriptions of the learning process from the input layer to the hidden layer and from the hidden layer to the output layer. Finally, a complete detection rule tree is established.
6.2. Datasets and Data Preprocessing
Anomaly types in the KDD CUP99 dataset are subdivided into 4 categories with a total of 39 attack types [26], of which 22 attack types appear in the training set and another 17 unknown attack types appear in the test set. In the experiment, the character features are converted into numerical features, and the feature values are then standardized. First, the mean and the mean absolute error of each attribute are obtained. For the k-th attribute of the i-th sample, the mean value of the k-th attribute is calculated as shown in equation (16), and its mean absolute error is calculated as shown in equation (17):
In the formulas, N is the total number of samples. The standardized value of each attribute is then calculated as shown in equation (18).
During the calculation, if either the mean or the mean absolute error of an attribute is 0, the standardized value is also set to 0. After obtaining the standardized value r, the data is normalized as shown in equation (19), where the minimum and maximum are taken over the standardized values. 10% of the data in the KDD CUP99 dataset was selected for the experiment, with a total of 494,021 sample records. Table 1 shows the mean absolute error (MAE), standard deviation (SD), skewness (SKEW), and kurtosis (KURT) of the first 8 sample features in the dataset after the character features are digitized.
It can be seen from Table 1 that the distribution interval of the sample features is large, so the experimental data is standardized. Table 2 shows the parameters of the first 8 sample features in the dataset after standardization.
It can be seen from Table 2 that after the sample features are standardized, the distribution interval is significantly reduced, and the standardized data is used for deep model training and rule extraction.
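The standardization and normalization steps of this section can be sketched for a single attribute column. The sketch assumes that the standardized value is the deviation from the attribute mean divided by the mean absolute error, followed by min-max scaling to the interval [0, 1]. The guard for a column whose standardized values are all equal is an addition for safety and is not described in the paper; the function names are illustrative.

```python
from typing import List, Sequence


def standardize_column(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the mean absolute error of the column."""
    n = len(values)
    mean = sum(values) / n
    mean_abs_error = sum(abs(v - mean) for v in values) / n
    if mean == 0 or mean_abs_error == 0:
        return [0.0] * n  # rule from the text: the standardized value is 0 in this case
    return [(v - mean) / mean_abs_error for v in values]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0] * len(values)  # added guard against division by zero
    return [(v - lowest) / (highest - lowest) for v in values]


column = [1.0, 2.0, 3.0, 6.0]
normalized = min_max_normalize(standardize_column(column))
print(normalized)
```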
6.3. DBN-DNN Network Model
After the data standardization is completed, the standardized samples are used to build and train a DBN-DNN deep network detection model, in which the DBN performs feature extraction.
6.3.1. DBN Structure Design
Different RBM layer designs influence how well sample features are extracted during DBN model training. The particle swarm algorithm is used to optimize the number of DBN layers and the number of neurons in each layer to obtain the optimal model parameters. The feature dimensionality reduction ratio is set to 3, and the number of DBN layers is restricted to an integer between 2 and 4 to simplify model training. The number of neurons in each layer is updated as shown in equation (20), where h is the total number of input sample features, m is the number of DBN layers, h0 is the number of output features after dimensionality reduction, and the number of neurons in a layer is tuned by an adjustment parameter whose range starts at 0. The calculation of h0 is shown in equation (21):
The fitness function is built from the reconstruction error of each DBN layer, and for each setting of m the model parameters with the smallest mean reconstruction error are obtained through the particle swarm updates. After many experiments, the DBN network performs better when the number of RBM layers is set to 3 and the neuron adjustment parameter is set to 7. In this configuration, the RBM input layer has 34 neurons and the output layer has 12 neurons. Figure 5 shows the RBM reconstruction error from the input layer to the hidden layer and from the hidden layer to the output layer. During training, the parameters are updated every 300 samples, and the number of iterations is 100.

In the figure, h1 is the reconstruction error from the RBM input layer to the hidden layer, and h2 is the reconstruction error from the RBM hidden layer to the output layer. The figure shows that the reconstruction error falls to about 0.0001 by the 40th iteration, at which point the model extracts features well.
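The reported layer sizes are consistent with a simple reading of the dimensionality reduction ratio. As a worked check, and assuming that the rounding function rounds up (an assumption, since the exact form of the equation is not reproduced here), dividing the 34 input neurons by the ratio of 3 gives the 12 output neurons reported above. The function name is illustrative.

```python
import math


def reduced_feature_count(n_features: int, ratio: float) -> int:
    """Assumed form: divide the feature count by the ratio and round up."""
    return math.ceil(n_features / ratio)


# 34 / 3 is about 11.33, which rounds up to 12.
print(reduced_feature_count(34, 3))
```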
6.3.2. DBN-DNN Model Training
To reduce the complexity of the rule extraction process in the experiment, a DNN with one hidden layer is designed for model training. Table 3 shows the total number of records of each attack type in KDD CUP99 and the numbers randomly selected from the different attack types as training and testing samples.
As shown in Table 3, there are 18,000 training samples and 10,000 test samples. The number of iterations of the DNN network during training is set to 20. Figure 6 shows the results after 20 training sessions.

It can be seen from the figure that the model error is close to 0.001 after 20 training iterations, and the trained model is saved for rule extraction.
6.4. Model Rule Extraction
After training the DBN-DNN model, a decision tree is built to extract the model rules. First, we calculate the output mean value of each hidden layer of the neural network, then establish an input rule tree that describes the characteristics of the input samples and the rules between the hidden layers of the neural network, and establish an output rule tree that describes the relationship between the hidden layer and the output of the neural network, and finally combine the input rule tree and the output rule tree to obtain a complete detection rule tree.
6.4.1. Building an Input Rule Tree
The input of the input rule tree is the input feature vector of the neural network, and its output is the predicted value of the hidden layer output mean corresponding to that input, so the input rule tree is a regression decision tree. The number of trees depends on the number of hidden layers of the deep neural network. Each rule tree uses the neural network input as the training sample for feature selection and division. Since the neural network in this experiment contains only one hidden layer to reduce complexity, only one input rule tree needs to be trained. The number of training samples is 18,000, and Table 3 shows the sample types in detail. During training, the mean absolute error is used as the criterion for selecting features and splits. The input rule tree obtained after training is shown in Figure 7. The figure shows that all input samples have been divided and that the model selects the second, third, fourth, and eighth features of the input samples as the optimal segmentation features to build the regression decision tree. The leaf nodes of the tree are the predicted hidden layer output means for the corresponding input samples, and the output of the input rule tree is also used as the input value of the output rule tree.

6.4.2. Building an Output Rule Tree
The output rule tree is used to extract the rules between the hidden layers and the output layer of the deep neural network. Its input is the average output value of each hidden layer of the neural network for a given sample, so the number of input values is equal to the number of hidden layers. Its output is the detection result of the deep neural network, so the output rule tree is a classification decision tree. Since the experiment uses a neural network with one hidden layer, the output rule tree has only one input value. We take the hidden layer output mean of each sample as the input and the detection type output by the neural network as the label, select information entropy as the splitting criterion, and build a classification decision tree from the samples listed in Table 3. The trained output rule tree is shown in Figure 8.

The leaf nodes of the rule tree in the figure are the five detection results of the classification, which are also the output values of the deep neural network. After the output rule tree is established, the output of the input rule tree is used as the input of the output rule tree to form the detection rule tree, whose detection results are then verified.
6.5. Experiment Analysis
To verify the detection effect of the rule tree, both the deep neural network and the established detection rule tree were used to detect the sample data. Each time, a group of 18,000 samples was randomly selected from the experimental dataset following the composition of the training set in Table 3. A total of 50 groups were selected for the experiment. Figure 9 shows the detection results for one group of random data.

As can be seen from the figure, both the rule tree model and the deep neural network detect the various attack types well. Figure 10 shows the detection accuracy of the two methods.

It can be seen from the figure that the detection effects of the two methods are very similar. To compare the deep neural network and the detection rule tree more clearly, the precision, recall, and F-Measure were calculated for each of the 50 groups of test data, and their mean values are compared in Table 4.
The table further confirms that the detection results of the rule detection model are close to those of the deep neural network. Taken together, Figure 10 and Table 4 show that the detection accuracy of the rule model is close to that of the neural network and depends on it: the detection effect of the rule model varies with the detection effect of the neural network from which it is extracted. Although its detection effect is slightly lower than that of the neural network, the rule model converts complex mathematical operations into logical judgments that are easier to implement in embedded systems and is thus more conducive to real-time detection.
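The three metrics used in this comparison can be computed per detection type as sketched below. The sketch assumes that the F-Measure is the balanced F1 score, the harmonic mean of precision and recall; the function name and the toy labels are illustrative and are not results from the paper.

```python
from typing import Sequence, Tuple


def precision_recall_f(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one detection type treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy labels: true detection types and the types predicted by a model.
y_true = ["dos", "dos", "normal", "probe", "dos"]
y_pred = ["dos", "normal", "normal", "dos", "dos"]
precision, recall, f_measure = precision_recall_f(y_true, y_pred, "dos")
print(precision, recall, f_measure)
```

Averaging these values over the 50 sample groups gives mean figures of the kind compared in Table 4.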
7. Conclusion
To address the network security problems of industrial Internet control systems in substations, this paper proposes a security detection method for embedded industrial IoT gateways based on deep neural network rule extraction. Rules are extracted from the DBN-DNN deep neural network model to build a rule tree security detection model, which converts the complex calculations of the neural network into logical judgments that are easier to implement in an embedded system, saving detection cost and improving detection efficiency while maintaining detection accuracy. Verification on KDD CUP99 shows that the detection effect of the proposed rule-extraction-based method is close to that of the neural network from which the rules are extracted and will improve as the detection effect of the neural network improves. The extracted detection models are easier to understand and implement than deep neural networks and are more conducive to integration in embedded systems. Future work will aim to improve the detection accuracy of the deep neural networks: by comparing different network structures and training methods, a network model with higher accuracy will be obtained, and the rule extraction method will be continuously improved.
Data Availability
The dataset we used in this paper is available at https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. Readers who are interested in our research can access the dataset and reproduce our results.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation for the Key Research and Development Program of Gansu Province, China (Grant no. 20YF3GA016).