Research Article

Online Cyber-Attack Detection in the Industrial Control System: A Deep Reinforcement Learning Approach

Algorithm 1

Improved deep Q network algorithm.
 Require: Initialize the experience pool, the current value network, the target value network, and the Q network. Inputs: the training data, the labels, the parameter replacement interval n, the epoch, and the size.
(1)For each epoch:
(2)Select the initial environment.
(3)For each step within the epoch:
(4)Feed the current environment into the Q network to obtain the probability of each action. Select the action value corresponding to the maximum action probability;
(5)Use the greedy strategy to choose an action;
(6)Execute the action. The ICSDQN enters the next environment, and a reward is returned;
(7)Set ;
(8)Store the resulting transition in the experience pool;
(9)Randomly sample a batch of samples from the experience pool as the training set;
(10)Calculate the loss function and use the gradient descent algorithm to update the network parameters of the current value network;
(11)Assign the parameters of the current value network to the target value network every n training steps.
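
The steps of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the listing does not state: the Q network is replaced by a linear function, the action is the predicted class label, the reward is +1 for a correct label and -1 otherwise, the next environment is simply the next training sample, and exploration uses an epsilon-greedy rule. The function name `train_dqn` and all hyperparameter names and values are hypothetical.

```python
import random
from collections import deque

import numpy as np


def train_dqn(data, labels, n_actions=2, epochs=5, batch_size=16,
              sync_every=10, gamma=0.1, epsilon=0.1, lr=0.01, seed=0):
    """Toy DQN loop with a linear Q-function; hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    random.seed(seed)
    n_features = data.shape[1]
    # Current value network and target value network (linear: Q(s) = W @ s + b).
    W = rng.normal(scale=0.01, size=(n_actions, n_features))
    b = np.zeros(n_actions)
    W_target, b_target = W.copy(), b.copy()
    pool = deque(maxlen=1000)  # experience pool
    updates = 0

    for _ in range(epochs):                      # step (1)
        state = data[0]                          # step (2)
        for t in range(len(data) - 1):           # step (3)
            q_values = W @ state + b             # step (4)
            # Step (5): assumed epsilon-greedy choice around the best action.
            if rng.random() < epsilon:
                action = int(rng.integers(n_actions))
            else:
                action = int(np.argmax(q_values))
            # Step (6): assumed reward, +1 for a correct label, -1 otherwise.
            reward = 1.0 if action == labels[t] else -1.0
            next_state = data[t + 1]
            pool.append((state, action, reward, next_state))  # step (8)
            state = next_state  # move on to the next environment

            # Step (9): random minibatch from the experience pool.
            batch = random.sample(list(pool), min(batch_size, len(pool)))
            # Step (10): gradient step on the squared temporal-difference error.
            for s, a, r, s_next in batch:
                target = r + gamma * np.max(W_target @ s_next + b_target)
                td_error = (W[a] @ s + b[a]) - target
                W[a] -= lr * td_error * s
                b[a] -= lr * td_error
            updates += 1
            # Step (11): copy current parameters to the target network.
            if updates % sync_every == 0:
                W_target, b_target = W.copy(), b.copy()
    return W, b
```

Calling `train_dqn(data, labels)` with a 2-D feature array and integer labels returns the learned weights; a prediction for a sample `x` is then `np.argmax(W @ x + b)`. Keeping a separate target network that is refreshed only every `sync_every` updates is what step (11) describes, and it keeps the regression target in step (10) from shifting on every gradient step.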