Online Cyber-Attack Detection in the Industrial Control System: A Deep Reinforcement Learning Approach
Algorithm 1
Improved deep Q network algorithm.
Require: Initialize the experience pool, the current value network, the target value network, and the Q network. Inputs: the training data and labels, the parameter-replacement interval n, the number of training epochs (epoch), and the sample size (size).
(1) For each training epoch:
(2) Select the initial environment.
(3) For each step within the epoch:
(4) Feed the current environment into the Q network to obtain the probability of each action, and select the action value corresponding to the maximum action probability;
(5) Use the greedy strategy to choose an action;
(6) Execute the chosen action. ICSDQN then moves to the next environment and receives a reward;
(7) Set ;
(8) Store the resulting experience in the experience pool;
(9) Randomly draw a batch of samples from the experience pool as the training set;
(10) Calculate the loss function and update the parameters of the current value network with the gradient descent algorithm;
(11) Every n training iterations, copy the parameters of the current value network to the target value network.
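To show how steps (1) to (11) fit together, here is a minimal sketch of a DQN training loop in plain Python. It is not the ICSDQN implementation. Several things are assumptions made only for illustration: the linear Q-function standing in for the Q network, the two actions (0 = normal, 1 = attack), the +1/-1 reward derived from the label, ε-greedy exploration as one reading of step (5), and the hyperparameters `gamma`, `lr` and `eps`. The names `q_values`, `predict` and `train_icsdqn_sketch` are hypothetical. The sketch ranks actions by Q-values rather than probabilities. Because softmax is monotonic, the arg-max action is the same either way.

```python
import random
from collections import deque


def q_values(weights, state):
    """Hypothetical linear Q-function standing in for the Q network:
    one weight row per action, with the bias stored as the last entry."""
    return [sum(w * x for w, x in zip(row[:-1], state)) + row[-1] for row in weights]


def predict(weights, state):
    """Greedy action: index of the largest Q-value (0 = normal, 1 = attack)."""
    q = q_values(weights, state)
    return max(range(len(q)), key=lambda a: q[a])


def train_icsdqn_sketch(data, labels, n=50, epochs=5, size=32,
                        gamma=0.1, lr=0.05, eps=0.1, seed=0):
    rng = random.Random(seed)
    num_actions, dim = 2, len(data[0])
    current = [[0.0] * (dim + 1) for _ in range(num_actions)]  # current value network
    target = [row[:] for row in current]                        # target value network
    pool = deque(maxlen=10_000)                                 # experience pool
    updates = 0

    for _ in range(epochs):                                     # step (1)
        state = data[0]                                         # step (2): initial environment
        for t in range(len(data)):                              # step (3)
            greedy = predict(current, state)                    # step (4)
            # Step (5), assumed to be epsilon-greedy: random action with probability eps.
            action = rng.randrange(num_actions) if rng.random() < eps else greedy
            # Step (6): hypothetical reward, +1 for the correct label and -1 otherwise.
            reward = 1.0 if action == labels[t] else -1.0
            next_state = data[(t + 1) % len(data)]
            pool.append((state, action, reward, next_state))   # steps (7)-(8)
            state = next_state                                  # move to the next environment
            if len(pool) < size:
                continue
            batch = rng.sample(list(pool), size)                # step (9)
            # Step (10): gradient of half the mean squared TD error,
            # with targets computed from the frozen target network.
            grads = [[0.0] * (dim + 1) for _ in range(num_actions)]
            for s, a, r, s_next in batch:
                y = r + gamma * max(q_values(target, s_next))
                err = q_values(current, s)[a] - y
                for i, x in enumerate(s):
                    grads[a][i] += err * x / size
                grads[a][-1] += err / size
            for a in range(num_actions):
                for i in range(dim + 1):
                    current[a][i] -= lr * grads[a][i]
            updates += 1
            if updates % n == 0:                                # step (11)
                target = [row[:] for row in current]
    return current
```

For example, on a toy dataset where normal samples are `[1.0, 0.0]` and attack samples are `[0.0, 1.0]`, calling `predict` on the returned weights should map each state to its label. The target network is refreshed only every `n` updates, so the regression targets in step (10) stay fixed between refreshes. This is the role step (11) plays.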