Mathematical Problems in Engineering

Research Article

Online Cyber-Attack Detection in the Industrial Control System: A Deep Reinforcement Learning Approach

Improved deep Q network algorithm.

	Require: Initialize the experience pool, the current value network, the target value network, and the Q network. Inputs: the training data, the labels, the parameter replacement interval n, epoch, and size.
(1)	For each epoch:
(2)	Select the initial environment.
(3)	For each step within the epoch:
(4)	Enter the current environment into the Q network to obtain the probability of each action, and select the action value corresponding to the maximum action probability;
(5)	Use the greedy strategy to choose an action;
(6)	Execute the action. The ICSDQN then enters the next environment and receives a reward;
(7)	Set ;
(8)	Store the resulting transition in the experience pool;
(9)	Randomly sample a batch of transitions from the experience pool as the training set;
(10)	Calculate the loss function and use the gradient descent algorithm to update the network parameters of the current value network;
(11)	Assign the parameters of the current value network to the target value network every n training steps.
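
The loop above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear Q function, the epsilon-greedy exploration rate, the 0/1 reward for matching the label, the squared TD-error loss, and every name and default value (train_dqn, sync_every, gamma, epsilon, lr, the pool capacity) are assumptions introduced only for illustration. Here sync_every plays the role of the replacement interval n.

```python
from collections import deque

import numpy as np


def train_dqn(data, labels, n_actions, epochs=5, batch_size=32,
              sync_every=50, gamma=0.9, epsilon=0.1, lr=0.01, seed=0):
    """Linear-Q sketch of the loop above; all hyperparameters are assumed."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    # Current value network and target value network (linear, for brevity).
    current_w = rng.normal(scale=0.01, size=(n_features, n_actions))
    target_w = current_w.copy()
    pool = deque(maxlen=10_000)  # experience pool
    updates = 0

    for _ in range(epochs):
        order = rng.permutation(len(data))
        for t in range(len(order) - 1):
            state, next_state = data[order[t]], data[order[t + 1]]
            # Epsilon-greedy choice over the current network's Q-values.
            if rng.random() < epsilon:
                action = int(rng.integers(n_actions))
            else:
                action = int(np.argmax(state @ current_w))
            # Assumed reward: 1 if the action matches the label, else 0.
            reward = 1.0 if action == labels[order[t]] else 0.0
            pool.append((state, action, reward, next_state))

            if len(pool) < batch_size:
                continue
            # Randomly sample a training batch from the experience pool.
            idx = rng.choice(len(pool), size=batch_size, replace=False)
            batch = [pool[int(i)] for i in idx]
            s = np.stack([b[0] for b in batch])
            a = np.array([b[1] for b in batch])
            r = np.array([b[2] for b in batch])
            s2 = np.stack([b[3] for b in batch])

            # TD target from the target network; prediction from the current one.
            y = r + gamma * (s2 @ target_w).max(axis=1)
            q = (s @ current_w)[np.arange(batch_size), a]
            td_error = q - y
            # Gradient of 0.5 * mean(td_error ** 2) with respect to current_w.
            grad_t = np.zeros((n_actions, n_features))
            np.add.at(grad_t, a, s * td_error[:, None])
            current_w -= lr * grad_t.T / batch_size

            updates += 1
            # Copy the current parameters to the target network every sync_every updates.
            if updates % sync_every == 0:
                target_w = current_w.copy()

    return current_w, target_w, updates
```

Calling train_dqn(data, labels, n_actions=2) with a 2-D float array and integer labels returns the two weight matrices and the number of gradient updates. Keeping a separate target network that is refreshed only periodically keeps the TD target fixed between replacements, which is the purpose of step (11).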