Research Article

Network Security Defense Decision-Making Method Based on Stochastic Game and Deep Reinforcement Learning

Algorithm 1: Cyber adaptive defense countermeasures algorithm.
Input: ; learning rate ; reward discount factor ; exploration probability ; convergence accuracy ; stable duration ;
Output: optimal defense strategy .
Begin
(1) Initialization:
(2) Solve Bayesian Nash equilibrium:
(3) Network defense strategy revenue function:
(4) Get current network status:
(5) repeat:
(6) Select defensive actions through algorithm:
(7) Output
(8) Get a new network status:
(9) Update and learn Q according to the phased results:
(10) Update Bayesian Nash equilibrium:
(11)
(12)
(13) until
(14) Output
(15) End
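
The control flow of Algorithm 1 can be illustrated with a minimal sketch. The formulas in the listing are not recoverable here, so the code below is not the paper's method: it assumes a tabular Q function, an epsilon-greedy rule for step (6), the standard one-step Q-learning update for step (9), and a stopping rule in which the update magnitude must stay below the convergence accuracy for the stable duration. The Bayesian Nash equilibrium steps (2) and (10) are omitted. All names (`adaptive_defense`, `toy_env`, `tol`, `stable_steps`) and the two-state environment are hypothetical.

```python
import random


def adaptive_defense(env_step, states, actions, s0, alpha=0.1, gamma=0.9,
                     epsilon=0.1, tol=1e-4, stable_steps=50,
                     max_steps=100_000, seed=0):
    """Tabular Q-learning loop shaped like Algorithm 1 (assumed details).

    alpha: learning rate, gamma: reward discount factor,
    epsilon: exploration probability, tol: convergence accuracy,
    stable_steps: stable duration (consecutive small updates required).
    """
    rng = random.Random(seed)
    # Step (1): initialize Q to zero for every (state, action) pair.
    Q = {(s, a): 0.0 for s in states for a in actions}
    s = s0          # Step (4): current network status.
    stable = 0
    for _ in range(max_steps):                      # Step (5): repeat
        # Step (6): epsilon-greedy action selection (an assumption).
        if rng.random() < epsilon:
            a = rng.choice(actions)
        else:
            a = max(actions, key=lambda x: Q[(s, x)])
        # Step (8): observe the new network status and the reward.
        s_next, r = env_step(s, a)
        # Step (9): one-step Q-learning update.
        target = r + gamma * max(Q[(s_next, x)] for x in actions)
        delta = alpha * (target - Q[(s, a)])
        Q[(s, a)] += delta
        # Step (13): stop once updates stay below tol for stable_steps steps.
        stable = stable + 1 if abs(delta) < tol else 0
        s = s_next
        if stable >= stable_steps:
            break
    # Step (14): greedy defense strategy derived from the learned Q.
    policy = {st: max(actions, key=lambda x: Q[(st, x)]) for st in states}
    return policy, Q


def toy_env(state, action):
    """Hypothetical deterministic environment, for illustration only."""
    if action == "patch":
        return "safe", 1.0      # defending keeps the network safe
    return "attacked", 0.0      # idling lets the attacker in


if __name__ == "__main__":
    policy, Q = adaptive_defense(toy_env, ["safe", "attacked"],
                                 ["patch", "idle"], "safe")
    print(policy)
```

In this toy setting the defender is rewarded only for patching, so the learned strategy should prefer `patch` in both states. A faithful implementation would replace the greedy target in step (9) with the equilibrium value computed in steps (2) and (10).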