Research Article

Flipit Game Deception Strategy Selection Method Based on Deep Reinforcement Learning

Algorithm 1

Multi-stage Flipit game deception strategy selection algorithm based on proximal policy optimization.
Input: The attacker’s state in the cloud environment
Output: Network update parameters, deception strategy
(1)Set parameters
(2)Initialize the Actor network parameters and the Critic network parameters
(3)for iteration = 1, 2, 3, … do
(4) Initialize the experience pool
(5) for k = 1, 2, … do
(6)  Randomly initialize state
(7)  Initialize state-action trajectory
(8)  for t = 1, 2, …, T do
(9)   Get current state
    //Observe the strategy taken by the current attacker, i.e., the attacker selects the exponential distribution strategy
(10)   Select action from the old Actor network
    //Select the interval at which the defender takes control of the target system
(11)   Calculate the reward of the multi-stage deception strategy
    //Calculate the reward according to formula (5)
(12)   Store state-action trajectory
(13)  end for
(14)  
(15) end for
(16) Update Actor network policy parameter
(17) Update Critic network parameter
(18) Update the old Actor network parameter with the updated Actor network parameter
(19)end for
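
The control flow of Algorithm 1 can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the source: the attacker's state is collapsed to a single state, so the Critic reduces to one scalar baseline; the defender chooses among a fixed list of candidate intervals; `stage_reward` is a hypothetical stand-in for formula (5), which is not reproduced here; and all hyperparameter values are illustrative. The Actor update uses the standard clipped surrogate objective of proximal policy optimization.

```python
import math
import random


def softmax(logits):
    """Turn Actor logits into a probability distribution over intervals."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped objective for a single sample."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


def stage_reward(interval, attack_rate, move_cost, rng):
    """Hypothetical stand-in for formula (5), NOT the paper's reward.

    The defender takes control at the start of the stage; the attacker
    retakes control after an exponentially distributed delay.
    """
    takeover = rng.expovariate(attack_rate)
    return (min(takeover, interval) - move_cost) / interval


def train(intervals, attack_rate=1.0, move_cost=0.2, iterations=50,
          episodes=8, horizon=16, eps=0.2, epochs=4, lr=0.05, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(intervals)  # Actor parameters
    old_logits = list(logits)        # old Actor parameters
    value = 0.0                      # Critic (scalar baseline, an assumption)
    for _ in range(iterations):
        pool = []                    # experience pool for this iteration
        old_probs = softmax(old_logits)
        for _ in range(episodes):
            for _ in range(horizon):
                # Select the defender's interval from the old Actor.
                a = rng.choices(range(len(intervals)), weights=old_probs)[0]
                r = stage_reward(intervals[a], attack_rate, move_cost, rng)
                pool.append((a, r))
        # Update the Actor by gradient ascent on the clipped surrogate.
        for _ in range(epochs):
            probs = softmax(logits)
            grad = [0.0] * len(logits)
            for a, r in pool:
                adv = r - value
                ratio = probs[a] / old_probs[a]
                # The gradient vanishes where the clipped term is the minimum.
                if (adv > 0 and ratio > 1 + eps) or (adv < 0 and ratio < 1 - eps):
                    continue
                for j in range(len(logits)):
                    indicator = 1.0 if j == a else 0.0
                    grad[j] += adv * ratio * (indicator - probs[j])
            logits = [w + lr * g / len(pool) for w, g in zip(logits, grad)]
        # Update the Critic toward the mean observed reward.
        mean_reward = sum(r for _, r in pool) / len(pool)
        value += 0.5 * (mean_reward - value)
        # Update the old Actor parameters.
        old_logits = list(logits)
    return logits, old_logits, value


if __name__ == "__main__":
    learned, _, baseline = train([0.5, 1.0, 2.0, 4.0])
    print(softmax(learned), baseline)
```

Sampling actions from the old Actor while differentiating through the current one is what makes the probability ratio meaningful; clipping that ratio to the range [1 - eps, 1 + eps] limits how far a single iteration can move the policy away from the one that collected the data.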