International Journal of Intelligent Systems

Research Article

Flipit Game Deception Strategy Selection Method Based on Deep Reinforcement Learning

Multi-stage flipit game deception strategy selection algorithm based on proximal policy optimization.

	Input: The attacker’s state in cloud environment
	Output: Updated network parameters, deception strategy
(1)	Set parameters
(2)	Initialize the Actor network parameters and the Critic network parameters
(3)	for iteration = 1, 2, 3, … do
(4)	Initialize the experience pool
(5)	for k = 1, 2, … do
(6)	Randomly initialize state
(7)	Initialize state-action trajectory
(8)	for t = 1, 2, …, T do
(9)	Get current state
	//Obtain the attacker’s current strategy, i.e., the exponential distribution strategy
(10)	Select action from the old Actor network
	//Select the interval at which the defender takes control of the target system
(11)	Calculate the reward of the multi-stage deception strategy
	//Calculate the reward according to formula (5)
(12)	Store state-action trajectory
(13)	end for
(14)
(15)	end for
(16)	Update Actor network policy parameter

(17)	Update Critic network policy parameter

(18)	Update the old network parameters with the updated Actor network parameters
(19)	end for
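
The loop structure of the algorithm above can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation. The following are assumptions made only for illustration: a tabular softmax Actor and a tabular Critic in place of neural networks, a two-valued state (whether the attacker flipped during the previous stage), a hypothetical `stage_reward` function standing in for formula (5), discounted return-to-go as the advantage target for the step left blank in the listing, and all numeric hyperparameters. The Actor update uses the standard clipped surrogate objective of proximal policy optimization.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def stage_reward(interval, rng, attack_rate=0.5, move_cost=0.2):
    """Hypothetical stand-in for formula (5); not the paper's reward."""
    takeover = rng.exponential(1.0 / attack_rate)    # attacker's exponential waiting time
    controlled = min(takeover, interval) / interval  # fraction of the stage held by the defender
    next_state = int(takeover < interval)            # 1 if the attacker flipped in this stage
    return controlled - move_cost / interval, next_state

def train_ppo(intervals=(0.5, 1.0, 2.0, 4.0), iterations=200, episodes=8, T=16,
              gamma=0.9, clip=0.2, lr_actor=0.05, lr_critic=0.1, epochs=4, seed=0):
    rng = np.random.default_rng(seed)
    n_states, n_actions = 2, len(intervals)
    theta = np.zeros((n_states, n_actions))  # Actor parameters (tabular logits, assumed)
    value = np.zeros(n_states)               # Critic parameters (tabular values, assumed)
    theta_old = theta.copy()                 # old Actor used to collect data
    for _ in range(iterations):
        pool = []                            # experience pool
        for _ in range(episodes):
            s = int(rng.integers(n_states))  # randomly initialize state
            traj = []                        # state-action trajectory
            for _ in range(T):
                p_old = softmax(theta_old[s])
                a = int(rng.choice(n_actions, p=p_old))  # action from the old Actor
                r, s_next = stage_reward(intervals[a], rng)
                traj.append((s, a, r, p_old[a]))
                s = s_next
            g = 0.0                          # assumed: discounted return-to-go
            for s_t, a_t, r_t, p_t in reversed(traj):
                g = r_t + gamma * g
                pool.append((s_t, a_t, g, g - value[s_t], p_t))
        for _ in range(epochs):              # Actor update: clipped surrogate ascent
            grad = np.zeros_like(theta)
            for s_t, a_t, g, adv, p_t in pool:
                pi = softmax(theta[s_t])
                ratio = pi[a_t] / p_t
                clipped = (adv > 0 and ratio > 1 + clip) or (adv < 0 and ratio < 1 - clip)
                if not clipped:              # gradient vanishes where the clip is active
                    onehot = np.eye(n_actions)[a_t]
                    grad[s_t] += adv * ratio * (onehot - pi)
            theta += lr_actor * grad / len(pool)
        for s_t, _, g, _, _ in pool:         # Critic update: move values toward returns
            value[s_t] += lr_critic * (g - value[s_t])
        theta_old = theta.copy()             # update the old network parameters
    return theta, value

if __name__ == "__main__":
    theta, value = train_ppo()
    for s in range(theta.shape[0]):
        print("state", s, "interval probabilities:", np.round(softmax(theta[s]), 3))
```

Because the exponential attacker is memoryless, the state carries little information in this toy setting; it is kept only so that the sketch mirrors the state, action, reward, and trajectory steps of the listing.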