Research Article
Flipit Game Deception Strategy Selection Method Based on Deep Reinforcement Learning
Algorithm 1
Multi-stage flipit game deception strategy selection algorithm based on proximal policy optimization.
| | Input: The attacker's state in the cloud environment |
| | Output: Network update parameters, deception strategy |
| (1) | Set parameters |
| (2) | Initialize the Actor network parameters and the Critic network parameters |
| (3) | for iteration = 1, 2, 3, … do |
| (4) | Experience pool |
| (5) | for k = 1, 2, … do |
| (6) | Randomly initialize state |
| (7) | Initialize state-action trajectory |
| (8) | for t = 1, 2, …, T do |
| (9) | Get current state // Obtain the strategy taken by the current attacker, i.e., select the exponential distribution strategy |
| (10) | Select action from the old Actor network // Select the interval of the defender's control of the target system |
| (11) | Calculate the reward of the multi-stage deception strategy // Calculate the reward according to formula (5) |
| (12) | Store state-action trajectory |
| (13) | end for |
| (14) | |
| (15) | end for |
| (16) | Update the Actor network policy parameters |
| (17) | Update the Critic network policy parameters |
| (18) | Update the old network parameters |
| (19) | end for |
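The loop structure of Algorithm 1 can be illustrated with a minimal sketch, under several stated assumptions. It replaces the Actor and Critic networks with small lookup tables, uses a placeholder reward in place of formula (5), and estimates the advantage as the discounted return minus the Critic's value (step (14) is blank in the listing, so this choice is an assumption). The names `INTERVALS`, `RATES`, `MOVE_COST`, `placeholder_reward`, and all hyperparameter values are hypothetical and do not come from the article. `clipped_surrogate` states the standard PPO clipped objective for one sample, and `train` applies its gradient by hand.

```python
import math
import random

INTERVALS = [0.5, 1.0, 2.0, 4.0]  # hypothetical defender control intervals (actions)
RATES = [0.5, 1.0, 2.0]           # hypothetical attacker exponential rates (states)
MOVE_COST = 0.2                   # hypothetical cost charged per defender move


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def placeholder_reward(rate, interval, rng):
    """Stand-in for formula (5), NOT the article's reward: +1 if the defender
    moves before the attacker's exponentially distributed move time, minus a
    cost that grows as the interval shrinks."""
    attack_time = rng.expovariate(rate)
    return (1.0 if interval < attack_time else 0.0) - MOVE_COST / interval


def clipped_surrogate(ratio, advantage, eps):
    """PPO clipped objective for a single sample."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


def train(iterations=200, episodes=8, horizon=10, gamma=0.9, eps=0.2,
          actor_lr=0.05, critic_lr=0.1, epochs=4, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * len(INTERVALS) for _ in RATES]  # tabular Actor parameters
    value = [0.0] * len(RATES)                       # tabular Critic parameters
    old_theta = [row[:] for row in theta]            # old Actor used for sampling
    for _ in range(iterations):
        pool = []                                    # experience pool
        for _ in range(episodes):
            traj = []
            for _ in range(horizon):
                s = rng.randrange(len(RATES))        # current attacker strategy
                probs = softmax(old_theta[s])
                a = rng.choices(range(len(INTERVALS)), weights=probs)[0]
                r = placeholder_reward(RATES[s], INTERVALS[a], rng)
                traj.append((s, a, r, probs[a]))
            g = 0.0
            for s, a, r, p_old in reversed(traj):    # discounted returns
                g = r + gamma * g
                pool.append((s, a, g, g - value[s], p_old))
        for _ in range(epochs):                      # Actor update
            for s, a, g, adv, p_old in pool:
                probs = softmax(theta[s])
                ratio = probs[a] / p_old
                # The gradient vanishes where the clipped term is the active minimum.
                if (adv > 0 and ratio > 1 + eps) or (adv < 0 and ratio < 1 - eps):
                    continue
                for b in range(len(INTERVALS)):
                    grad_log = (1.0 if b == a else 0.0) - probs[b]
                    theta[s][b] += actor_lr * adv * ratio * grad_log / len(pool)
        for s, a, g, adv, p_old in pool:             # Critic update toward returns
            value[s] += critic_lr * (g - value[s]) / len(pool)
        old_theta = [row[:] for row in theta]        # sync the old Actor
    return theta, value
```

Calling `theta, value = train()` and then `[softmax(row) for row in theta]` gives the learned interval distribution for each assumed attacker rate. Because the reward is only a placeholder, those numbers illustrate the update mechanics and say nothing about the article's results.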