Research Article
A UAV Pursuit-Evasion Strategy Based on DDPG and Imitation Learning
Algorithm 1
The UAV pursuit strategy using DDPG.
Training algorithm for UAV strategy based on DDPG
Initialize the experience pool D with memory size M
Initialize the Eval networks of the Actor network and the Critic network
for episode = 1 to MaxEpisode do
    Initialize the OU-Noise
    Randomly initialize the states of the pursuit-UAV and the evasion-UAV within the set range
    Obtain the initial state of the simulation environment
    for t = 1 to MaxStep do
        Select the action of the pursuit-UAV, applying the action constraint processing
        Select the maneuver strategy for the evasion-UAV
        Input the control signals into the UAV model and integrate to obtain the next UAV states, then calculate the environment state
        Obtain the immediate reward from the environment
        Store the experience sample in D
        Randomly sample from D to get a sample set of size BatchSize
        Update the Eval network parameters of the Critic
        Update the Eval network parameters of the Actor
        Update the Target network parameters of the Critic network and the Actor network by (20)
        If the episode end condition is satisfied, break
    end for
end for
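The supporting pieces of Algorithm 1 (OU noise, the experience pool D, action constraint processing, and the Target network update) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and function names, the noise parameters (theta, sigma, dt), the use of simple clipping as the action constraint, and the soft-update form with rate tau assumed for equation (20) are all hypothetical, since this excerpt does not show them. Network parameters are represented as flat lists of floats for simplicity.

```python
import math
import random
from collections import deque


class OUNoise:
    """Ornstein-Uhlenbeck exploration noise (parameter values are assumptions)."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=0.01, seed=None):
        self.dim, self.mu = dim, mu
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Called once per episode ("Initialize the OU-Noise").
        self.x = [self.mu] * self.dim

    def sample(self):
        # Euler step of dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1).
        self.x = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.x
        ]
        return list(self.x)


class ExperiencePool:
    """Experience pool D with memory size M; the oldest samples are dropped first."""

    def __init__(self, memory_size):
        self.samples = deque(maxlen=memory_size)

    def store(self, state, action, reward, next_state, done):
        self.samples.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch without replacement.
        return random.sample(list(self.samples), batch_size)

    def __len__(self):
        return len(self.samples)


def constrain_action(action, low, high):
    """Assumed form of action constraint processing: clip to [low, high]."""
    return [min(max(a, low), high) for a in action]


def soft_update(target_params, eval_params, tau=0.01):
    """Assumed form of equation (20): target <- tau * eval + (1 - tau) * target."""
    return [tau * e + (1.0 - tau) * t for t, e in zip(target_params, eval_params)]
```

In a training loop, `OUNoise.reset()` would be called at the start of each episode, `constrain_action` applied to the Actor output plus noise at each step, and `soft_update` applied to both the Critic and Actor Target parameters after every Eval network update.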