Research Article

A UAV Pursuit-Evasion Strategy Based on DDPG and Imitation Learning

Algorithm 1

The UAV pursuit strategy using DDPG.
Training algorithm for UAV strategy based on DDPG
initialize the experience pool D with memory size M
initialize the parameters of the Eval networks of the Actor network and the Critic network
for episode = 1 to MaxEpisode do
 initialize OU-Noise
 randomly initialize the states of the pursuit-UAV and evasion-UAV within the set range, and obtain the initial state of the simulation environment
 for t = 1 to MaxStep do
  select the action of the pursuit-UAV, then apply the action constraint processing
  select maneuver strategy for evasion-UAV
  apply the control signals to the UAVs, integrate to obtain their next states, and calculate the next environment state
  obtain the immediate reward from the environment
  store the experience sample in D
  randomly sample from D to get a sample set of size BatchSize
  update the Eval network parameters of the Critic
  update the Eval network parameters of the Actor
  update the Target network parameters of the Critic network and the Actor network by (20)
  if the episode end condition is satisfied, break
 end for
end for
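
The supporting pieces of Algorithm 1 can be illustrated with a minimal Python sketch covering the OU noise, the experience pool D, the action constraint processing, and the Target network update. Everything named here is an assumption made for illustration and is not taken from the article: the class and function names, the OU parameters (theta = 0.15, sigma = 0.2), element-wise clipping as the form of the action constraint, and the standard DDPG soft update as the form of (20). Network parameters are represented as flat lists of floats to keep the sketch free of external dependencies.

```python
import random
from collections import deque


class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        # theta, sigma and dt are illustrative defaults, not values from the article.
        self.dim, self.mu, self.theta, self.sigma, self.dt = dim, mu, theta, sigma, dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Called once per episode ("initialize OU-Noise").
        self.x = [self.mu] * self.dim

    def sample(self):
        # Euler-Maruyama step: x += theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)
        self.x = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0)
            for x in self.x
        ]
        return list(self.x)


class ReplayBuffer:
    """Experience pool D with memory size M; the oldest samples are dropped first."""

    def __init__(self, capacity, seed=None):
        self.data = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state, done):
        self.data.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored experience.
        return self.rng.sample(list(self.data), batch_size)

    def __len__(self):
        return len(self.data)


def constrain_action(action, low, high):
    # Assumed form of the action constraint processing: element-wise clipping.
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, low, high)]


def soft_update(target_params, eval_params, tau):
    # Assumed form of (20): target <- tau * eval + (1 - tau) * target.
    return [tau * e + (1.0 - tau) * t for t, e in zip(target_params, eval_params)]
```

In a training loop following Algorithm 1, `OUNoise.reset()` would be called at the start of each episode, the noisy Actor output would pass through `constrain_action` before being applied to the pursuit-UAV, and `soft_update` would be applied to both the Critic and the Actor parameters after each gradient step.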