International Journal of Aerospace Engineering

Research Article

A UAV Pursuit-Evasion Strategy Based on DDPG and Imitation Learning

The UAV pursuit strategy using DDPG.

Training algorithm for UAV strategy based on DDPG
initialize the experience pool D with memory size M
initialize the Eval networks of the Actor network and the Critic network
for episode = 1 to MaxEpisode do
    initialize OU-Noise
    randomly initialize the states of the pursuit-UAV and evasion-UAV within the set range
    obtain the initial state of the simulation environment
    for t = 1 to MaxStep do
        select the action of the pursuit-UAV and apply the action constraint process to it
        select the maneuver strategy for the evasion-UAV
        input the control signals into the UAV models and integrate to get the next state of each UAV, then calculate the environment state
        obtain the immediate reward from the environment
        store the experience sample in D
        randomly sample from D to get a sample set of size BatchSize
        update the Eval network parameters of the Critic
        update the Eval network parameters of the Actor
        update the Target network parameters of the Critic network and Actor network by (20)
        if the episode end condition is satisfied, break
    end for
end for
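
The supporting pieces of the training loop above (OU-Noise, the experience pool D, the action constraint, and the Target network update) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `OUNoise`, `ExperiencePool`, `constrain_action`, and `soft_update` are hypothetical, the noise parameters are common defaults rather than values from the paper, the action constraint is assumed to be simple clipping, and (20) is assumed to be the usual DDPG soft update. Network parameters are represented as flat lists of floats so the sketch needs no deep learning library.

```python
import random
from collections import deque


class OUNoise:
    """Ornstein-Uhlenbeck exploration noise (discretised, unit time step assumed)."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, seed=None):
        self.dim, self.mu, self.theta, self.sigma = dim, mu, theta, sigma
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Called at the start of every episode ("initialize OU-Noise").
        self.x = [self.mu] * self.dim

    def sample(self):
        # x <- x + theta * (mu - x) + sigma * N(0, 1), applied per component.
        self.x = [
            x + self.theta * (self.mu - x) + self.sigma * self.rng.gauss(0.0, 1.0)
            for x in self.x
        ]
        return list(self.x)


class ExperiencePool:
    """Experience pool D with memory size M; oldest samples are dropped when full."""

    def __init__(self, memory_size, seed=None):
        self.buffer = deque(maxlen=memory_size)
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling without replacement.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def constrain_action(action, low, high):
    """Assumed action constraint: clip each component to [low, high]."""
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, low, high)]


def soft_update(target_params, eval_params, tau):
    """Assumed form of (20): target <- tau * eval + (1 - tau) * target."""
    return [tau * e + (1.0 - tau) * t for t, e in zip(target_params, eval_params)]
```

In a full training loop, the Actor output would be perturbed with `OUNoise.sample()`, passed through `constrain_action`, and the resulting transition stored with `ExperiencePool.store`. Once the pool holds at least BatchSize samples, a batch drawn with `ExperiencePool.sample` would drive the Critic and Actor updates, followed by `soft_update` on both Target networks.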