International Journal of Aerospace Engineering

Research Article

Intercept Guidance of Maneuvering Targets with Deep Reinforcement Learning

Guidance law learning algorithm based on the TD3 algorithm.

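Initialize critic networks $Q_{\theta_1}$, $Q_{\theta_2}$ and actor network $\pi_\phi$
with random parameters $\theta_1$, $\theta_2$, $\phi$
Initialize target networks $\theta'_1 \leftarrow \theta_1$, $\theta'_2 \leftarrow \theta_2$, $\phi' \leftarrow \phi$
Initialize replay buffer $\mathcal{B}$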
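for t = 1 to T do
    Reset the environment to its initial state $s$
    while the episode has not terminated do
        Select action with exploration noise, $a = \pi_\phi(s) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$, and
        observe reward $r$ and new state $s'$
        Store the transition experience tuple $(s, a, r, s')$ in $\mathcal{B}$
        Sample a mini-batch of $N$ transitions $(s, a, r, s')$ from $\mathcal{B}$
        Compute the smoothed target action:
            $\tilde{a} = \pi_{\phi'}(s') + \epsilon$, $\epsilon \sim \mathrm{clip}(\mathcal{N}(0, \tilde{\sigma}), -c, c)$
        Calculate the TD error:
            $y = r + \gamma \min_{e=1,2} Q_{\theta'_e}(s', \tilde{a})$, $\delta_e = y - Q_{\theta_e}(s, a)$
        Calculate the loss function:
            $L(\theta_e) = \frac{1}{N} \sum \delta_e^2$
        Update the critic networks using gradient descent (where $e = 1, 2$):
            $\theta_e \leftarrow \theta_e - \alpha_Q \nabla_{\theta_e} L(\theta_e)$
        if t mod d = 0 then
            Update $\phi$ by the deterministic policy gradient:
                $\nabla_\phi J(\phi) = \frac{1}{N} \sum \nabla_a Q_{\theta_1}(s, a) \big|_{a = \pi_\phi(s)} \nabla_\phi \pi_\phi(s)$
                $\phi \leftarrow \phi + \alpha_\pi \nabla_\phi J(\phi)$
            Update target networks:
                $\theta'_e \leftarrow \tau \theta_e + (1 - \tau) \theta'_e$
                $\phi' \leftarrow \tau \phi + (1 - \tau) \phi'$
        end if
        $s \leftarrow s'$
    end while
end for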
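
The numerical core of the listing above (target policy smoothing, the clipped double-Q target, the critic loss, and the soft target update) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the networks are replaced by precomputed arrays of Q-values and parameters, and the names `sigma`, `clip_c`, `a_max`, `gamma`, and `tau` are hypothetical placeholders for the smoothing noise scale, the noise clip bound, an assumed symmetric action limit, the discount factor, and the soft-update rate.

```python
import numpy as np


def smoothed_target_action(mu_next, sigma, clip_c, a_max, rng):
    """Target policy smoothing: perturb the target action with clipped Gaussian noise."""
    noise = np.clip(rng.normal(0.0, sigma, size=mu_next.shape), -clip_c, clip_c)
    # a_max is an assumed symmetric action bound; it is not given in the listing.
    return np.clip(mu_next + noise, -a_max, a_max)


def td3_target(reward, q1_next, q2_next, gamma):
    """Clipped double-Q target: y = r + gamma * min(Q1', Q2')."""
    return reward + gamma * np.minimum(q1_next, q2_next)


def critic_loss(y, q_pred):
    """Mean squared TD error over the mini-batch for one critic."""
    td_error = y - q_pred
    return float(np.mean(td_error ** 2))


def soft_update(target_params, online_params, tau):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

In a full training loop these helpers would be called once per sampled mini-batch, with `q1_next` and `q2_next` obtained by evaluating the two target critics at the smoothed target action, and `soft_update` applied only on the delayed steps where the actor is also updated.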