Research Article

Intercept Guidance of Maneuvering Targets with Deep Reinforcement Learning

Algorithm 1

Guidance law learning algorithm based on the TD3 algorithm.
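Initialize critic networks $Q_{\theta_1}$, $Q_{\theta_2}$ and actor network $\pi_\phi$
with random parameters $\theta_1$, $\theta_2$, $\phi$
Initialize target networks $\theta'_1 \leftarrow \theta_1$, $\theta'_2 \leftarrow \theta_2$, $\phi' \leftarrow \phi$
Initialize replay buffer $\mathcal{B}$
for t=1 to T do:
  Reset the environment to its initial state $s$
  while True:
   Select action with exploration noise $a \sim \pi_\phi(s) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$, and
   observe reward $r$ and new state $s'$.
   Store transition experience tuple $(s, a, r, s')$ in $\mathcal{B}$
   Sample mini-batch of $N$ transitions $(s, a, r, s')$ from $\mathcal{B}$
    $\tilde{a} \leftarrow \pi_{\phi'}(s') + \epsilon$, $\epsilon \sim \mathrm{clip}(\mathcal{N}(0, \tilde{\sigma}), -c, c)$
    $y \leftarrow r + \gamma \min_{e=1,2} Q_{\theta'_e}(s', \tilde{a})$
   Calculate the TD error
    $\delta_e = y - Q_{\theta_e}(s, a)$
   Calculate the loss function
    $L(\theta_e) = N^{-1} \sum \delta_e^2$
   Update the critic networks using gradient descent (where $e = 1, 2$)
    $\theta_e \leftarrow \theta_e - \alpha_Q \nabla_{\theta_e} L(\theta_e)$
   if t mod d = 0 then
    Update $\phi$ by the deterministic policy gradient:
     $\nabla_\phi J(\phi) = N^{-1} \sum \nabla_a Q_{\theta_1}(s, a)|_{a=\pi_\phi(s)} \nabla_\phi \pi_\phi(s)$
     $\phi \leftarrow \phi + \alpha_\pi \nabla_\phi J(\phi)$
    Update target networks:
     $\theta'_e \leftarrow \tau \theta_e + (1 - \tau) \theta'_e$
     $\phi' \leftarrow \tau \phi + (1 - \tau) \phi'$
   end if
  end while
end for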
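
The critic target, loss, and target-network update in Algorithm 1 can be illustrated with a minimal NumPy sketch. It assumes the standard TD3 formulation: a clipped double-Q target with smoothing noise on the target action, a mean-squared TD error, and a Polyak (soft) update. The function names (`td3_target`, `critic_loss`, `soft_update`), the callable interfaces for the target actor and critics, and all default hyperparameter values are hypothetical choices for illustration and are not taken from the paper. Terminal-state handling and the gradient steps themselves are omitted.

```python
import numpy as np

def td3_target(r, s_next, actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, rng=None):
    """Clipped double-Q target y = r + gamma * min_e Q'_e(s', a~).

    actor_target and critic*_target are assumed to be plain callables
    mapping batched arrays to batched arrays (hypothetical interface).
    """
    rng = np.random.default_rng() if rng is None else rng
    a_next = actor_target(s_next)
    # Target policy smoothing: clipped Gaussian noise on the target action.
    noise = np.clip(rng.normal(0.0, noise_std, size=a_next.shape),
                    -noise_clip, noise_clip)
    a_tilde = a_next + noise
    # Take the smaller of the two target critics to limit overestimation.
    q_min = np.minimum(critic1_target(s_next, a_tilde),
                       critic2_target(s_next, a_tilde))
    return r + gamma * q_min

def critic_loss(y, q):
    """Mean-squared TD error over the mini-batch for one critic."""
    delta = y - q
    return float(np.mean(delta ** 2))

def soft_update(target_params, params, tau=0.005):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * p + (1.0 - tau) * tp
            for p, tp in zip(params, target_params)]
```

In a full training loop, `td3_target` and `critic_loss` would run on every sampled mini-batch, while the actor step and `soft_update` would run only on the delayed steps selected by the `t mod d` condition.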