Research Article
Intercept Guidance of Maneuvering Targets with Deep Reinforcement Learning
Algorithm 1
Guidance law learning algorithm based on TD3.
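Initialize critic networks $Q_{\theta_1}$, $Q_{\theta_2}$ and actor network $\pi_\phi$ with random parameters $\theta_1$, $\theta_2$, $\phi$
Initialize target networks $\theta'_1 \leftarrow \theta_1$, $\theta'_2 \leftarrow \theta_2$, $\phi' \leftarrow \phi$
Initialize replay buffer $\mathcal{B}$
for t = 1 to T do
  Reset the environment to its initial values and observe the initial state $s$
  while True do
    Select action with exploration noise, $a = \pi_\phi(s) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$, and observe reward $r$ and new state $s'$
    Store transition experience tuple $(s, a, r, s')$ in $\mathcal{B}$
    Sample mini-batch of N transitions $(s, a, r, s')$ from $\mathcal{B}$
    $\tilde{a} = \pi_{\phi'}(s') + \epsilon$, $\epsilon \sim \mathrm{clip}(\mathcal{N}(0, \tilde{\sigma}), -c, c)$
    $y = r + \gamma \min_{e=1,2} Q_{\theta'_e}(s', \tilde{a})$
    Calculate the TD error: $\delta_e = y - Q_{\theta_e}(s, a)$
    Calculate the loss function: $L(\theta_e) = N^{-1} \sum \delta_e^2$
    Update the critic networks using gradient descent (where e = 1, 2): $\theta_e \leftarrow \theta_e - \alpha \nabla_{\theta_e} L(\theta_e)$
    if t mod d then
      Update $\phi$ by the deterministic policy gradient: $\nabla_\phi J(\phi) = N^{-1} \sum \nabla_a Q_{\theta_1}(s, a)\big|_{a = \pi_\phi(s)} \nabla_\phi \pi_\phi(s)$
      Update target networks: $\theta'_e \leftarrow \tau \theta_e + (1 - \tau) \theta'_e$, $\phi' \leftarrow \tau \phi + (1 - \tau) \phi'$
    end if
  end while
end for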
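The critic-side steps of Algorithm 1 (target action smoothing, the clipped double-Q target, the TD error and loss, and the soft target update) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `td3_target`, `critic_loss`, and `soft_update`, the default hyperparameter values, the action bound `a_max`, and the `done` terminal flag are all hypothetical and do not appear in the algorithm listing. The networks are passed in as plain callables so the sketch stays independent of any deep learning library.

```python
import numpy as np


def td3_target(r, s_next, done, actor_target, q1_target, q2_target,
               gamma=0.99, sigma=0.2, noise_clip=0.5, a_max=1.0, rng=None):
    """Clipped double-Q target with target policy smoothing.

    `done` (1.0 on terminal transitions) and `a_max` are assumptions added
    for illustration; the algorithm listing does not mention them.
    """
    rng = np.random.default_rng() if rng is None else rng
    a_next = actor_target(s_next)
    # Target policy smoothing: clipped Gaussian noise on the target action.
    noise = np.clip(rng.normal(0.0, sigma, size=a_next.shape),
                    -noise_clip, noise_clip)
    a_next = np.clip(a_next + noise, -a_max, a_max)
    # Take the smaller of the two target critics to limit overestimation.
    q_min = np.minimum(q1_target(s_next, a_next), q2_target(s_next, a_next))
    return r + gamma * (1.0 - done) * q_min


def critic_loss(y, q):
    """Return the per-sample TD error and the mean-squared loss over the batch."""
    td_error = y - q
    return td_error, float(np.mean(td_error ** 2))


def soft_update(target_params, params, tau=0.005):
    """Soft (Polyak) update: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, params)]
```

In a full training loop, `td3_target` and `critic_loss` would run on every sampled mini-batch, while the actor update and `soft_update` would run only on the delayed steps selected by the `t mod d` check.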