International Journal of Aerospace Engineering

Research Article

Two-Loop Acceleration Autopilot Design and Analysis Based on TD3 Strategy

TD3 algorithm.

TD3: Twin Delayed Deep Deterministic Policy Gradient
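Randomly initialize the critic-network parameters $\theta_1$, $\theta_2$ and the actor-network parameters $\phi$
Initialize the target network parameters $\theta_1' \leftarrow \theta_1$, $\theta_2' \leftarrow \theta_2$, $\phi' \leftarrow \phi$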
Initialize the replay buffer R
for episode = 1, M do
Initialize the exploration noise used for action exploration
Receive the initial environment state $s_1$
for t = 1, T do
Select an action according to the current policy and exploration noise:
$a_t = \pi_\phi(s_t) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$
Execute the action $a_t$ and observe the reward $r_t$ and the next state $s_{t+1}$
Store the transition $(s_t, a_t, r_t, s_{t+1})$ in R
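Sample a mini-batch of N transitions $(s, a, r, s')$ from R
$\tilde{a} = \pi_{\phi'}(s') + \epsilon$, $\epsilon \sim \mathrm{clip}(\mathcal{N}(0, \tilde{\sigma}), -c, c)$
$y = r + \gamma \min_{i=1,2} Q_{\theta_i'}(s', \tilde{a})$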
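Update the critic-network parameters by minimizing the mean-squared error to the target value $y$:
$\theta_i \leftarrow \arg\min_{\theta_i} \frac{1}{N} \sum \left( y - Q_{\theta_i}(s, a) \right)^2$, $i = 1, 2$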
if t mod d = 0 then
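Update the actor-network parameters through deterministic policy gradients:
$\nabla_\phi J(\phi) = \frac{1}{N} \sum \nabla_a Q_{\theta_1}(s, a) \big|_{a = \pi_\phi(s)} \nabla_\phi \pi_\phi(s)$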
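Update the target networks:
$\theta_i' \leftarrow \tau \theta_i + (1 - \tau) \theta_i'$, $i = 1, 2$
$\phi' \leftarrow \tau \phi + (1 - \tau) \phi'$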
end if
end for
end for
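
The three features that distinguish TD3 from DDPG in the listing above (the clipped double-Q target with target policy smoothing, the delayed actor update, and the soft target update) can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work. The function names, the dictionary-of-arrays parameter layout, the action bound `a_max`, and all default hyperparameter values are assumptions made for illustration, and terminal-state handling is omitted.

```python
import numpy as np


def td3_target(r, s_next, actor_targ, q1_targ, q2_targ, rng,
               gamma=0.99, sigma=0.2, noise_clip=0.5, a_max=1.0):
    """Clipped double-Q target with target policy smoothing.

    actor_targ, q1_targ, q2_targ are callables standing in for the
    target networks (hypothetical interface, assumed for this sketch).
    """
    a_next = actor_targ(s_next)
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = np.clip(rng.normal(0.0, sigma, size=a_next.shape),
                    -noise_clip, noise_clip)
    a_next = np.clip(a_next + noise, -a_max, a_max)
    # Clipped double-Q: take the smaller of the two target critics.
    q_min = np.minimum(q1_targ(s_next, a_next), q2_targ(s_next, a_next))
    return r + gamma * q_min


def soft_update(target, online, tau=0.005):
    """In-place update: target <- tau * online + (1 - tau) * target."""
    for key in target:
        target[key] = tau * online[key] + (1.0 - tau) * target[key]


def is_actor_step(t, d=2):
    """The actor and target networks are updated only every d-th step."""
    return t % d == 0


# Toy usage with stand-in target networks (illustrative only).
rng = np.random.default_rng(0)
actor_targ = lambda s: np.tanh(s[:, :1])
q1_targ = lambda s, a: s.sum(axis=1) + a[:, 0]
q2_targ = lambda s, a: s.sum(axis=1) - a[:, 0]

s_next = rng.normal(size=(4, 3))   # batch of 4 next states, 3 features each
r = np.ones(4)                     # batch of rewards
y = td3_target(r, s_next, actor_targ, q1_targ, q2_targ, rng)

target_params = {"w": np.zeros(2)}
online_params = {"w": np.ones(2)}
for t in range(1, 5):
    if is_actor_step(t):
        soft_update(target_params, online_params)
```

In a full implementation the critic and actor gradient steps would be performed by an automatic-differentiation framework; only the target computation and update schedule are shown here.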