Research Article

Two-Loop Acceleration Autopilot Design and Analysis Based on TD3 Strategy

Table 2

TD3 algorithm hyperparameters.

Parameter                                      Value
Maximum permissible episodes                   1600
Maximum permissible steps of each episode      572
Actor-network learning rate                    10^-4
Critic-network learning rate                   10^-4
Regularization constant
Discounting factor                             0.99
Sampling size                                  256
Variance for the initial exploration noise     0.25
Variance fading factor                         0.005
Variance for random noise                      0.2
Upper limit of the random noise                0.25
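
For readers who wish to reproduce the training setup, the values in Table 2 can be collected into a single configuration object. The following is a minimal sketch, not the authors' implementation: all field and function names are hypothetical, the regularization constant is left unset because the table does not give its value, and two behaviours are assumptions made purely for illustration, namely that the exploration variance decays multiplicatively by the fading factor at each step and that the random noise is clipped symmetrically about zero.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TD3Hyperparameters:
    """Values transcribed from Table 2; field names are hypothetical."""
    max_episodes: int = 1600
    max_steps_per_episode: int = 572
    actor_learning_rate: float = 1e-4
    critic_learning_rate: float = 1e-4
    regularization_constant: Optional[float] = None  # value not given in the table
    discount_factor: float = 0.99
    batch_size: int = 256                    # "Sampling size"
    exploration_variance_init: float = 0.25
    variance_fading_factor: float = 0.005
    random_noise_variance: float = 0.2
    random_noise_limit: float = 0.25


def exploration_variance(hp: TD3Hyperparameters, step: int) -> float:
    """ASSUMPTION: the variance shrinks by a factor (1 - fading factor) per step."""
    return hp.exploration_variance_init * (1.0 - hp.variance_fading_factor) ** step


def clipped_random_noise(hp: TD3Hyperparameters, rng: random.Random) -> float:
    """Draw zero-mean Gaussian noise with the tabulated variance.

    ASSUMPTION: the sample is clipped symmetrically to +/- the tabulated limit.
    """
    sample = rng.gauss(0.0, math.sqrt(hp.random_noise_variance))
    return max(-hp.random_noise_limit, min(hp.random_noise_limit, sample))


hp = TD3Hyperparameters()
```

Under these assumptions, `exploration_variance(hp, 0)` returns the initial variance of 0.25, and every value returned by `clipped_random_noise` lies within the interval [-0.25, 0.25].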