Research Article
Two-Loop Acceleration Autopilot Design and Analysis Based on TD3 Strategy
Table 2
TD3 algorithm hyperparameters.
| Parameter | Value |
| --- | --- |
| Maximum permissible episodes | 1600 |
| Maximum permissible steps of each episode | 572 |
| Actor-network learning rate | $10^{-4}$ |
| Critic-network learning rate | $10^{-4}$ |
| Regularization constant | |
| Discounting factor | 0.99 |
| Sampling size | 256 |
| Variance for the initial exploration noise | 0.25 |
| Variance fading factor | 0.005 |
| Variance for random noise | 0.2 |
| Upper limit of the random noise | 0.25 |
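As a rough illustration of how the settings in Table 2 might be gathered for a training script, the following Python sketch stores them in a small configuration object. All field and function names are hypothetical and do not come from the article. Three points are assumptions rather than statements from the source: that the exploration variance decays geometrically by the fading factor at each step, that the "random noise" entries refer to the clipped noise used in TD3 target policy smoothing, and that the clipping limit is symmetric. The regularization constant is left unset because the table gives no value for it.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TD3Hyperparameters:
    """Values copied from Table 2; the field names are illustrative only."""
    max_episodes: int = 1600
    max_steps_per_episode: int = 572
    actor_learning_rate: float = 1e-4
    critic_learning_rate: float = 1e-4
    regularization_constant: Optional[float] = None  # no value given in the table
    discount_factor: float = 0.99
    batch_size: int = 256  # "sampling size" in the table
    initial_exploration_variance: float = 0.25
    variance_fading_factor: float = 0.005
    random_noise_variance: float = 0.2
    random_noise_limit: float = 0.25


def exploration_variance(hp: TD3Hyperparameters, step: int) -> float:
    """Assumed schedule: variance shrinks by (1 - fading factor) per step."""
    return hp.initial_exploration_variance * (1.0 - hp.variance_fading_factor) ** step


def clipped_random_noise(hp: TD3Hyperparameters, rng: random.Random) -> float:
    """Gaussian sample with the tabulated variance, clipped to +/- the limit
    (symmetric clipping is an assumption; the table lists only an upper limit)."""
    sample = rng.gauss(0.0, math.sqrt(hp.random_noise_variance))
    return max(-hp.random_noise_limit, min(hp.random_noise_limit, sample))


if __name__ == "__main__":
    hp = TD3Hyperparameters()
    rng = random.Random(0)
    print("exploration variance after 100 steps:", exploration_variance(hp, 100))
    print("one clipped noise sample:", clipped_random_noise(hp, rng))
```

Under the assumed schedule, the exploration variance falls from 0.25 to roughly 60% of that value after 100 steps, which gives a feel for how quickly a fading factor of 0.005 reduces exploration.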