Research Article
An Exoatmospheric Homing Guidance Law Based on Deep Q Network
Table 3
DQN algorithm hyperparameters.
| Hyperparameter | Parameter value |
| --- | --- |
| Maximum iterations | 3000 |
| Discount factor | 0.996 |
| Q network learning rate | 0.001 |
| Capacity of experience replay memory | 100000 |
| Minibatch size | 64 |
| Target network update rate | 10 |
| Initial exploration | 0.8 |
| Final exploration | 0.01 |
| Reward coefficient | 0.05 |
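As an illustration only, the values in Table 3 could be gathered into a single configuration object for a training script. The sketch below is not the authors' implementation: the class name `DQNHyperparameters`, its field names, and the helper `linear_epsilon` are hypothetical. The table gives only the initial and final exploration values, so the linear annealing over the maximum number of iterations is an assumption, and the unit of the target network update rate (for example, iterations between target network updates) is not stated in the table and is stored here as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNHyperparameters:
    """Values copied from Table 3; field names are hypothetical."""
    max_iterations: int = 3000
    discount_factor: float = 0.996
    q_learning_rate: float = 0.001
    replay_capacity: int = 100_000
    minibatch_size: int = 64
    target_update_rate: int = 10  # unit not stated in the table
    initial_exploration: float = 0.8
    final_exploration: float = 0.01
    reward_coefficient: float = 0.05


def linear_epsilon(iteration: int, hp: DQNHyperparameters) -> float:
    """Exploration rate at a given iteration.

    ASSUMPTION: the rate is annealed linearly from the initial to the
    final value over max_iterations; the table does not specify the schedule.
    """
    # Clamp progress to [0, 1] so out-of-range iterations stay bounded.
    fraction = min(max(iteration / hp.max_iterations, 0.0), 1.0)
    return hp.initial_exploration + fraction * (
        hp.final_exploration - hp.initial_exploration
    )


if __name__ == "__main__":
    hp = DQNHyperparameters()
    for it in (0, 1500, 3000):
        print(it, round(linear_epsilon(it, hp), 4))
```

Keeping the values in one frozen dataclass makes it harder for a script to drift from the published settings; a different exploration schedule (for example, exponential decay) can be swapped in by replacing only `linear_epsilon`.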