Research Article
Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms
Table 1
MuJoCo environment model hyperparameters.
| Order | Parameter | Value |
| --- | --- | --- |
| 1 | Decay rate | 0.9 |
| 2 | Actor net learning rate | 0.0001 |
| 3 | Critic net learning rate | 0.0001 |
| 4 | Neuron number in 1st layer | 400 |
| 5 | Neuron number in 2nd layer | 300 |
| 6 | Experience pool volume | 100000 |
| 7 | Batch data size | 256 |
| 8 | Soft update coefficient | 0.01 |
| 9 | Action reward discount rate | 0.99 |
| 10 | Critic net output distribution low limit | -20 |
| 11 | Target net parameters update round number | 10 |
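As an illustration only, the values in Table 1 could be collected into a single configuration object, as in the minimal Python sketch below. The class name, field names, and the `soft_update` helper are hypothetical and do not come from the article. The helper assumes the standard DDPG soft (Polyak) update, target = tau * online + (1 - tau) * target, which the article may or may not implement in exactly this form. The lower limit of the critic output distribution is read here as -20.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MujocoHyperparameters:
    """Values copied from Table 1; all field names are hypothetical."""
    decay_rate: float = 0.9
    actor_lr: float = 1e-4             # actor net learning rate
    critic_lr: float = 1e-4            # critic net learning rate
    hidden_1: int = 400                # neurons in the 1st layer
    hidden_2: int = 300                # neurons in the 2nd layer
    replay_capacity: int = 100_000     # experience pool volume
    batch_size: int = 256
    tau: float = 0.01                  # soft update coefficient
    gamma: float = 0.99                # reward discount rate
    critic_dist_low: float = -20.0     # low limit of critic output distribution
    target_update_interval: int = 10   # target net update round number


def soft_update(target, online, tau):
    """Assumed standard soft update: tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


hp = MujocoHyperparameters()

# Toy example on two scalar "weights" rather than real network parameters.
new_target = soft_update([0.0, 1.0], [1.0, 1.0], hp.tau)
print(hp)
print(new_target)
```

With a coefficient of 0.01, each update moves the target parameters only 1% of the way toward the online parameters, so the target network changes slowly.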