Research Article

Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms

Table 1

MuJoCo environment model hyperparameters.

Order  Parameter                                      Value
1      Decay rate                                     0.9
2      Actor net learning rate                        0.0001
3      Critic net learning rate                       0.0001
4      Neurons in 1st layer                           400
5      Neurons in 2nd layer                           300
6      Experience pool (replay buffer) capacity       100000
7      Batch size                                     256
8      Soft update coefficient                        0.01
9      Reward discount rate                           0.99
10     Lower limit of critic net output distribution  −20
11     Target net parameter update rounds             10
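To show how some of these values typically enter a DDPG training loop, the following is a minimal sketch that assumes a standard DDPG setup: the target networks track the online networks by Polyak (soft) averaging with the soft update coefficient τ = 0.01, and the critic is trained toward a one-step target discounted by γ = 0.99. The class and function names (DDPGConfig, soft_update, td_target) are hypothetical and do not come from the article. Parameters are plain Python lists rather than network tensors. The table does not say how the decay rate, the critic output lower limit, or the target update rounds are used, so the sketch records them without using them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DDPGConfig:
    """Values transcribed from Table 1; field names are hypothetical."""
    decay_rate: float = 0.9              # role not specified in the table
    actor_lr: float = 1e-4
    critic_lr: float = 1e-4
    hidden_1: int = 400                  # neurons in 1st layer
    hidden_2: int = 300                  # neurons in 2nd layer
    replay_capacity: int = 100_000       # experience pool size
    batch_size: int = 256
    tau: float = 0.01                    # soft update coefficient
    gamma: float = 0.99                  # reward discount rate
    critic_v_min: float = -20.0          # lower limit of critic output distribution (unused here)
    target_update_rounds: int = 10       # role not specified (unused here)


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Polyak averaging, assumed standard DDPG: theta' <- tau * theta + (1 - tau) * theta'."""
    if len(target) != len(online):
        raise ValueError("parameter vectors must have the same length")
    return [tau * w + (1.0 - tau) * w_t for w_t, w in zip(target, online)]


def td_target(reward: float, next_q: float, done: bool, gamma: float) -> float:
    """One-step critic target y = r + gamma * Q'(s', mu'(s')); no bootstrap at terminal states."""
    return reward + (0.0 if done else gamma * next_q)


if __name__ == "__main__":
    cfg = DDPGConfig()
    # A target parameter at 0.0 moves only 1% of the way toward an online value of 1.0.
    print(soft_update([0.0, 1.0], [1.0, 1.0], cfg.tau))
    print(td_target(1.0, 10.0, False, cfg.gamma))
```

With τ = 0.01, each update moves a target parameter only 1% of the way toward its online counterpart. This is why the target critic in such a setup changes slowly and keeps the bootstrapped target relatively stable.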