Research Article
Adaptive Optimization of Traffic Signal Timing via Deep Reinforcement Learning
Table 3
Simulation environment hyperparameters.
| Parameter | Meaning | Value |
| --- | --- | --- |
| | Discount factor | 0.99 |
| | Learning rate | 0.001 |
| | Clip range | 0.2 |
| | Simulation time per episode | 5000 s |
| | Number of steps per update | 128 |
| | Entropy coefficient in the loss function | 0.01 |
| | Value function coefficient in the loss function | 0.5 |
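To make the role of each value in Table 3 concrete, the following minimal sketch collects them in a configuration object and shows how the clip range, entropy coefficient, and value function coefficient would typically enter a clipped-surrogate (PPO-style) loss. That the table describes such an objective is an assumption based on the parameter names; the names `Hyperparameters`, `clipped_surrogate`, and `total_loss` are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    """Values copied from Table 3; the field names are illustrative."""
    discount_factor: float = 0.99
    learning_rate: float = 0.001
    clip_range: float = 0.2
    episode_sim_time_s: int = 5000   # simulation time per episode, in seconds
    steps_per_update: int = 128      # environment steps collected per update
    entropy_coef: float = 0.01
    value_coef: float = 0.5


def clipped_surrogate(ratio: float, advantage: float, clip_range: float) -> float:
    """Clipped surrogate objective for one sample (assumed PPO-style form)."""
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    return min(ratio * advantage, clipped_ratio * advantage)


def total_loss(policy_objective: float, value_loss: float,
               entropy: float, hp: Hyperparameters) -> float:
    """Combine the terms: maximize the objective and entropy, minimize value error."""
    return (-policy_objective
            + hp.value_coef * value_loss
            - hp.entropy_coef * entropy)


if __name__ == "__main__":
    hp = Hyperparameters()
    objective = clipped_surrogate(ratio=1.5, advantage=2.0, clip_range=hp.clip_range)
    print(total_loss(objective, value_loss=1.0, entropy=0.5, hp=hp))
```

With a clip range of 0.2, a probability ratio of 1.5 is clipped to 1.2 whenever the advantage is positive, which limits how far a single update can move the policy.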