| Parameter | Value |
| --- | --- |
| Episodes | 20 |
| Time slots per episode | 5500 |
| State history length | 16 |
| Experience-replay pool size | 1000 |
| Experience-replay minibatch size | 64 |
| Discount factor | 0.9 |
| Learning rate | 0.001 |
| Maximum exploration probability | 0.8 |
| Minimum exploration probability | 0.001 |
| Decay factor | 0.001 |
| Target network update frequency | 100 |
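
For readers who want to reuse these settings, the table can be collected into a single configuration object. The sketch below is only illustrative: the class name `DQNConfig`, the field names, and the helper `epsilon_at` are hypothetical and not from the source. The table lists the decay factor without saying what it decays or how. The sketch assumes it governs the exploration probability through an exponential decay from the maximum toward the minimum, which is a common choice but may not match the schedule actually used. The unit of the target network update frequency (training steps or time slots) is likewise not stated and is left uninterpreted.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    """Hyperparameters copied from the table; all field names are hypothetical."""
    episodes: int = 20
    slots_per_episode: int = 5500
    state_history_length: int = 16
    replay_pool_size: int = 1000
    minibatch_size: int = 64
    discount_factor: float = 0.9
    learning_rate: float = 0.001
    epsilon_max: float = 0.8
    epsilon_min: float = 0.001
    decay_factor: float = 0.001
    # Unit (training steps vs. time slots) is not given in the table.
    target_update_frequency: int = 100

    @property
    def total_slots(self) -> int:
        """Total number of time slots across all episodes."""
        return self.episodes * self.slots_per_episode


def epsilon_at(step: int, cfg: DQNConfig) -> float:
    """Exploration probability after `step` decisions.

    ASSUMPTION: exponential decay from epsilon_max toward epsilon_min at
    rate decay_factor. The source does not specify the schedule.
    """
    span = cfg.epsilon_max - cfg.epsilon_min
    return cfg.epsilon_min + span * math.exp(-cfg.decay_factor * step)


cfg = DQNConfig()
print(cfg.total_slots)         # 20 episodes * 5500 slots
print(epsilon_at(0, cfg))      # starts at the maximum exploration probability
print(epsilon_at(5500, cfg))   # value after one episode under the assumed schedule
```

Under this assumed schedule, exploration would fall close to its minimum within the first episode. If the original work used a different rule (for example, a linear or per-episode decay), only `epsilon_at` would need to change.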