Research Article
Multirobot Coverage Path Planning Based on Deep Q-Network in Unknown Environment
Table 2
List of training parameters.
| Parameters | Values |
| --- | --- |
| Replay memory size | 100000 |
| Discount factor | 0.99 |
| The initial value of the greedy exploration | 1 |
| The final value of the greedy exploration | 0.25 |
| Robot's maximum number of steps in each episode | 180 |
| Learning rate | 0.0005 |
| Target network update frequency | 50 (episodes) |
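
As an illustration of how the values in Table 2 might be carried into a training script, the following minimal sketch collects them in a single configuration object. The class name `TrainingConfig`, its field names, and the helpers `epsilon_at` and `should_sync_target` are hypothetical and do not come from the article. The table gives only the initial and final exploration values, so the linear annealing schedule and its `decay_episodes` argument are assumptions made for this example, not the authors' stated method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Training parameters from Table 2 (field names are hypothetical)."""

    replay_memory_size: int = 100_000
    discount_factor: float = 0.99
    epsilon_initial: float = 1.0        # initial value of the greedy exploration
    epsilon_final: float = 0.25         # final value of the greedy exploration
    max_steps_per_episode: int = 180    # per robot
    learning_rate: float = 0.0005
    target_update_episodes: int = 50    # target network update frequency


def epsilon_at(episode: int, decay_episodes: int,
               cfg: TrainingConfig = TrainingConfig()) -> float:
    """Exploration value for a given episode.

    ASSUMPTION: linear annealing from the initial to the final value over
    `decay_episodes` episodes; the table does not specify the schedule.
    """
    if decay_episodes <= 0:
        raise ValueError("decay_episodes must be positive")
    # Clamp progress to [0, 1] so the value stays at epsilon_final afterwards.
    fraction = min(max(episode / decay_episodes, 0.0), 1.0)
    return cfg.epsilon_initial + fraction * (cfg.epsilon_final - cfg.epsilon_initial)


def should_sync_target(episode: int,
                       cfg: TrainingConfig = TrainingConfig()) -> bool:
    """True when the target network would be refreshed (every 50 episodes)."""
    return episode > 0 and episode % cfg.target_update_episodes == 0
```

Under these assumptions, a training loop would call `epsilon_at(episode, decay_episodes)` at the start of each episode and copy the online network weights into the target network whenever `should_sync_target(episode)` returns true.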