Research Article

Multirobot Coverage Path Planning Based on Deep Q-Network in Unknown Environment

Table 2: Training parameters.

Parameters                                         Values

Replay memory size                                 100000
Discount factor                                    0.99
The initial value of the greedy exploration        1
The final value of the greedy exploration          0.25
Robot's maximum number of steps in each episode    180
Learning rate                                      0.0005
Target network update frequency                    50 (episodes)
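
As a minimal sketch, the values in Table 2 could be gathered into a single configuration object for a DQN training loop. The field names, the linear shape of the exploration schedule, and the `decay_episodes` horizon are illustrative assumptions; the table gives only the initial and final exploration values, not how the rate is annealed between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values taken from Table 2; the field names are hypothetical.
    replay_memory_size: int = 100_000
    discount_factor: float = 0.99
    epsilon_initial: float = 1.0
    epsilon_final: float = 0.25
    max_steps_per_episode: int = 180
    learning_rate: float = 0.0005
    target_update_every_episodes: int = 50


def epsilon_at(episode: int, cfg: TrainingConfig, decay_episodes: int = 1000) -> float:
    """Exploration rate for a given episode.

    Assumption: linear annealing over `decay_episodes` episodes. Neither the
    schedule shape nor the horizon is stated in the table.
    """
    fraction = min(max(episode, 0) / decay_episodes, 1.0)
    return cfg.epsilon_initial + fraction * (cfg.epsilon_final - cfg.epsilon_initial)


def should_sync_target(episode: int, cfg: TrainingConfig) -> bool:
    """True after every 50th completed episode (episodes counted from 1)."""
    return episode > 0 and episode % cfg.target_update_every_episodes == 0


if __name__ == "__main__":
    cfg = TrainingConfig()
    for ep in (0, 500, 1000, 2000):
        print(ep, epsilon_at(ep, cfg), should_sync_target(ep, cfg))
```

Keeping the hyperparameters in one frozen object makes it harder for the replay buffer, the optimizer, and the target-network update to drift out of sync with the reported settings.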