Multirobot Coverage Path Planning Based on Deep Q-Network in Unknown Environment
Algorithm 2: Deduction method in MCPP.
Initialization: maximum number of simulation steps, discount factor, temporary environment information, action of each robot, total reward of each action group, and current reward of the selected action.
Choose action groups for the robots:
for each of the k action groups do
  while the current step count is less than the maximum number of simulation steps do
    for each robot do
      if the robot's action is feasible then
        Choose a feasible next action according to the value network,