Research Article

Multirobot Coverage Path Planning Based on Deep Q-Network in Unknown Environment

Algorithm 2: Deduction method in MCPP.
Initialization: maximum number of simulation steps, discount factor, temporary environment information, action of each robot, total rewards from each action group, current reward from the selected action
Choose k action groups for the robots
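for each of the k action groups do
  while the current step number is less than the maximum number of simulation steps do
    for each robot do
      if the robot's action is feasible then
        Choose a feasible next action according to the value network
        
        Update the temporary environment information according to the selected action
      else
        
      end if
    end for
    
  end while
end for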
Choose the action group with the largest total reward
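
The deduction loop above can be sketched in Python as a short rollout over candidate action groups. This is a minimal illustration under stated assumptions, not the authors' implementation: the grid environment (a dictionary with `size`, `obstacles`, `covered`, and `robots`), the reward of 1 for a newly covered cell, the `penalty` for an infeasible action, the discounted accumulation of rewards, and the `uncovered_value` stand-in for the value network are all hypothetical choices made only to keep the example runnable.

```python
import copy

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # hypothetical 4-connected grid moves


def feasible(env, pos, move):
    """A move is feasible if it stays on the grid and avoids known obstacles."""
    x, y = pos[0] + move[0], pos[1] + move[1]
    width, height = env["size"]
    return 0 <= x < width and 0 <= y < height and (x, y) not in env["obstacles"]


def uncovered_value(env, pos, move):
    """Stand-in for the value network (assumption): prefer uncovered cells."""
    cell = (pos[0] + move[0], pos[1] + move[1])
    return 0.0 if cell in env["covered"] else 1.0


def deduce(state, action_groups, value_fn, max_steps=5, gamma=0.9, penalty=-1.0):
    """Simulate each action group on a copy of the state and pick the best one."""
    totals = []
    for group in action_groups:
        env = copy.deepcopy(state)      # temporary environment information
        actions = list(group)           # one action per robot
        total = 0.0                     # total reward of this action group
        for t in range(max_steps):
            step_reward = 0.0
            for i, move in enumerate(actions):
                pos = env["robots"][i]
                if feasible(env, pos, move):
                    cell = (pos[0] + move[0], pos[1] + move[1])
                    # Assumed reward: 1 for a newly covered cell, 0 otherwise.
                    step_reward += 0.0 if cell in env["covered"] else 1.0
                    # Update the temporary environment with the selected action.
                    env["robots"][i] = cell
                    env["covered"].add(cell)
                    # Choose the next feasible action with the value function.
                    options = [m for m in MOVES if feasible(env, cell, m)]
                    if options:
                        actions[i] = max(options, key=lambda m: value_fn(env, cell, m))
                else:
                    step_reward += penalty  # assumed penalty for an infeasible action
            total += (gamma ** t) * step_reward  # assumed discounted accumulation
        totals.append(total)
    best = max(range(len(totals)), key=totals.__getitem__)
    return action_groups[best], totals


# Usage: one robot at the left end of a 3x1 corridor, two candidate action groups.
state = {"size": (3, 1), "obstacles": set(), "covered": {(0, 0)}, "robots": [(0, 0)]}
groups = [((1, 0),), ((-1, 0),)]
best_group, totals = deduce(state, groups, uncovered_value, max_steps=2, gamma=0.5)
```

Running the rollout on a deep copy keeps the real environment untouched while each candidate group is evaluated, which mirrors the role of the temporary environment information in the algorithm.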