Research Article

Multirobot Coverage Path Planning Based on Deep Q-Network in Unknown Environment

Algorithm 1

Deduction method in coverage path planning (CPP) modified by Monte Carlo tree search (MCTS).
Initialization: maximum number of simulation steps, action score, discount factor
for each of the four directions do
  if this direction is feasible then
    while the current step count is less than the maximum number of simulation steps do
      Choose a feasible next action according to the value network
      
      Update the state according to the selected action
    end while
    Calculate the action score of this direction according to the discount factor, the rewards for each step, and the value network
  else
    
  end if
end for
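The deduction loop above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that are hypothetical: the map is a grid where 0 marks a free cell and 1 an obstacle; the reward is +1 for entering a newly covered cell and -0.1 for a revisit; `value_fn` is a stand-in for the trained value network; the rollout policy greedily picks the feasible action whose successor state `value_fn` rates highest; and the action score of a direction is taken to be the discounted sum of the step rewards plus the discounted value estimate of the final simulated state. The empty else branch is filled here, as an assumption only, by giving infeasible directions a score of negative infinity.

```python
import math
from typing import Callable, FrozenSet, List, Tuple

Pos = Tuple[int, int]
State = Tuple[Pos, FrozenSet[Pos]]  # (robot position, set of covered cells)

# Four movement directions: up, down, left, right.
DIRECTIONS: List[Pos] = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def feasible(grid: List[List[int]], pos: Pos, move: Pos) -> bool:
    """A move is feasible if it stays on the grid and avoids obstacles (1)."""
    r, c = pos[0] + move[0], pos[1] + move[1]
    return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0


def step(state: State, move: Pos) -> Tuple[State, float]:
    """Apply a move. Hypothetical reward: +1 for a new cell, -0.1 for a revisit."""
    pos, covered = state
    nxt = (pos[0] + move[0], pos[1] + move[1])
    reward = 1.0 if nxt not in covered else -0.1
    return (nxt, covered | {nxt}), reward


def direction_scores(grid: List[List[int]], state: State,
                     value_fn: Callable[[State], float],
                     max_steps: int = 5, gamma: float = 0.9) -> List[float]:
    """Score each of the four directions with a discounted rollout (sketch).

    value_fn is a hypothetical stand-in for the trained value network.
    """
    scores: List[float] = []
    for first_move in DIRECTIONS:
        if not feasible(grid, state[0], first_move):
            scores.append(-math.inf)  # assumption: infeasible directions are never chosen
            continue
        sim, r = step(state, first_move)
        total, discount = r, gamma
        for _ in range(max_steps - 1):  # the first move already used one step
            moves = [m for m in DIRECTIONS if feasible(grid, sim[0], m)]
            if not moves:
                break
            # Greedy rollout: pick the move whose successor value_fn rates highest.
            best = max(moves, key=lambda m: value_fn(step(sim, m)[0]))
            sim, r = step(sim, best)
            total += discount * r
            discount *= gamma
        # Bootstrap with the value estimate of the last simulated state.
        scores.append(total + discount * value_fn(sim))
    return scores
```

In use, a robot at the top-left corner of a 3 x 3 grid with a central obstacle would call `direction_scores(grid, ((0, 0), frozenset({(0, 0)})), value_fn)` and move in the direction with the highest score; the two off-grid directions receive negative infinity and are never selected.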