Journal of Robotics

Research Article

Multirobot Coverage Path Planning Based on Deep Q-Network in Unknown Environment

Deduction method in CPP modified by MCTS.

	Initialization: maximum number of simulation steps, action score, discount factor
	for each of the 4 directions do
	if this direction is feasible then
	while the current step count is below the maximum number of simulation steps do
	Choose a feasible next action according to the value network

	Update the simulated state according to the selected action
	end while
	Calculate the action score according to the discount factor, the rewards for each step, and the value network
	else

	end if
	end for
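
The deduction loop above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions introduced only for illustration: the grid encoding (0 = free, 1 = obstacle), the `step_reward` values, the `value_fn` stand-in for the trained value network, the greedy choice of the next action, and the use of a discounted return with a final bootstrap from `value_fn` as the action score. The `else` branch of the listing is not legible in this copy, so the sketch simply records `None` for an infeasible direction.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]
# The four candidate directions: up, down, left, right.
DIRECTIONS: List[Cell] = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def feasible(cell: Cell, grid: List[List[int]]) -> bool:
    """A cell is feasible if it is inside the grid and not an obstacle (1)."""
    r, c = cell
    return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0


def step_reward(cell: Cell, covered: Set[Cell]) -> float:
    """Hypothetical reward: +1 for a newly covered cell, -0.1 for a revisit."""
    return 1.0 if cell not in covered else -0.1


def deduce_scores(
    start: Cell,
    grid: List[List[int]],
    covered: Set[Cell],
    value_fn: Callable[[Cell, Set[Cell]], float],  # stand-in for the value network
    max_steps: int = 5,
    gamma: float = 0.9,
) -> Dict[Cell, Optional[float]]:
    """Score each of the four directions by a short simulated rollout."""
    scores: Dict[Cell, Optional[float]] = {}
    for d in DIRECTIONS:
        pos = (start[0] + d[0], start[1] + d[1])
        if not feasible(pos, grid):
            scores[d] = None  # assumption: the source's else branch is not legible
            continue
        sim_covered = set(covered)  # simulate on a copy, never on the real map
        rewards = [step_reward(pos, sim_covered)]
        sim_covered.add(pos)
        step = 1
        while step < max_steps:
            candidates = [(pos[0] + dr, pos[1] + dc) for dr, dc in DIRECTIONS]
            candidates = [c for c in candidates if feasible(c, grid)]
            if not candidates:
                break
            # Choose the feasible next cell that the value function rates highest.
            pos = max(candidates, key=lambda c: value_fn(c, sim_covered))
            rewards.append(step_reward(pos, sim_covered))
            sim_covered.add(pos)
            step += 1
        # Discounted sum of rollout rewards plus a bootstrapped tail value.
        score = sum(gamma ** k * r for k, r in enumerate(rewards))
        score += gamma ** len(rewards) * value_fn(pos, sim_covered)
        scores[d] = score
    return scores
```

A robot at cell `(0, 1)` of a one-row corridor `[[0, 0, 0]]` could call `deduce_scores((0, 1), [[0, 0, 0]], {(0, 1)}, value_fn)` and then move in the direction with the largest non-`None` score. Copying the coverage set before each rollout keeps the simulated steps from altering the robot's actual map.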