Research Article
Multirobot Coverage Path Planning Based on Deep Q-Network in Unknown Environment
Algorithm 1
Deduction method in CPP modified by MCTS.
Initialization: maximum number of simulation steps, action score, discount factor
for each of the 4 directions do
    if this direction is feasible then
        while the number of current steps is less than the maximum number of simulation steps do
            Choose a feasible next action according to the value network
            Increase the number of current steps by one
            Update the simulated state according to the selected action
        end while
        Calculate the action score according to the discount factor, the rewards for each step, and the value network
    else
        Set the action score for this infeasible direction
    end if
end for
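The deduction loop of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the grid encoding, the names `score_actions`, `step_reward`, and `feasible`, the reward values, the use of negative infinity for an infeasible direction, and the `value_fn` callable standing in for the value network are all assumptions made for illustration. The score of each direction is assumed to be the discounted sum of the simulated rewards plus the discounted value estimate at the final simulated cell.

```python
import math
from typing import Callable, List, Set, Tuple

Cell = Tuple[int, int]
# The four candidate directions: up, down, left, right.
DIRECTIONS: List[Cell] = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def feasible(grid: List[List[int]], cell: Cell) -> bool:
    """A cell is feasible if it lies inside the grid and is not an obstacle (1)."""
    r, c = cell
    return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0


def step_reward(cell: Cell, seen: Set[Cell]) -> float:
    """Hypothetical reward: +1 for a newly covered cell, -0.1 for a revisit."""
    return 1.0 if cell not in seen else -0.1


def score_actions(
    grid: List[List[int]],
    start: Cell,
    covered: Set[Cell],
    value_fn: Callable[[Cell, Set[Cell]], float],
    max_steps: int = 5,
    gamma: float = 0.9,
) -> List[float]:
    """Return one score per direction by simulating up to max_steps moves."""
    scores: List[float] = []
    for dr, dc in DIRECTIONS:
        first = (start[0] + dr, start[1] + dc)
        if not feasible(grid, first):
            # Assumption: an infeasible direction gets the worst possible score.
            scores.append(-math.inf)
            continue
        seen = set(covered)  # simulate on a copy of the coverage record
        pos = first
        rewards = [step_reward(pos, seen)]
        seen.add(pos)
        step = 1
        while step < max_steps:
            candidates = [(pos[0] + a, pos[1] + b) for a, b in DIRECTIONS]
            candidates = [c for c in candidates if feasible(grid, c)]
            if not candidates:
                break
            # Greedy choice of the next action according to the value estimate.
            pos = max(candidates, key=lambda c: value_fn(c, seen))
            rewards.append(step_reward(pos, seen))
            seen.add(pos)
            step += 1
        # Discounted simulated rewards plus a discounted bootstrap value.
        score = sum(gamma ** k * r for k, r in enumerate(rewards))
        score += gamma ** len(rewards) * value_fn(pos, seen)
        scores.append(score)
    return scores
```

Calling `score_actions(grid, (0, 0), {(0, 0)}, value_fn)` on an obstacle-free grid returns four scores in the order up, down, left, right; the robot would then pick the direction with the highest score. Copying the coverage set before each simulation keeps the deduction free of side effects on the real map.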