Research on Dynamic Path Planning of Wheeled Robot Based on Deep Reinforcement Learning on the Slope Ground
Table 1
Selection method of candidate path based on TDDQN.
Input
Current state (candidate path, root node of tree)
Output
Next state (new candidate path, child node of tree)
Step 1
Initialize the threshold value and the maximum level number of the tree, and set the initial level number of the tree as 1.
Step 2
According to the current state, the two actions with the highest predicted value obtained by DDQN method were selected from the global path planning action group and the local path planning action group respectively.
Step 3
The state obtained after performing the global path planning action as the left node, and the state obtained after performing the local path planning action as the right node.
Step 4
Add 1 to the number of levels in the tree.
Step 5
If the current number of levels in the tree is less than and there are still branches that are not cut off, step 6 is performed, otherwise step 7 is performed.
Step 6
If the left node’s reward value is greater than , it will be cut off, otherwise, the left node will be taken as the current state of its path and step 2 will be executed. Accordingly, if the reward value of the right node is greater than , it will be cut off, otherwise, the right node will be taken as the current state of its path and step 2 will be executed.
Step 7
The nonmaximum suppression method is used to select the optimal candidate path for all tree nodes, thus TDDQN path planning strategy is formed.