Research Article

Research on Dynamic Path Planning of Wheeled Robot Based on Deep Reinforcement Learning on the Slope Ground

Table 1

Selection method of candidate path based on TDDQN.

InputCurrent state (candidate path, root node of tree)
OutputNext state (new candidate path, child node of tree)

Step 1Initialize the threshold value and the maximum level number of the tree, and set the initial level number of the tree as 1.
Step 2According to the current state, the two actions with the highest predicted value obtained by DDQN method were selected from the global path planning action group and the local path planning action group respectively.
Step 3The state obtained after performing the global path planning action as the left node, and the state obtained after performing the local path planning action as the right node.
Step 4Add 1 to the number of levels in the tree.
Step 5If the current number of levels in the tree is less than and there are still branches that are not cut off, step 6 is performed, otherwise step 7 is performed.
Step 6If the left node’s reward value is greater than , it will be cut off, otherwise, the left node will be taken as the current state of its path and step 2 will be executed. Accordingly, if the reward value of the right node is greater than , it will be cut off, otherwise, the right node will be taken as the current state of its path and step 2 will be executed.
Step 7The nonmaximum suppression method is used to select the optimal candidate path for all tree nodes, thus TDDQN path planning strategy is formed.