Research Article
Research on Dynamic Path Planning of Wheeled Robot Based on Deep Reinforcement Learning on the Slope Ground
Figure 8
Training curve of the target network loss function. Each point is the loss value averaged over ten epochs. The y-axis shows the loss value and the x-axis shows the training epoch. (a) Initial stage of training. (b) Convergence stage of training.
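The caption states that each plotted point is a loss value averaged over ten epochs. As a minimal sketch of that smoothing step, the Python snippet below averages consecutive blocks of ten per-epoch loss values. The function name `block_average`, the handling of a trailing partial block, and the sample values are assumptions made for illustration; they are not taken from the article.

```python
from statistics import mean


def block_average(losses, block=10):
    """Average consecutive blocks of `block` per-epoch loss values."""
    # Assumption: a trailing partial block is averaged over the values it has.
    return [mean(losses[i:i + block]) for i in range(0, len(losses), block)]


# Hypothetical per-epoch losses, for illustration only (not the article's data).
per_epoch = [float(v) for v in range(20, 0, -1)]
points = block_average(per_epoch)
print(points)
```

Each entry of `points` would correspond to one marker on a curve like the one in Figure 8, with the x-coordinate given by the index of the ten-epoch block.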