Research Article

Research on Dynamic Path Planning of Wheeled Robot Based on Deep Reinforcement Learning on the Slope Ground

Figure 8

Training curve of the target network loss function. Each point is the average loss value over ten epochs. The y-axis shows the loss value and the x-axis shows the iteration epoch. (a) Initial stage of training. (b) Convergence stage of training.
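The ten-epoch averaging described in the caption can be illustrated with a minimal sketch. The function name `block_average`, the choice to drop a trailing partial block, and the synthetic loss values are assumptions made for illustration; they are not taken from the article's implementation or data.

```python
from statistics import fmean


def block_average(losses, block=10):
    """Average consecutive, non-overlapping blocks of `block` loss values."""
    if block <= 0:
        raise ValueError("block must be positive")
    # Assumption: a trailing partial block is dropped so that every
    # plotted point is the mean of exactly `block` epochs.
    usable = len(losses) - len(losses) % block
    return [fmean(losses[i:i + block]) for i in range(0, usable, block)]


# Synthetic, made-up loss values (not the article's data).
losses = [1.0 / (epoch + 1) for epoch in range(30)]
points = block_average(losses)  # one point per ten epochs
```

Under these assumptions, 30 recorded epochs yield three plotted points, which smooths epoch-to-epoch noise at the cost of temporal resolution.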