Research Article
Supervised Reinforcement Learning for ULV Path Planning in Complex Warehouse Environment
Algorithm 1
The Training Procedure of the SDRL.
Input: expert data and initial parameters.
for each training iteration do
    Update the discriminator by ascending the stochastic gradient.
    Update the internal rewards and external rewards.
    Update the value function.
    Update the policy of the DRL.
end for
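The discriminator and reward steps of Algorithm 1 can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the paper's formulation: the discriminator is a plain logistic model over hypothetical two-dimensional state-action features, D(x) is read as the probability that a sample came from the expert, the ascended objective is E_expert[log D] + E_policy[log(1 - D)], the internal reward is taken as -log(1 - D), and the weight `beta` on the external reward is a made-up mixing parameter. The value-function and policy updates are omitted because their update rules are not recoverable from the listing.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def d_prob(w, x):
    """D(x): estimated probability that feature vector x came from the expert."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def discriminator_step(w, expert_batch, policy_batch, lr=0.1):
    """One gradient ASCENT step on E_expert[log D] + E_policy[log(1 - D)]."""
    grad = [0.0] * len(w)
    # d/dw log D(x) = (1 - D(x)) * x
    for x in expert_batch:
        c = (1.0 - d_prob(w, x)) / len(expert_batch)
        for i, xi in enumerate(x):
            grad[i] += c * xi
    # d/dw log(1 - D(x)) = -D(x) * x
    for x in policy_batch:
        c = d_prob(w, x) / len(policy_batch)
        for i, xi in enumerate(x):
            grad[i] -= c * xi
    return [wi + lr * gi for wi, gi in zip(w, grad)]


def combined_reward(w, x, external_r, beta=0.5, eps=1e-8):
    """Internal (discriminator) reward plus weighted external (environment) reward.

    The form -log(1 - D) and the weight beta are assumptions for illustration.
    """
    internal_r = -math.log(1.0 - d_prob(w, x) + eps)
    return internal_r + beta * external_r


# Synthetic stand-ins for expert and policy samples (hypothetical data).
random.seed(0)
expert_batch = [[random.gauss(1.0, 0.5), random.gauss(1.0, 0.5)] for _ in range(200)]
policy_batch = [[random.gauss(-1.0, 0.5), random.gauss(-1.0, 0.5)] for _ in range(200)]

w = [0.0, 0.0]
for _ in range(200):
    w = discriminator_step(w, expert_batch, policy_batch)

mean_d_expert = sum(d_prob(w, x) for x in expert_batch) / len(expert_batch)
mean_d_policy = sum(d_prob(w, x) for x in policy_batch) / len(policy_batch)
```

After training on these synthetic batches, the discriminator scores expert-like samples higher than policy-like ones, so the internal reward pushes the policy toward expert behavior while the external term still reflects the environment's feedback.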