Research Article
Robot Navigation in Crowd Based on Dual Social Attention Deep Reinforcement Learning
| (1) | Initialize experience replay memory with demonstration |
| (2) | Initialize value network with memory |
| (3) | Initialize target value network |
| (4) | for … do |
| (5) | Initialize joint state |
| (6) | repeat |
| (7) | |
| (8) | |
| (9) | |
| (10) | Enrich experience |
| (11) | Optimize value network with experience |
| (12) | Update value network by gradient descent |
| (13) | until terminal state or … |
| (14) | Update target network |
| (15) | end for |
| (16) | return … |
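The training procedure listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 1-D toy environment, the linear "value network", the hand-made features, the epsilon-greedy one-step lookahead used for the steps whose content is missing from the listing, and every constant (`GAMMA`, `GOAL`, `T_MAX`, learning rate, batch size, sync interval) are hypothetical stand-ins chosen only to make the loop runnable. The numbered comments map to the steps of the listing.

```python
import random

GAMMA = 0.9            # discount factor (assumed value)
GOAL = 5               # hypothetical goal position on a 1-D line
T_MAX = 20             # assumed per-episode step limit
ACTIONS = (-1, 0, 1)   # toy action set: step back, stay, step forward


def features(s):
    # Hypothetical features: a bias term and the scaled distance to the goal.
    return (1.0, abs(GOAL - s) / GOAL)


def value(w, s):
    # Linear stand-in for the value network.
    return sum(wi * fi for wi, fi in zip(w, features(s)))


def step(s, a):
    # Toy transition model: returns (next_state, reward, done).
    s2 = s + a
    done = s2 == GOAL
    return s2, (1.0 if done else -0.05), done


def lookahead(w, s, a):
    # One-step lookahead score of action a: reward plus discounted next value.
    s2, r, done = step(s, a)
    return r if done else r + GAMMA * value(w, s2)


def update(w, w_target, samples, lr):
    # Gradient descent on the squared TD error; targets use the target weights.
    for s, r, s2, done in samples:
        target = r if done else r + GAMMA * value(w_target, s2)
        err = value(w, s) - target
        f = features(s)
        for i in range(len(w)):
            w[i] -= lr * err * f[i]


def train(episodes=200, lr=0.05, batch=16, sync_every=10, seed=0):
    rng = random.Random(seed)
    memory = []                                    # (1) replay memory ...
    for start in range(GOAL):                      # ... filled with demonstrations
        s, done = start, False
        while not done:                            # demo policy: walk to the goal
            s2, r, done = step(s, 1)
            memory.append((s, r, s2, done))
            s = s2
    w = [0.0, 0.0]
    for _ in range(50):                            # (2) pre-train on the memory
        update(w, list(w), memory, lr)
    w_target = list(w)                             # (3) target network
    for episode in range(episodes):                # (4) episode loop
        s, t, done = 0, 0, False                   # (5) initial state
        while not done and t < T_MAX:              # (6)/(13) repeat ... until
            if rng.random() < 0.1:                 # assumed exploration rate
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: lookahead(w, s, act))
            s2, r, done = step(s, a)
            memory.append((s, r, s2, done))        # (10) enrich experience
            sample = rng.sample(memory, min(batch, len(memory)))
            update(w, w_target, sample, lr)        # (11)-(12) optimize and update
            s, t = s2, t + 1
        if episode % sync_every == 0:
            w_target = list(w)                     # (14) update target network
    return w                                       # (16) return learned weights
```

Calling `train()` returns the two weights of the toy value function. Keeping a separate copy of the weights for computing targets, refreshed only every few episodes, is the usual reason for a target network: the regression targets stay fixed between synchronizations instead of shifting with every gradient step.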