Research Article
Deep Reinforcement Learning for UAV Intelligent Mission Planning
| Proximal policy optimization algorithm (PPO) |
| --- |
| 1. For i = 1 to N do |
| 2. Run policy for T timesteps, collecting the trajectory; |
| 3. Estimate return and advantage; |
| 4. For k = 1 to K do |
| 5. Sample minibatch from the trajectory, calculate policy loss and value loss; |
| 6. Optimize surrogate loss function; |
| 7. Update policy parameter; |
| 8. End for |
| 9. End for |
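The return, advantage, and loss computations in steps 3 to 6 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the article's implementation: the listing does not give the form of the surrogate loss, so the clipped surrogate commonly associated with PPO is assumed here, the advantage is taken as return minus value estimate, and the names `discounted_returns`, `clipped_surrogate_loss`, `value_loss`, and `ppo_epoch_losses`, as well as the defaults `gamma`, `clip_eps`, and `minibatch_size`, are hypothetical. The gradient step of step 7 is omitted.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Step 3 (assumed form): discounted return G_t = r_t + gamma * G_{t+1}."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def clipped_surrogate_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Step 6 (assumed form): negative clipped surrogate objective."""
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (smaller) objective, then negate to get a loss.
    return -np.mean(np.minimum(unclipped, clipped))


def value_loss(values, returns):
    """Step 5: mean squared error between value estimates and returns."""
    diff = np.asarray(values, dtype=float) - np.asarray(returns, dtype=float)
    return np.mean(diff ** 2)


def ppo_epoch_losses(batch, K=4, minibatch_size=32, seed=0):
    """Steps 4 to 6: K passes, each on a minibatch sampled from the trajectory.

    `batch` is a hypothetical dict of equal-length arrays with keys
    'new_logp', 'old_logp', 'values', and 'returns'. No parameters are
    updated here; only the two losses per pass are returned.
    """
    rng = np.random.default_rng(seed)
    n = len(batch["returns"])
    advantages = batch["returns"] - batch["values"]  # assumed advantage estimate
    losses = []
    for _ in range(K):
        idx = rng.choice(n, size=min(minibatch_size, n), replace=False)
        policy_l = clipped_surrogate_loss(
            batch["new_logp"][idx], batch["old_logp"][idx], advantages[idx]
        )
        value_l = value_loss(batch["values"][idx], batch["returns"][idx])
        losses.append((policy_l, value_l))
    return losses
```

In a full training loop, an optimizer would take a gradient step on a weighted sum of the two losses inside the inner loop, and the outer loop would collect a fresh trajectory with the updated policy before the next iteration.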