Research Article
Deep Reinforcement Learning for UAV Intelligent Mission Planning
| Proximal policy optimization algorithm (PPO) |
| --- |
| 1. For i = 1 to N do |
| 2. Run the policy for T timesteps, collecting a trajectory; |
| 3. Estimate the return and the advantage; |
| 4. For k = 1 to K do |
| 5. Sample a minibatch from the trajectory and calculate the policy loss and value loss; |
| 6. Optimize the surrogate loss function; |
| 7. Update the policy parameters; |
| 8. End for |
| 9. End for |
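The inner steps of the algorithm above (3, 5, and 6) can be illustrated with a minimal NumPy sketch. It is not the article's implementation: the symbols in the original table were lost, so the sketch assumes plain discounted returns, an advantage computed as return minus value estimate, and the clipped surrogate objective commonly used with PPO. All function names and the parameters `gamma`, `clip_eps`, and `batch_size` are hypothetical and chosen only for illustration.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Step 3 (return): R_t = r_t + gamma * R_{t+1}, accumulated backwards.

    Assumption: plain discounted returns; the article may use another estimator.
    """
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns, values):
    """Step 3 (advantage): assumed here to be return minus value estimate."""
    return np.asarray(returns, dtype=float) - np.asarray(values, dtype=float)


def minibatch_indices(n, batch_size, rng):
    """Step 5: shuffle the n collected timesteps and split them into minibatches."""
    perm = rng.permutation(n)
    return [perm[i:i + batch_size] for i in range(0, n, batch_size)]


def clipped_surrogate_loss(new_logp, old_logp, adv, clip_eps=0.2):
    """Step 6 (policy loss): negative clipped surrogate, to be minimized.

    Assumption: the standard PPO clipping form with ratio pi_new / pi_old.
    """
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -np.mean(np.minimum(unclipped, clipped))


def value_loss(values, returns):
    """Step 5 (value loss): mean squared error between value estimates and returns."""
    return np.mean((np.asarray(values) - np.asarray(returns)) ** 2)


# Toy usage on a three-step trajectory with made-up numbers.
rewards = np.array([1.0, 1.0, 1.0])
values = np.array([1.0, 1.0, 1.0])          # hypothetical critic outputs
old_logp = np.log(np.array([0.5, 0.5, 0.5]))
new_logp = np.log(np.array([0.6, 0.4, 0.5]))

rets = discounted_returns(rewards, gamma=0.5)
adv = advantages(rets, values)
rng = np.random.default_rng(0)
for idx in minibatch_indices(len(rewards), batch_size=2, rng=rng):
    policy_l = clipped_surrogate_loss(new_logp[idx], old_logp[idx], adv[idx])
    value_l = value_loss(values[idx], rets[idx])
    # Step 7 would apply a gradient step on policy_l and value_l here.
```

The gradient update in step 7 is left as a comment because it depends on the policy and value networks, which this section does not specify. In practice the two losses would be passed to an automatic differentiation framework rather than evaluated with NumPy alone.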