Research Article

Deep Reinforcement Learning for UAV Intelligent Mission Planning

Table 2

Algorithm: PPO (CLIP).

Proximal policy optimization algorithm (PPO)

1. For i = 1 to N do
2. Run policy $\pi_{\theta}$ for T timesteps, collecting $\{s_t, a_t, r_t\}$;
3. Estimate return $\hat{R}_t$ and advantage $\hat{A}_t$;
4. For k = 1 to K do
5. Sample minibatch from the trajectory, calculate policy loss and value loss;
6. Optimize surrogate loss function $L^{CLIP}(\theta)$;
7. Update policy parameter $\theta$;
8. End for
9. End for
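
The inner loop of the table (steps 5 and 6) can be illustrated with a minimal NumPy sketch of the clipped surrogate objective. This is not the article's implementation. The function names, the discount factor `gamma`, the clipping range `clip_eps = 0.2`, and the choice of advantage as discounted return minus a value estimate are all assumptions made for illustration; the article may use a different estimator, such as generalized advantage estimation.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Discounted return-to-go for one trajectory (assumed estimator for step 3)."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Accumulate backwards so each entry holds r_t + gamma * R_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def ppo_clip_losses(new_logp, old_logp, advantages, values, returns, clip_eps=0.2):
    """Policy loss and value loss for one minibatch (steps 5 and 6).

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current policy and under the policy that collected the data.
    """
    # Probability ratio between the current and the data-collecting policy.
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The surrogate objective is maximized, so the loss is its negative mean.
    policy_loss = -np.mean(np.minimum(unclipped, clipped))
    # Mean squared error between value predictions and empirical returns.
    value_loss = np.mean((values - returns) ** 2)
    return policy_loss, value_loss


# Hypothetical three-step trajectory, used only to exercise the functions.
rewards = np.array([1.0, 0.0, 1.0])
values = np.array([1.0, 0.5, 0.5])
returns = discounted_returns(rewards, gamma=0.5)
advantages = returns - values
logp = np.log(np.array([0.5, 0.4, 0.6]))
policy_loss, value_loss = ppo_clip_losses(logp, logp, advantages, values, returns)
print(policy_loss, value_loss)
```

In a full training loop these two losses would typically be combined, often with an entropy bonus, and minimized by a gradient-based optimizer for K epochs per batch of collected data. The clipping removes the incentive to move the probability ratio outside [1 - clip_eps, 1 + clip_eps] whenever doing so would increase the objective.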