Research Article
Reinforcement Learning-Based Collision Avoidance Guidance Algorithm for Fixed-Wing UAVs
| Step | Operation |
| --- | --- |
| (1) | Initialize ActorD and CriticD with randomly chosen weights. |
| (2) | Initialize ActorE and CriticE by copying the weights of ActorD and CriticD. |
| (3) | Initialize the experience pool and a step counter. |
| (4) | for episode = 1 to Max-Episode do |
| (5) | Initialize the joint action set. |
| (6) | Reset the environment and obtain the initial state s. |
| (7) | Set the maximum number of time steps in an episode, T_max. |
| (8) | for t = 1 to T_max do |
| (9) | Generate UAVs randomly. |
| (10) | For each agent i, select action a_i by ActorD and combine the actions into the joint action a. |
| (11) | Execute a and obtain the reward r and the next state s'. |
| (12) | Store (s, a, r, s') in the experience pool and increment the counter. |
| (13) | Replace s with s'. |
| (14) | if (counter > batch_size) and (t % sample_time == 0) do |
| (15) | Sample a random minibatch of M transitions from the experience pool. |
| (16) | Use CriticD to get Q(s, a), use ActorE to get the next joint action a', and use CriticE to get Q'(s', a'). |
| (17) | Update CriticD by minimizing the loss in equation (6). |
| (18) | Update ActorD using the sampled policy gradient in equation (7). |
| (19) | if t % replace_time == 0 do |
| (20) | Update the ActorE and CriticE networks with the weights of ActorD and CriticD. |
| (21) | if number of UAVs > max_num do |
| (22) | break |
| (23) | end for (t) |
| (24) | end for (episode) |
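The steps above follow the usual actor-critic pattern: the online networks (ActorD, CriticD) interact with the environment and are trained from sampled experience, while the target networks (ActorE, CriticE) provide the TD target and are periodically refreshed. Below is a minimal sketch, in Python with PyTorch, of how steps (1)-(20) could be wired up for a single agent. The `DummyAvoidEnv` environment, the network sizes, the hyperparameters (batch size, sample_time, replace_time, soft-update rate TAU), and the exact forms of equations (6) and (7) are all illustrative assumptions rather than the paper's implementation; the multi-agent joint-action handling and the UAV-count break condition in steps (21)-(22) are omitted for brevity.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM = 8, 2                       # assumed observation/action sizes
BATCH_SIZE, SAMPLE_TIME, REPLACE_TIME = 64, 10, 100
MAX_EPISODE, T_MAX = 500, 200
GAMMA, TAU = 0.95, 0.01                       # assumed discount and soft-update rate


class DummyAvoidEnv:
    """Stand-in environment with random transitions, used only so the sketch runs."""

    def reset(self):
        return torch.randn(OBS_DIM)

    def step(self, action):
        next_obs = torch.randn(OBS_DIM)
        reward = -action.norm().item()        # placeholder reward signal
        return next_obs, reward


def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))


# Steps (1)-(2): online ("D") and target ("E") actor/critic networks.
actor_d, critic_d = mlp(OBS_DIM, ACT_DIM), mlp(OBS_DIM + ACT_DIM, 1)
actor_e, critic_e = mlp(OBS_DIM, ACT_DIM), mlp(OBS_DIM + ACT_DIM, 1)
actor_e.load_state_dict(actor_d.state_dict())
critic_e.load_state_dict(critic_d.state_dict())
actor_opt = torch.optim.Adam(actor_d.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic_d.parameters(), lr=1e-3)

pool = deque(maxlen=100_000)                  # Step (3): experience pool
env = DummyAvoidEnv()

for episode in range(MAX_EPISODE):            # Step (4)
    obs = env.reset()                         # Step (6)
    for t in range(1, T_MAX + 1):             # Steps (7)-(8)
        # Step (10): action from the online actor plus exploration noise.
        with torch.no_grad():
            act = actor_d(obs) + 0.1 * torch.randn(ACT_DIM)
        next_obs, reward = env.step(act)      # Step (11)
        pool.append((obs, act, reward, next_obs))   # Step (12)
        obs = next_obs                        # Step (13)

        if len(pool) > BATCH_SIZE and t % SAMPLE_TIME == 0:   # Step (14)
            batch = random.sample(list(pool), BATCH_SIZE)     # Step (15)
            s = torch.stack([b[0] for b in batch])
            a = torch.stack([b[1] for b in batch])
            r = torch.tensor([b[2] for b in batch]).unsqueeze(1)
            s2 = torch.stack([b[3] for b in batch])

            # Steps (16)-(17): TD target from the target networks, critic regression loss.
            with torch.no_grad():
                a2 = actor_e(s2)
                y = r + GAMMA * critic_e(torch.cat([s2, a2], dim=1))
            critic_loss = F.mse_loss(critic_d(torch.cat([s, a], dim=1)), y)
            critic_opt.zero_grad()
            critic_loss.backward()
            critic_opt.step()

            # Step (18): deterministic policy gradient through the online critic.
            actor_loss = -critic_d(torch.cat([s, actor_d(s)], dim=1)).mean()
            actor_opt.zero_grad()
            actor_loss.backward()
            actor_opt.step()

        if t % REPLACE_TIME == 0:             # Steps (19)-(20): refresh the target networks
            for tgt, src in ((actor_e, actor_d), (critic_e, critic_d)):
                for p_t, p_s in zip(tgt.parameters(), src.parameters()):
                    p_t.data.mul_(1 - TAU).add_(TAU * p_s.data)
```

The target-network refresh here uses a soft update with rate TAU; a hard copy every replace_time steps (as the pseudocode's wording may intend) would simply replace the inner loop with `load_state_dict` calls.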