Research Article
Reinforcement Learning-Based Collision Avoidance Guidance Algorithm for Fixed-Wing UAVs
(1) Initialize ActorD and CriticD with randomly chosen weights θ^μ and θ^Q.
(2) Initialize ActorE and CriticE with the weights θ^μ′ ← θ^μ and θ^Q′ ← θ^Q.
(3) Initialize the experience pool R and a counter c = 0.
(4) for episode = 1 to Max-Episode do
(5)   Initialize the joint action set A.
(6)   Reset the environment.
(7)   Set the max time steps in an episode, T_max.
(8)   for t = 1 to T_max do
(9)     Generate UAVs randomly.
(10)    For each agent i, select action a_i by ActorD and combine the actions, a_t = (a_1, …, a_N).
(11)    Execute a_t and obtain the reward r_t and the next state s_{t+1}.
(12)    Store (s_t, a_t, r_t, s_{t+1}) in R and set c = c + 1.
(13)    Replace s_t with s_{t+1}.
(14)    if (c > batch_size) and (t % sample_time == 0) do
(15)      Sample a random minibatch of M transitions from R.
(16)      Use CriticD to get the current Q-values, use ActorE to get the next actions, and use CriticE to get the target Q-values.
(17)      Update CriticD by minimizing the loss in equation (6).
(18)      Update ActorD using the sampled policy gradient in equation (7).
(19)      if t % replace_time == 0 do
(20)        Update the ActorE and CriticE networks with the weights of ActorD and CriticD.
(21)    if number of UAVs > max_num do
(22)      break
(23)  end for (t)
(24) end for (episode)
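As a rough illustration of how this listing maps onto code, the sketch below implements a single-agent, DDPG-style version of the loop in PyTorch. The environment (DummyEnv), the network sizes, and all hyperparameters (STATE_DIM, BATCH_SIZE, SAMPLE_TIME, REPLACE_TIME, etc.) are placeholder assumptions, and a standard TD loss and deterministic policy gradient stand in for the paper's equations (6) and (7); this is a minimal sketch, not the authors' implementation.

```python
import copy
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2        # assumed sizes of a UAV's observation and guidance command
GAMMA = 0.99                        # discount factor (illustrative)
BATCH_SIZE, SAMPLE_TIME, REPLACE_TIME = 64, 4, 200  # illustrative hyperparameters


def mlp(in_dim, out_dim):
    """Small fully connected network; layer sizes are placeholders."""
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))


# Steps (1)-(2): decision networks ActorD/CriticD and their copies ActorE/CriticE.
actor_d, critic_d = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM + ACTION_DIM, 1)
actor_e, critic_e = copy.deepcopy(actor_d), copy.deepcopy(critic_d)
actor_opt = torch.optim.Adam(actor_d.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic_d.parameters(), lr=1e-3)

replay = deque(maxlen=100_000)      # Step (3): experience pool R
counter = 0                         # Step (3): counter c


class DummyEnv:
    """Placeholder with the reset/step interface the loop assumes; the paper's UAV
    dynamics, random UAV generation (step 9), and reward are not modeled here."""

    def reset(self):
        return torch.zeros(STATE_DIM)

    def step(self, action):
        return torch.randn(STATE_DIM), torch.zeros(1)   # next state, reward


def update(batch):
    s, a, r, s2 = (torch.stack(x) for x in zip(*batch))
    # Step (16): ActorE/CriticE provide the bootstrapped target value.
    with torch.no_grad():
        a2 = torch.tanh(actor_e(s2))
        y = r + GAMMA * critic_e(torch.cat([s2, a2], dim=1))
    # Step (17): update CriticD by minimizing a TD loss (stand-in for equation (6)).
    critic_loss = nn.functional.mse_loss(critic_d(torch.cat([s, a], dim=1)), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    # Step (18): update ActorD with the deterministic policy gradient (stand-in for equation (7)).
    actor_loss = -critic_d(torch.cat([s, torch.tanh(actor_d(s))], dim=1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()


env = DummyEnv()
for episode in range(10):                          # Step (4): Max-Episode, kept small here
    s = env.reset()                                # Step (6)
    for t in range(1, 201):                        # Steps (7)-(8): T_max steps per episode
        with torch.no_grad():
            a = torch.tanh(actor_d(s))             # Step (10): action from ActorD
        s2, r = env.step(a)                        # Step (11)
        replay.append((s, a, r, s2))               # Step (12)
        counter += 1
        s = s2                                     # Step (13)
        if counter > BATCH_SIZE and t % SAMPLE_TIME == 0:      # Step (14)
            update(random.sample(replay, BATCH_SIZE))          # Steps (15)-(18)
            if t % REPLACE_TIME == 0:                          # Steps (19)-(20): hard weight replacement
                actor_e.load_state_dict(actor_d.state_dict())
                critic_e.load_state_dict(critic_d.state_dict())
```

A multi-agent version following the listing would keep one ActorD/CriticD pair per UAV and feed the joint state and the joint action set to each critic, with the same periodic replacement of the ActorE/CriticE weights.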