Research Article

Reinforcement Learning-Based Collision Avoidance Guidance Algorithm for Fixed-Wing UAVs

Algorithm 1

MACG algorithm.
(1)Initialize ActorD and CriticD with randomly chosen weights.
(2)Initialize the weights of ActorE and CriticE.
(3)Initialize the experience pool and a counter.
(4)for episode = 1 to Max-Episode do
(5) Initialize the joint action set.
(6) Reset the environment.
(7) Set the maximum number of time steps in an episode.
(8) for t = 1 to the maximum number of time steps do
(9)  Generate UAVs randomly.
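(10)  For each agent i, select an action using ActorD and combine the actions into a joint action.
(11)  Execute the joint action and obtain the reward and the next state.
(12)  Store the transition in the experience pool and increment the counter.
(13)  Replace the current state with the next state.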
(14)  if (counter > batch_size) and (t % sample_time = 0) do
(15)   Sample a random minibatch of M transitions from the experience pool.
(16)   Use CriticD to get the current Q-value, use ActorE to get the next action, and use CriticE to get the target Q-value.
(17)   Update CriticD by minimizing the loss in equation (6).
(18)   Update ActorD using the sampled policy gradient in equation (7).
(19)  if t % replace_time = 0 do
(20)   Update the ActorE and CriticE networks.
(21)  if number of UAVs > max_num do
(22)   break
(23)end for (t)
(24)end for (episode)
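
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the networks, environment, and reward are replaced by placeholders, and the names `ExperiencePool`, `hard_update`, and `run_episode` are hypothetical. The sketch assumes that the ActorE/CriticE update is a direct copy of the ActorD/CriticD weights and that the replace_time check sits outside the learning branch, as the indentation of the listing suggests. Random UAV generation and the max_num break are omitted.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity experience pool with a counter of stored transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)
        self.counter = 0

    def store(self, transition):
        self.buffer.append(transition)
        self.counter += 1

    def sample(self, m, rng=random):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.buffer), m)


def hard_update(target_weights, source_weights):
    """Assumed update rule: copy the source weights into the target in place."""
    target_weights[:] = source_weights


def run_episode(pool, max_steps, batch_size, sample_time, replace_time, rng):
    """Run one episode of the loop and count learning and replacement steps."""
    online = [0.0]  # placeholder for the ActorD/CriticD weights
    target = [0.0]  # placeholder for the ActorE/CriticE weights
    learn_steps = replace_steps = 0
    state = 0.0
    for t in range(1, max_steps + 1):
        action = rng.uniform(-1.0, 1.0)   # stand-in for ActorD
        next_state = state + action       # stand-in for the environment
        reward = -abs(next_state)         # placeholder reward
        pool.store((state, action, reward, next_state))
        state = next_state
        if pool.counter > batch_size and t % sample_time == 0:
            batch = pool.sample(batch_size, rng)
            # Placeholder for the CriticD and ActorD updates.
            online[0] += sum(r for _, _, r, _ in batch) / len(batch)
            learn_steps += 1
        if t % replace_time == 0:
            hard_update(target, online)
            replace_steps += 1
    return learn_steps, replace_steps, online, target
```

Separating the learning condition from the replacement condition means ActorE and CriticE change only every replace_time steps, so the values they supply stay fixed between replacements while ActorD and CriticD are being updated.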