Research Article
Reinforcement Learning-Based Collision Avoidance Guidance Algorithm for Fixed-Wing UAVs
| Step | Operation |
| --- | --- |
| (1) | Initialize ActorD and CriticD with randomly chosen weights. |
| (2) | Initialize ActorE and CriticE by copying the weights of ActorD and CriticD. |
| (3) | Initialize the experience pool and a step counter. |
| (4) | for episode = 1 to Max-Episode do |
| (5) | Initialize the joint action set. |
| (6) | Reset the environment and obtain the initial state s. |
| (7) | Set the maximum number of time steps in an episode, T_max. |
| (8) | for t = 1 to T_max do |
| (9) | Generate UAVs randomly. |
| (10) | For each agent i, select action a_i by ActorD and combine the actions into the joint action a. |
| (11) | Execute a and obtain the reward r and the next state s'. |
| (12) | Store (s, a, r, s') in the experience pool and increment the counter. |
| (13) | Replace s with s'. |
| (14) | if (counter > batch_size) and (t % sample_time == 0) do |
| (15) | Sample a random minibatch of M transitions from the experience pool. |
| (16) | Use CriticD to get Q(s, a), use ActorE to get the next joint action a', and use CriticE to get Q'(s', a'). |
| (17) | Update CriticD by minimizing the loss in equation (6). |
| (18) | Update ActorD using the sampled policy gradient in equation (7). |
| (19) | if t % replace_time == 0 do |
| (20) | Update the ActorE and CriticE networks with the weights of ActorD and CriticD. |
| (21) | if number of UAVs > max_num do |
| (22) | break |
| (23) | end for (t) |
| (24) | end for (episode) |
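The steps above follow the usual actor-critic pattern: the online networks (ActorD, CriticD) interact with the environment and are trained from sampled experience, while the target networks (ActorE, CriticE) provide the TD target and are periodically refreshed. Below is a minimal sketch, in Python with PyTorch, of how steps (1)-(20) could be wired up for a single agent. The `DummyAvoidEnv` environment, the network sizes, the hyperparameters (batch size, sample_time, replace_time, soft-update rate TAU), and the exact forms of equations (6) and (7) are all illustrative assumptions rather than the paper's implementation; the multi-agent joint-action handling and the UAV-count break condition in steps (21)-(22) are omitted for brevity.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM = 8, 2                       # assumed observation/action sizes
BATCH_SIZE, SAMPLE_TIME, REPLACE_TIME = 64, 10, 100
MAX_EPISODE, T_MAX = 500, 200
GAMMA, TAU = 0.95, 0.01                       # assumed discount and soft-update rate


class DummyAvoidEnv:
    """Stand-in environment with random transitions, used only so the sketch runs."""

    def reset(self):
        return torch.randn(OBS_DIM)

    def step(self, action):
        next_obs = torch.randn(OBS_DIM)
        reward = -action.norm().item()        # placeholder reward signal
        return next_obs, reward


def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))


# Steps (1)-(2): online ("D") and target ("E") actor/critic networks.
actor_d, critic_d = mlp(OBS_DIM, ACT_DIM), mlp(OBS_DIM + ACT_DIM, 1)
actor_e, critic_e = mlp(OBS_DIM, ACT_DIM), mlp(OBS_DIM + ACT_DIM, 1)
actor_e.load_state_dict(actor_d.state_dict())
critic_e.load_state_dict(critic_d.state_dict())
actor_opt = torch.optim.Adam(actor_d.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic_d.parameters(), lr=1e-3)

pool = deque(maxlen=100_000)                  # Step (3): experience pool
env = DummyAvoidEnv()

for episode in range(MAX_EPISODE):            # Step (4)
    obs = env.reset()                         # Step (6)
    for t in range(1, T_MAX + 1):             # Steps (7)-(8)
        # Step (10): action from the online actor plus exploration noise.
        with torch.no_grad():
            act = actor_d(obs) + 0.1 * torch.randn(ACT_DIM)
        next_obs, reward = env.step(act)      # Step (11)
        pool.append((obs, act, reward, next_obs))   # Step (12)
        obs = next_obs                        # Step (13)

        if len(pool) > BATCH_SIZE and t % SAMPLE_TIME == 0:   # Step (14)
            batch = random.sample(list(pool), BATCH_SIZE)     # Step (15)
            s = torch.stack([b[0] for b in batch])
            a = torch.stack([b[1] for b in batch])
            r = torch.tensor([b[2] for b in batch]).unsqueeze(1)
            s2 = torch.stack([b[3] for b in batch])

            # Steps (16)-(17): TD target from the target networks, critic regression loss.
            with torch.no_grad():
                a2 = actor_e(s2)
                y = r + GAMMA * critic_e(torch.cat([s2, a2], dim=1))
            critic_loss = F.mse_loss(critic_d(torch.cat([s, a], dim=1)), y)
            critic_opt.zero_grad()
            critic_loss.backward()
            critic_opt.step()

            # Step (18): deterministic policy gradient through the online critic.
            actor_loss = -critic_d(torch.cat([s, actor_d(s)], dim=1)).mean()
            actor_opt.zero_grad()
            actor_loss.backward()
            actor_opt.step()

        if t % REPLACE_TIME == 0:             # Steps (19)-(20): refresh the target networks
            for tgt, src in ((actor_e, actor_d), (critic_e, critic_d)):
                for p_t, p_s in zip(tgt.parameters(), src.parameters()):
                    p_t.data.mul_(1 - TAU).add_(TAU * p_s.data)
```

The target-network refresh here uses a soft update with rate TAU; a hard copy every replace_time steps (as the pseudocode's wording may intend) would simply replace the inner loop with `load_state_dict` calls.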