Research Article

A Method of Multi-UAV Cooperative Task Assignment Based on Reinforcement Learning

Algorithm 1

Algorithm of MA-SAC.
(1)Initialize environment
(2)Initialize critic network and actor network
(3)Initialize max episodes, replay buffer, batch size
(4)for episode ∈ [1, episodes] do
(5) Reset environment
(6) Get current state for each agent
(7) for step ∈ [1, steps] do
(8)  Select actions for each agent
(9)  Get all agents next states and rewards
(10)  Store the transition <state, action, reward, next state> to replay buffer D
(11)  if size of replay buffer D > batch size then
(12)   Sample batch B from replay buffer D
(13)   for agent i, where i = 1:N do
(14)    Update the critic network
(15)    Update the actor network
(16)    Update the target network according to formulas (15), (16)
(17)   end for
(18)  end if
(19) end for
(20)end for
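
The control flow of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration of the loop structure only, not the authors' implementation. `ToyEnv`, `StubAgent`, `train`, and all parameter values are hypothetical names introduced here. The critic and actor gradient steps are left as placeholders, and because formulas (15) and (16) are not reproduced in this section, the target update shown is a generic soft update with an assumed coefficient `tau`, used as a stand-in.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical stand-in environment: one scalar state per agent."""

    def __init__(self, n_agents, seed=0):
        self.n_agents = n_agents
        self.rng = random.Random(seed)

    def reset(self):
        return [0.0] * self.n_agents

    def step(self, actions):
        # Random next states; reward penalizes large actions (arbitrary choice).
        next_states = [self.rng.random() for _ in actions]
        rewards = [-abs(a) for a in actions]
        return next_states, rewards


class StubAgent:
    """Placeholder for one agent's actor, critic, and target networks."""

    def __init__(self, rng, tau=0.01):
        self.rng = rng
        self.tau = tau              # assumed soft-update coefficient
        self.weight = 1.0           # stands in for online network parameters
        self.target_weight = 0.0    # stands in for target network parameters
        self.updates = 0

    def select_action(self, state):
        # A real actor would sample from its learned policy.
        return self.rng.uniform(-1.0, 1.0)

    def update(self, batch):
        # Steps (14), (15): critic and actor updates are omitted in this sketch.
        self.updates += 1
        # Step (16): generic soft update, a stand-in for formulas (15), (16).
        self.target_weight = (self.tau * self.weight
                              + (1.0 - self.tau) * self.target_weight)


def train(n_agents=3, episodes=5, steps=20, batch_size=16,
          buffer_size=1000, seed=0):
    rng = random.Random(seed)
    env = ToyEnv(n_agents, seed)                         # step (1)
    agents = [StubAgent(rng) for _ in range(n_agents)]   # step (2)
    buffer = deque(maxlen=buffer_size)                   # step (3): replay buffer D
    for _ in range(episodes):                            # step (4)
        states = env.reset()                             # steps (5), (6)
        for _ in range(steps):                           # step (7)
            actions = [ag.select_action(s)
                       for ag, s in zip(agents, states)]           # step (8)
            next_states, rewards = env.step(actions)               # step (9)
            buffer.append((states, actions, rewards, next_states)) # step (10)
            if len(buffer) > batch_size:                           # step (11)
                batch = rng.sample(list(buffer), batch_size)       # step (12)
                for agent in agents:                               # step (13)
                    agent.update(batch)                            # steps (14)-(16)
            states = next_states  # carry the state forward (implicit in Algorithm 1)
    return agents, buffer


if __name__ == "__main__":
    trained_agents, replay = train()
    print(len(replay), [a.updates for a in trained_agents])
```

In this sketch every agent is updated from the same sampled batch B, matching the inner loop over the N agents. Replacing `StubAgent.update` with actual network updates would leave the surrounding loop unchanged.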