Research Article
A Method of Multi-UAV Cooperative Task Assignment Based on Reinforcement Learning
(1) Initialize the environment
(2) Initialize the critic network and the actor network
(3) Initialize the maximum number of episodes, the replay buffer D, and the batch size
(4) for episode ∈ [1, episodes] do
(5)     Reset the environment
(6)     Get the current state of each agent
(7)     for step ∈ [1, steps] do
(8)         Select an action for each agent
(9)         Get the next states and rewards of all agents
(10)        Store the tuple ⟨state, action, reward, next state⟩ in replay buffer D
(11)        if |D| > batch size then
(12)            Sample a batch B from replay buffer D
(13)            for agent i, where i = 1:N do
(14)                Update the critic network
(15)                Update the actor network
(16)                Update the target networks according to formulas (15) and (16)
(17)            end for
(18)        end if
(19)    end for
(20) end for
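The training loop above can be sketched in plain Python as follows. This is a minimal, hypothetical illustration of the control flow only. The environment `ToyEnv`, the one-parameter "networks", and the toy critic and actor steps are stand-ins invented for this sketch; they are not gradient-based actor-critic updates. The helper `soft_update` assumes that formulas (15) and (16) take the common soft-update form θ′ ← τθ + (1 − τ)θ′, which this section does not confirm.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    """One joint experience (states, actions, rewards, next states) for all N agents."""
    states: list
    actions: list
    rewards: list
    next_states: list


class ReplayBuffer:
    """Fixed-capacity FIFO replay buffer D; the oldest transitions are dropped first."""

    def __init__(self, capacity: int) -> None:
        self.data = deque(maxlen=capacity)

    def store(self, transition: Transition) -> None:
        self.data.append(transition)

    def sample(self, batch_size: int) -> list:
        # Uniform sampling without replacement.
        return random.sample(list(self.data), batch_size)

    def __len__(self) -> int:
        return len(self.data)


def soft_update(target: list, source: list, tau: float) -> list:
    """ASSUMED form of formulas (15)/(16): theta' <- tau*theta + (1 - tau)*theta'."""
    return [tau * s + (1.0 - tau) * t for s, t in zip(source, target)]


class ToyEnv:
    """HYPOTHETICAL stand-in environment: each agent's state is a single float."""

    def __init__(self, n_agents: int) -> None:
        self.n_agents = n_agents

    def reset(self) -> list:
        return [0.0] * self.n_agents

    def step(self, actions: list):
        next_states = list(actions)
        rewards = [-abs(a - 1.0) for a in actions]  # toy reward: best action is 1.0
        return next_states, rewards


def train(n_agents=3, episodes=5, steps=10, batch_size=8,
          capacity=1000, tau=0.01, lr=0.1, seed=0):
    random.seed(seed)
    env = ToyEnv(n_agents)
    buffer = ReplayBuffer(capacity)
    # Hypothetical placeholder "networks": one scalar parameter per agent.
    actor = [[0.0] for _ in range(n_agents)]
    critic = [[0.0] for _ in range(n_agents)]
    actor_target = [list(p) for p in actor]
    critic_target = [list(p) for p in critic]
    updates = 0

    for _ in range(episodes):
        states = env.reset()
        for _ in range(steps):
            # Exploration: actor output plus Gaussian noise.
            actions = [actor[i][0] + random.gauss(0.0, 0.5) for i in range(n_agents)]
            next_states, rewards = env.step(actions)
            buffer.store(Transition(states, actions, rewards, next_states))
            if len(buffer) > batch_size:
                batch = buffer.sample(batch_size)
                for i in range(n_agents):
                    # Toy critic step: move the value estimate toward the mean reward.
                    mean_r = sum(t.rewards[i] for t in batch) / batch_size
                    critic[i][0] += lr * (mean_r - critic[i][0])
                    # Toy actor step: move toward the best sampled action.
                    best = max(batch, key=lambda t: t.rewards[i]).actions[i]
                    actor[i][0] += lr * (best - actor[i][0])
                    # Target networks track the online networks slowly.
                    actor_target[i] = soft_update(actor_target[i], actor[i], tau)
                    critic_target[i] = soft_update(critic_target[i], critic[i], tau)
                updates += 1
            states = next_states
    return len(buffer), updates, actor
```

With the defaults, `train()` stores 50 transitions and performs an update round on every step once the buffer holds more than `batch_size` transitions. In a real implementation, the two toy steps would be replaced by gradient updates of neural critic and actor networks.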