Research Article
Learning Attentional Communication with a Common Network for Multiagent Reinforcement Learning
Algorithm 1
Multiagent attentional communication with the common network.
(1)  Initialize the estimate network, the target network, and the common network
(2)  Initialize the experience replay buffer and the best-score variable
(3)  Initialize the environment
(4)  for each episode do
(5)      for each time step do
(6)          Each agent chooses an action according to the ε-greedy policy or the new policy
(7)          Perform the joint actions on the environment and obtain the collective reward
(8)          Store the samples
(9)      end for
(10)     Calculate the average score per episode during the test
(11)     When the average score is greater than the best-score variable, replace the best-score variable and then update the common network with the current network parameters
(12)     Put the samples collected throughout the episode into the experience replay buffer
(13)     for each training iteration do
(14)         Sample a batch and calculate the consensus information by formula (8)
(15)         Obtain the Q-value of each agent after the communication module
(16)         Calculate the target value
(17)         Calculate the loss by formula (6)
(18)     end for
(19)     Update the estimate network with a gradient descent step
(20)     Replace the target network with the estimate network every fixed number of epochs
(21) end for
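To make the control flow of Algorithm 1 concrete, the following Python sketch mirrors its structure with a DQN-style update. It is illustrative only: ToyEnv, the network sizes and hyperparameters, the mean-based stand-in for the consensus information of formula (8), the MSE stand-in for the loss of formula (6), and the rule used to refresh the common network are assumptions made for this sketch, not definitions taken from the paper.

# A minimal sketch of the training loop in Algorithm 1 under assumed
# components; the attentional communication module itself is not reproduced.
import copy
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

N_AGENTS, OBS_DIM, N_ACTIONS = 3, 8, 4            # assumed toy sizes
EPISODES, STEPS, ITERS, BATCH = 50, 25, 10, 32    # assumed loop lengths
GAMMA, EPS, TARGET_SYNC = 0.99, 0.1, 5            # assumed hyperparameters


class ToyEnv:
    """Placeholder cooperative environment with a collective reward."""
    def reset(self):
        return torch.randn(N_AGENTS, OBS_DIM)

    def step(self, actions):
        next_obs = torch.randn(N_AGENTS, OBS_DIM)
        reward = float(actions.float().mean())     # dummy collective reward
        return next_obs, reward


class QNet(nn.Module):
    """Per-agent Q-network; consensus information is concatenated to the
    local observation (an assumed interface for the communication module)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM * 2, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))

    def forward(self, obs, consensus):
        return self.net(torch.cat([obs, consensus], dim=-1))


def consensus_info(obs):
    """Stand-in for formula (8): the mean observation broadcast back to every
    agent (the paper's consensus computation is attention-based)."""
    return obs.mean(dim=-2, keepdim=True).expand_as(obs)


# (1)-(3): estimate, target, and common networks, replay buffer, best score.
estimate = QNet()
target = copy.deepcopy(estimate)
common = copy.deepcopy(estimate)   # holds the best-performing shared parameters
optimizer = torch.optim.Adam(estimate.parameters(), lr=1e-3)
replay, best_score = deque(maxlen=10_000), float("-inf")
env = ToyEnv()

for episode in range(EPISODES):                    # (4)
    obs, episode_buf, score = env.reset(), [], 0.0
    for _ in range(STEPS):                         # (5)
        with torch.no_grad():
            q = estimate(obs, consensus_info(obs))
        greedy = q.argmax(dim=-1)                  # (6) epsilon-greedy actions
        explore = torch.randint(0, N_ACTIONS, (N_AGENTS,))
        actions = torch.where(torch.rand(N_AGENTS) < EPS, explore, greedy)
        next_obs, reward = env.step(actions)       # (7) collective reward
        episode_buf.append((obs, actions, reward, next_obs))   # (8) store samples
        obs, score = next_obs, score + reward

    # (10)-(11): when the episode score beats the best so far, refresh the
    # common network from the current estimate network (assumed update rule).
    if score > best_score:
        best_score = score
        common.load_state_dict(estimate.state_dict())
    replay.extend(episode_buf)                     # (12) fill experience replay

    if len(replay) >= BATCH:
        total_loss = torch.zeros(())
        for _ in range(ITERS):                     # (13)
            batch = random.sample(replay, BATCH)
            o = torch.stack([b[0] for b in batch])          # [B, N, obs]
            a = torch.stack([b[1] for b in batch])          # [B, N]
            r = torch.tensor([b[2] for b in batch])         # [B]
            o2 = torch.stack([b[3] for b in batch])
            c, c2 = consensus_info(o), consensus_info(o2)   # (14)
            q = estimate(o, c).gather(-1, a.unsqueeze(-1)).squeeze(-1)  # (15)
            with torch.no_grad():                           # (16) target value
                y = r.unsqueeze(-1) + GAMMA * target(o2, c2).max(dim=-1).values
            total_loss = total_loss + F.mse_loss(q, y)      # (17) stand-in for (6)
        optimizer.zero_grad()
        total_loss.backward()                      # (19) one gradient descent step
        optimizer.step()

    if episode % TARGET_SYNC == 0:                 # (20) sync the target network
        target.load_state_dict(estimate.state_dict())

In this sketch the common network is only kept as a snapshot of the best-scoring parameters; how it feeds the attentional communication module is left out, since that mechanism is defined by the paper's formulas (6) and (8) rather than by Algorithm 1 itself.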