Research Article
An Intelligent Offloading System Based on Multiagent Reinforcement Learning
Algorithm 2
Multiagent recurrent deep deterministic policy gradient algorithm (MARDDPG).
Input: the environment
Output:
(1) Initialize the parameters of the N actor networks and of the critic network
(2) Initialize the replay memory M
(3) for training step = 1 to all steps do
(4)     m0 = initialized message, t = 0
(5)     while t < T do
(6)         for i = 1 to N do
(7)             Select the action of agent i at time t
(8)             Receive the reward, calculated by the regional average preference function
(9)             Receive the observation
(10)            Update the message
(11)        end for
(12)
(13)
(14)    end while
(15)    Store the episode {m0, a1, r1, m1, o2, a2, ...} in M
(16)    Sample a random minibatch of episodes from the replay memory M
(17)    for each sampled episode and each time step do
(18)        Update the critic, actor, and LSTM networks by minimizing the loss
(19)        Soft-update the target critic and target actor networks
(20) end for
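In the inner loop of Algorithm 2 (steps 4 to 14), one message is passed through every agent's decision at every time step. The Python sketch below shows only that control flow. The callables `policies`, `reward_fn`, `observe`, and `update_message` are hypothetical stand-ins. The paper's actor architecture, LSTM message update, and regional average preference function are not reproduced. Steps (12) and (13) are blank in the source, so advancing t after all agents have acted is an assumption.

```python
from typing import Callable, List, Tuple

# All callables below are hypothetical stand-ins, not the paper's definitions.
Message = List[float]
# (agent, t, message, action, reward, next_observation, next_message)
Transition = Tuple[int, int, Message, float, float, List[float], Message]


def rollout_episode(
    policies: List[Callable[[Message, List[float]], float]],  # actor i: (message, obs) -> action
    reward_fn: Callable[[int, float], float],  # stands in for the regional average preference function
    observe: Callable[[int, int], List[float]],  # (agent, t) -> observation
    update_message: Callable[[Message, float, List[float]], Message],
    m0: Message,
    T: int,
) -> List[Transition]:
    """Steps (4)-(14): one episode of length T with a message shared by all agents."""
    message, t = list(m0), 0  # step (4)
    episode: List[Transition] = []
    while t < T:  # step (5)
        for i, policy in enumerate(policies):  # step (6)
            action = policy(message, observe(i, t))  # step (7)
            reward = reward_fn(i, action)  # step (8)
            next_obs = observe(i, t + 1)  # step (9)
            new_message = update_message(message, action, next_obs)  # step (10)
            episode.append((i, t, message, action, reward, next_obs, new_message))
            message = new_message
        t += 1  # assumed: time advances once every agent has acted
    return episode
```

The returned list plays the role of the episode stored in step (15). It keeps each transition's incoming message, so a recurrent critic can later be unrolled over it.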
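Steps (2), (15), and (16) store and sample whole episodes rather than single transitions. This is what lets the LSTM be trained on sequences. The following is a minimal sketch of such an episode-level replay memory. The class name, the fixed capacity, and the evict-oldest rule are assumptions, not details given in the paper.

```python
import random
from collections import deque
from typing import Any, Deque, List, Optional, Sequence


class EpisodeReplayMemory:
    """Hypothetical replay memory M holding complete episodes (steps 2, 15, 16)."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        # Assumed eviction rule: the oldest episode is dropped once capacity is reached.
        self._episodes: Deque[List[Any]] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, episode: Sequence[Any]) -> None:
        """Step (15): append one complete episode."""
        self._episodes.append(list(episode))

    def sample(self, batch_size: int) -> List[List[Any]]:
        """Step (16): draw a minibatch of distinct episodes uniformly at random."""
        if batch_size > len(self._episodes):
            raise ValueError("not enough episodes stored")
        return self._rng.sample(list(self._episodes), batch_size)

    def __len__(self) -> int:
        return len(self._episodes)
```

Sampling entire episodes keeps each sequence's temporal order intact. Per-transition sampling, as in standard DDPG, would break the sequences the recurrent networks need.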
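Step (19) refers to the soft update used in DDPG-style methods. Each target parameter is moved a fraction τ toward its online counterpart, θ' ← τθ + (1 − τ)θ'. The paper does not report a value for τ. The sketch below stores parameters as plain dictionaries of float lists, which is a simplifying assumption.

```python
from typing import Dict, List

# Simplified parameter container (assumption): name -> flat list of weights.
Params = Dict[str, List[float]]


def soft_update(target: Params, online: Params, tau: float) -> None:
    """Step (19): in place, target <- tau * online + (1 - tau) * target."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    for name, values in online.items():
        tgt = target[name]
        for k, v in enumerate(values):
            tgt[k] = tau * v + (1.0 - tau) * tgt[k]
```

A small τ makes the target critic and target actor change slowly, which stabilizes the bootstrapped critic targets. With τ = 1 the update reduces to a hard copy.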