Research Article

[Retracted] Feudal Multiagent Reinforcement Learning for Interdomain Collaborative Routing Optimization

Algorithm 1

Feudal Multiagent Actor-Critic for collaborative routing.
Input: Interdomain environments initialized with agents comprising managers and workers
Output: All managers’ and workers’ routing policies
1: for each episode do
2: Initialize a random process for routing-action exploration and get the initial state
3: for each time step do
4:  Managers select an action under the current policy with exploration noise
5:  Workers select and execute an action
6:  Receive the reward and observe the next state
7:  Store the transition in the replay buffers
8:  for each agent do
9:   Sample a random minibatch from the replay buffers
10:   Update actor by using policy gradient:
11:     
12:   Update critic by minimizing the loss:
13:     
14:  end for
15:  Update target network parameters:
16:   
17: end for
18: end for
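
The update expressions in steps 11, 13, and 16 of Algorithm 1 did not survive in this copy, so the following Python sketch should be read only as an illustration of the loop's bookkeeping, not as the authors' formulas. It assumes a DDPG-style scheme: a fixed-capacity replay buffer, a critic trained on the mean squared temporal-difference error, and a soft (Polyak) update of the target parameters. The names `ReplayBuffer`, `soft_update`, `critic_loss`, `tau`, and `gamma` are hypothetical and do not come from the source. The actor's policy-gradient step is omitted because its form cannot be recovered from the text.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.items = deque(maxlen=capacity)

    def store(self, transition):
        self.items.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement, capped at the buffer size.
        k = min(batch_size, len(self.items))
        return rng.sample(list(self.items), k)


def critic_loss(batch, q, q_target, next_action, gamma):
    """Mean squared TD error against y = r + gamma * Q'(s', a') (assumed form)."""
    total = 0.0
    for s, a, r, s_next in batch:
        y = r + gamma * q_target(s_next, next_action(s_next))
        total += (q(s, a) - y) ** 2
    return total / len(batch)


def soft_update(target, online, tau):
    """Polyak averaging (assumed form): target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


# Usage: one buffer per hierarchy level (an assumption), filled with toy transitions.
manager_buffer = ReplayBuffer(capacity=3)
worker_buffer = ReplayBuffer(capacity=3)
for t in range(5):
    manager_buffer.store((t, 0, 1.0, t + 1))
    worker_buffer.store((t, 1, 0.5, t + 1))

minibatch = worker_buffer.sample(batch_size=2)
loss = critic_loss(
    minibatch,
    q=lambda s, a: 0.0,          # toy online critic
    q_target=lambda s, a: 1.0,   # toy target critic
    next_action=lambda s: 0,     # toy target actor
    gamma=0.9,
)
new_target = soft_update(target=[0.0, 2.0], online=[1.0, 0.0], tau=0.01)
```

Keeping separate buffers for managers and workers is one plausible reading of the two storage targets in step 7. A small `tau` makes the target parameters track the online parameters slowly, which is the usual reason for using a soft update in this family of methods.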