Research Article
[Retracted] Feudal Multiagent Reinforcement Learning for Interdomain Collaborative Routing Optimization
Algorithm 1
Feudal Multiagent Actor-Critic for collaborative routing.
Input: Interdomain environments initialized with agents, each containing managers and workers
Output: Routing policies of all managers and workers
1: for each episode do
2:   Initialize a random process for routing-action exploration and obtain the initial state
3:   for each time step do
4:     Managers select an action according to the current policy plus exploration
5:     Workers select and execute an action
6:     Receive the reward and observe the new state
7:     Store the transitions in the replay buffers
8:     for each agent do
9:       Sample a random minibatch from the replay buffers
10:      Update the actor using the policy gradient:
11:
12:      Update the critic by minimizing the loss:
13:
14:    end for
15:    Update the target network parameters:
16:
17:  end for
18: end for
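Steps 7 and 9 of Algorithm 1 store experience and later sample minibatches from it. The following is a minimal sketch of that mechanism under stated assumptions: the wording of step 7 suggests that managers and workers keep separate replay buffers, and the sketch assumes a fixed-capacity FIFO buffer with uniform sampling. The `Transition` fields, `ReplayBuffer` class, capacities, and the "target domain / next hop" interpretation of the actions are all hypothetical illustrations, not the authors' implementation.

```python
import random
from collections import deque
from typing import Any, Deque, List, NamedTuple


class Transition(NamedTuple):
    """One experience tuple; field names are illustrative, not from the paper."""
    state: Any
    action: Any
    reward: float
    next_state: Any


class ReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform minibatch sampling (assumed design)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque(maxlen=...) silently discards the oldest entry once full
        self.storage: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, transition: Transition) -> None:
        self.storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, capped at the current buffer size
        k = min(batch_size, len(self.storage))
        return self.rng.sample(list(self.storage), k)

    def __len__(self) -> int:
        return len(self.storage)


# Hypothetical setup: separate buffers for manager-level and worker-level experience
manager_buffer = ReplayBuffer(capacity=10_000)
worker_buffer = ReplayBuffer(capacity=10_000)

# Toy transitions (hypothetical): a manager picks a target domain, a worker a next hop
manager_buffer.store(Transition(state=(0,), action=2, reward=1.0, next_state=(2,)))
worker_buffer.store(Transition(state=(0, 2), action=5, reward=0.5, next_state=(5, 2)))

# Step 9: each agent draws a minibatch from the buffers
manager_batch = manager_buffer.sample(batch_size=32)
worker_batch = worker_buffer.sample(batch_size=32)
```

Keeping the two buffers separate lets the manager and worker levels be trained on experience collected at their own granularity; whether the paper shares or splits buffers is not recoverable from the listing.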
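The critic update in step 12 appears without its loss function. In DDPG-style actor-critic methods such as MADDPG, the critic is commonly trained to minimize the mean squared error between Q(s, a) and a temporal-difference target y = r + γQ'(s', a'), where Q' is the target critic. The sketch below assumes that convention; the paper's actual loss is not confirmed by the listing. To stay framework-free, the Q-values are passed in as plain arrays instead of being produced by networks, and the names `td_targets` and `critic_loss` are hypothetical.

```python
import numpy as np


def td_targets(rewards, next_q_values, dones, gamma=0.95):
    """Assumed TD target: y = r + gamma * Q'(s', a'), or y = r at terminal steps."""
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q_values, dtype=float)
    not_done = 1.0 - np.asarray(dones, dtype=float)
    return rewards + gamma * not_done * next_q


def critic_loss(q_values, targets):
    """Mean squared error between the critic's Q(s, a) and the fixed targets y."""
    diff = np.asarray(q_values, dtype=float) - np.asarray(targets, dtype=float)
    return float(np.mean(diff ** 2))


# Two toy transitions; the second is terminal, so its target is just the reward
y = td_targets(rewards=[1.0, 0.0], next_q_values=[2.0, 4.0], dones=[0, 1], gamma=0.5)
# y = [1.0 + 0.5 * 2.0, 0.0] = [2.0, 0.0]
loss = critic_loss(q_values=[1.5, 1.0], targets=y)
# errors [-0.5, 1.0] -> squared [0.25, 1.0] -> mean 0.625
```

In a full implementation, y would be treated as a constant (no gradient through the target critic) and the loss would be minimized with respect to the online critic's parameters.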
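Step 15 updates the target networks, but the update rule is missing from the listing. A common choice in DDPG and MADDPG is a soft (Polyak) update, θ' ← τθ + (1 − τ)θ', with a small τ. The sketch below assumes that rule; a periodic hard copy is the usual alternative, and the listing does not say which one the paper uses. Parameters are represented as a hypothetical name-to-array dictionary rather than a real network.

```python
import numpy as np


def soft_update(target: dict, online: dict, tau: float) -> None:
    """Polyak averaging in place: theta' <- tau * theta + (1 - tau) * theta'."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    for name, theta in online.items():
        target[name] = tau * theta + (1.0 - tau) * target[name]


# Hypothetical parameters for one agent's actor, stored as name -> array
online_actor = {"w": np.array([1.0, 1.0]), "b": np.array([0.5])}
target_actor = {"w": np.array([0.0, 0.0]), "b": np.array([0.5])}

soft_update(target_actor, online_actor, tau=0.01)
print(target_actor["w"])  # -> [0.01 0.01]
```

A small τ makes the target networks trail the online networks slowly, which keeps the TD targets used by the critic more stable; setting τ = 1 reduces the rule to a hard copy.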