Intelligent Dynamic Spectrum Allocation in MEC-Enabled Cognitive Networks: A Multiagent Reinforcement Learning Approach
Algorithm 1
The proposed QMIX-based DSA algorithm.
1: Initialization:
The network environment and the experience replay buffer; the parameters of the hypernetwork and of all the agent networks;
2: Setting:
The target-network parameters, the learning rate, the discount factor, the batch size, the maximum numbers of training epochs, episodes, and slots, and the maximum number of train steps;
3: [Centralized Training Phase]:
4: while the maximum training epoch is not reached do
5: for each episode do
6: for each slot do
7: for each agent do
8: Get the agent's observation, action, and reward;
9: end for
10: Get the next observation;
11: Store the transition (observation, action, reward, next observation) to the observation-action history;
12: end for
13: Store the episode data to the replay buffer;
14: end for
15: for each train step in each epoch do
16: Sample a batch of episodes’ experience from the replay buffer;
17: for each slot in each sampled episode do
18: Get the evaluated and the target joint action-values from the evaluate-network and the target-network, respectively;
19: end for
20: Calculate the loss function by (14), and update the evaluate-network parameters;
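Steps 18 to 20 can be illustrated with a minimal NumPy sketch of a QMIX-style monotonic mixer and a temporal-difference loss. It assumes that (14) has the standard QMIX squared TD-error form, which is not shown in this listing. The names `init_params`, `mix`, and `td_loss`, the batch keys, and the layer sizes are hypothetical and chosen only for illustration. ReLU is used in place of the ELU activation of the original QMIX mixer to keep the sketch short.

```python
import numpy as np

def init_params(state_dim, n_agents, embed, rng):
    """Random linear hypernetwork weights (hypothetical sizes, illustration only)."""
    shapes = {
        "hw1": (state_dim, n_agents * embed),  # produces first-layer mixing weights
        "hb1": (state_dim, embed),             # produces first-layer bias
        "hw2": (state_dim, embed),             # produces second-layer mixing weights
        "hb2": (state_dim, 1),                 # produces second-layer bias
    }
    params = {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}
    params["embed"] = embed
    return params

def mix(agent_qs, state, params):
    """Combine per-agent Q-values (batch, n_agents) into one joint value per sample."""
    n_agents, embed = agent_qs.shape[-1], params["embed"]
    # abs() keeps the mixing weights non-negative, so the joint value is
    # non-decreasing in every agent's Q-value (the QMIX monotonicity constraint).
    w1 = np.abs(state @ params["hw1"]).reshape(-1, n_agents, embed)
    b1 = (state @ params["hb1"]).reshape(-1, 1, embed)
    hidden = np.maximum(agent_qs[:, None, :] @ w1 + b1, 0.0)  # ReLU, assumed simplification
    w2 = np.abs(state @ params["hw2"]).reshape(-1, embed, 1)
    b2 = (state @ params["hb2"]).reshape(-1, 1, 1)
    return (hidden @ w2 + b2).reshape(-1)

def td_loss(batch, eval_params, target_params, gamma):
    """Mean squared TD error between evaluate-network and target-network joint values."""
    q_tot = mix(batch["chosen_q"], batch["state"], eval_params)
    # The target uses each agent's greedy next-slot Q-value, mixed by the target network.
    q_tot_next = mix(batch["next_max_q"], batch["next_state"], target_params)
    target = batch["reward"] + gamma * (1.0 - batch["done"]) * q_tot_next
    return float(np.mean((target - q_tot) ** 2))
```

In an actual training loop, the gradient of this loss with respect to the evaluate-network parameters would be computed by an automatic-differentiation framework. The sketch only evaluates the loss value.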