Research Article

Intelligent Dynamic Spectrum Allocation in MEC-Enabled Cognitive Networks: A Multiagent Reinforcement Learning Approach

Algorithm 1

The proposed QMIX-based DSA algorithm.
1: Initialization:
  The network environment and the experience replay buffer; the parameters of the hypernetwork and of all the agent networks;
2: Setting:
  The target-network parameters, the learning rate, the discount factor, the batch size, the maximum numbers of training epochs, episodes, and slots, and the maximum train step;
3: [Centralized Training Phase]:
4: while the train step is below the maximum train step do
5:  for each episode do
6:   for each slot do
7:    for each agent do
8:     Get the agent’s observation, action, and reward;
9:    end for
10:    Get the next observation;
11:    Store the transition in the observation-action history;
12:   end for
13:   Store the episode data in the replay buffer;
14:  end for
15:  for each epoch do
16:   Sample a batch of episodes’ experience from the replay buffer;
17:   for each slot in each sampled episode do
18:    Get the joint action-value and its target value from the evaluate-network and the target-network, respectively;
19:   end for
20:   Calculate the loss function by (14), and update the evaluate-network parameters;
21:   Update the target-network parameters;
22:  end for
23:  Save DRQN and QMIX network models;
24: end while
25: [Decentralized Executing Phase]:
26: Setting: ;
27: Input: The channel state;
28: Output: The agents’ observations and actions.
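
The centralized training steps of Algorithm 1 (storing episodes, sampling a batch, and computing a loss from the evaluate-network and target-network values) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names EpisodeReplayBuffer, monotonic_mix, and td_loss, as well as the dictionary keys, are hypothetical. The mixer is a simplified single linear layer whose weights are passed in directly, whereas QMIX generates them with hypernetworks conditioned on the global state. The loss is assumed to be the standard squared temporal-difference error, which may differ in detail from equation (14).

```python
import random
from collections import deque


class EpisodeReplayBuffer:
    """Fixed-capacity buffer that stores whole episodes (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen evicts the oldest episode once capacity is reached.
        self.episodes = deque(maxlen=capacity)

    def store(self, episode):
        self.episodes.append(episode)

    def sample(self, batch_size, rng=random):
        # Sample without replacement; never request more than is stored.
        k = min(batch_size, len(self.episodes))
        return rng.sample(list(self.episodes), k)


def monotonic_mix(agent_qs, weights, bias):
    """Simplified single-layer mixer (assumption, not the paper's network).

    Taking abs() of each weight keeps the joint value non-decreasing in
    every agent's individual Q-value, which is the QMIX monotonicity idea.
    """
    return sum(abs(w) * q for w, q in zip(weights, agent_qs)) + bias


def td_loss(batch, weights, bias, target_weights, target_bias, gamma):
    """Mean squared TD error over every slot of every sampled episode."""
    errors = []
    for episode in batch:
        for slot in episode:
            # Joint value of the actions actually taken (evaluate-network side).
            q_tot = monotonic_mix(slot["chosen_qs"], weights, bias)
            # Joint value of each agent's greedy next action (target-network side).
            next_q_tot = monotonic_mix(
                slot["next_max_qs"], target_weights, target_bias
            )
            bootstrap = 0.0 if slot["done"] else next_q_tot
            target = slot["reward"] + gamma * bootstrap
            errors.append((target - q_tot) ** 2)
    return sum(errors) / len(errors)


# Toy usage with two agents and a two-slot episode (made-up numbers).
buffer = EpisodeReplayBuffer(capacity=2)
episode = [
    {"chosen_qs": [1.0, 2.0], "next_max_qs": [0.5, 1.5], "reward": 1.0, "done": False},
    {"chosen_qs": [0.5, 1.0], "next_max_qs": [0.0, 0.0], "reward": 2.0, "done": True},
]
buffer.store(episode)
batch = buffer.sample(batch_size=4)
loss = td_loss(batch, [1.0, -1.0], 0.0, [1.0, -1.0], 0.0, gamma=0.5)
print(loss)  # 0.625
```

In a full implementation the per-agent Q-values would come from the DRQN agent networks, and the loss would be minimized by gradient descent on the evaluate-network parameters, with the target-network parameters refreshed periodically.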