| Input: UE task information. |
| Output: Offloading decision vector and resource allocation vector. |
(1) | Initialize the network parameters for agents 1 and 2. |
(2) | Set the capacity of the experience buffer and specify the batch size for training. |
(3) | for episode = 1, 2, …, Max_Episode do |
(4) | Reset the environment to obtain the initial state. |
(5) | for time step t = 1, 2, …, T do |
(6) | Input the current state to the actor network of agent 1 to obtain the offloading action. |
(7) | Calculate the state input for the actor of agent 2 based on the offloading action of agent 1. |
(8) | Input this state to the actor of agent 2 to obtain the resource allocation action. |
(9) | Calculate the reward by jointly considering the offloading decision and the resource allocation. |
(10) | Store the transition (state, action, reward, next state) in the replay buffer of agent 1. |
(11) | Store the transition (state, action, reward, next state) in the replay buffer of agent 2. |
(12) | if the batch size < the number of experiences currently in the buffer then |
(13) | for agent = 1, 2 do |
(14) | Sample a batch of experiences randomly. |
(15) | Calculate the loss of the critic network according to equation (15). |
(16) | Update the parameters of the critic network according to equation (16). |
(17) | Calculate the loss of the actor network according to equation (17). |
(18) | Update the parameters of the actor network according to equation (18). |
(19) | Update the parameters of the target networks according to equations (19) and (20). |
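For concreteness, a minimal PyTorch sketch of the loop above follows. It assumes DDPG-style agents, so equations (15)-(20) of the paper are stood in for by the usual critic MSE loss, deterministic policy-gradient actor loss, and soft target updates; the `Env` interface, its `state_for_agent2` helper, the network sizes, and all hyperparameters are hypothetical placeholders rather than values from the original algorithm.

```python
# Sketch of the two-agent training loop, under the assumptions stated above.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F


class Actor(nn.Module):
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, 64), nn.ReLU(),
                                 nn.Linear(64, a_dim), nn.Tanh())

    def forward(self, s):
        return self.net(s)


class Critic(nn.Module):
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))


class Agent:
    """One agent: actor/critic networks, target copies, and a replay buffer."""

    def __init__(self, s_dim, a_dim, gamma=0.99, tau=0.005):
        self.actor, self.critic = Actor(s_dim, a_dim), Critic(s_dim, a_dim)
        self.actor_tgt, self.critic_tgt = Actor(s_dim, a_dim), Critic(s_dim, a_dim)
        self.actor_tgt.load_state_dict(self.actor.state_dict())
        self.critic_tgt.load_state_dict(self.critic.state_dict())
        self.actor_opt = torch.optim.Adam(self.actor.parameters(), lr=1e-3)
        self.critic_opt = torch.optim.Adam(self.critic.parameters(), lr=1e-3)
        self.buffer = deque(maxlen=100_000)             # step (2): experience buffer
        self.gamma, self.tau = gamma, tau

    def update(self, batch_size):
        if len(self.buffer) <= batch_size:              # step (12)
            return
        batch = random.sample(self.buffer, batch_size)  # step (14)
        s = torch.stack([b[0] for b in batch])
        a = torch.stack([b[1] for b in batch])
        r = torch.tensor([b[2] for b in batch]).unsqueeze(1)
        s_next = torch.stack([b[3] for b in batch])
        # Steps (15)-(16): critic loss (TD error) and critic update.
        with torch.no_grad():
            y = r + self.gamma * self.critic_tgt(s_next, self.actor_tgt(s_next))
        critic_loss = F.mse_loss(self.critic(s, a), y)
        self.critic_opt.zero_grad()
        critic_loss.backward()
        self.critic_opt.step()
        # Steps (17)-(18): actor loss (deterministic policy gradient) and actor update.
        actor_loss = -self.critic(s, self.actor(s)).mean()
        self.actor_opt.zero_grad()
        actor_loss.backward()
        self.actor_opt.step()
        # Step (19): soft update of both target networks.
        for tgt, src in ((self.actor_tgt, self.actor), (self.critic_tgt, self.critic)):
            for p_t, p in zip(tgt.parameters(), src.parameters()):
                p_t.data.mul_(1 - self.tau).add_(self.tau * p.data)


def train(env, agent1, agent2, max_episode=500, horizon=100, batch_size=64):
    for _ in range(max_episode):                        # step (3)
        s = env.reset()                                 # step (4)
        for _ in range(horizon):                        # step (5)
            with torch.no_grad():                       # exploration noise omitted for brevity
                a1 = agent1.actor(s)                    # step (6): offloading decision
                s2 = env.state_for_agent2(s, a1)        # step (7): hypothetical helper
                a2 = agent2.actor(s2)                   # step (8): resource allocation
            s_next, r = env.step(a1, a2)                # step (9): joint reward
            s2_next = env.state_for_agent2(s_next, a1)
            agent1.buffer.append((s, a1, float(r), s_next))    # step (10)
            agent2.buffer.append((s2, a2, float(r), s2_next))  # step (11)
            agent1.update(batch_size)                   # steps (12)-(19) for agent 1
            agent2.update(batch_size)                   # steps (12)-(19) for agent 2
            s = s_next
```

Keeping separate replay buffers and selecting actions sequentially, with agent 2 conditioned on agent 1's offloading decision, mirrors steps (7)-(11) of the table.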