Research Article

Resource Allocation Strategy Using Deep Reinforcement Learning in Cloud-Edge Collaborative Computing Environment

Algorithm 1

Pseudocode of the multi-objective DQN algorithm with experience replay.
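Initialization:
Initialize the experience replay memory $D$;
Initialize the action-value function $Q$ with random weights $\theta$;
Initialize the target action-value function $\hat{Q}$ with weights $\theta^- = \theta$.
Begin
(1)For each episode do
(2)  Receive the initial observation and preprocess it to obtain the initial state $s_1$;
(3)  For each time step $t$ of the episode do
(4)   With probability $\varepsilon$, select a random action $a_t$;
(5)   Otherwise, select $a_t = \arg\max_{a} Q(s_t, a; \theta)$;
(6)   Execute $a_t$ in the system, obtain the reward $r_t$, and observe the next state $s_{t+1}$;
(7)   Store the transition $(s_t, a_t, r_t, s_{t+1})$ in $D$, then take $s_{t+1}$ as the current state;
(8)   Sample a random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$;
(9)   Compute the target value with the target DQN: $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^-)$, or $y_j = r_j$ if $s_{j+1}$ is terminal;
(10)   Update the main DQN by minimizing the loss $L(\theta) = \mathbb{E}_j\big[(y_j - Q(s_j, a_j; \theta))^2\big]$;
(11)   Perform a gradient descent step on $L(\theta)$ with respect to the network parameters $\theta$;
(12)   Update the target network $\hat{Q}$ by setting $\theta^- \leftarrow \theta$.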
(13)  End For
(14)End For
(15)End
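To make the steps of Algorithm 1 concrete, the following is a minimal Python/NumPy sketch of DQN-style learning with an experience replay memory and a target network. It rests on several assumptions that do not come from the article: the environment is a hypothetical toy edge-offloading task (`env_step`, with four load levels, where action 0 runs the task at the edge and action 1 offloads it to the cloud) that returns a single scalar reward instead of the article's multi-objective reward; the deep Q-network is replaced by a linear Q-function over one-hot features (`features`, `q_values`), which makes it equivalent to a Q-table; the target weights are refreshed every `target_every` steps rather than at every step; and all hyperparameters are illustrative. The toy task has no terminal states, so the terminal case $y_j = r_j$ never arises here.

```python
from collections import deque

import numpy as np

# Hypothetical toy task: 4 edge-load levels; action 0 = run at edge, 1 = offload to cloud.
N_STATES, N_ACTIONS = 4, 2


def features(s):
    """One-hot encoding of the state (makes the linear Q-function tabular)."""
    phi = np.zeros(N_STATES)
    phi[s] = 1.0
    return phi


def env_step(s, a, rng):
    """Hypothetical environment: the edge pays off only under light load (s < 2),
    offloading to the cloud always yields 0, and the next load level is uniform."""
    r = (1.0 if s < 2 else -1.0) if a == 0 else 0.0
    return r, int(rng.integers(N_STATES))


def q_values(theta, phi):
    """Linear action-value function: Q(s, a; theta) = theta[a] . phi(s), for every a."""
    return theta @ phi


def train(episodes=200, steps=20, eps=0.2, gamma=0.9, lr=0.1,
          batch=32, capacity=1000, target_every=50, seed=0):
    rng = np.random.default_rng(seed)
    memory = deque(maxlen=capacity)                              # replay memory D
    theta = rng.normal(scale=0.01, size=(N_ACTIONS, N_STATES))   # Q with random weights
    theta_target = theta.copy()                                  # target weights = theta
    step_count = 0
    for _ in range(episodes):
        s = int(rng.integers(N_STATES))                          # initial state s_1
        for _ in range(steps):
            if rng.random() < eps:                               # explore with probability eps
                a = int(rng.integers(N_ACTIONS))
            else:                                                # otherwise act greedily
                a = int(np.argmax(q_values(theta, features(s))))
            r, s_next = env_step(s, a, rng)
            memory.append((s, a, r, s_next))                     # store transition in D
            s = s_next                                           # next state becomes current
            if len(memory) >= batch:
                idx = rng.choice(len(memory), size=batch, replace=False)
                sample = [memory[int(i)] for i in idx]           # random minibatch from D
                phi = np.array([features(x[0]) for x in sample])
                acts = np.array([x[1] for x in sample])
                rews = np.array([x[2] for x in sample])
                phi_next = np.array([features(x[3]) for x in sample])
                # Target: y_j = r_j + gamma * max_a' Q_hat(s_{j+1}, a'; theta_target)
                y = rews + gamma * (phi_next @ theta_target.T).max(axis=1)
                q = (phi @ theta.T)[np.arange(batch), acts]      # Q(s_j, a_j; theta)
                td = y - q
                # Gradient of mean((y - q)^2) w.r.t. theta; only rows a_j receive gradient.
                grad = np.zeros_like(theta)
                np.add.at(grad, acts, -2.0 * td[:, None] * phi / batch)
                theta -= lr * grad                               # gradient descent step
            step_count += 1
            if step_count % target_every == 0:
                theta_target = theta.copy()                      # refresh target weights
    return theta


if __name__ == "__main__":
    theta = train()
    policy = [int(np.argmax(q_values(theta, features(s)))) for s in range(N_STATES)]
    print(policy)
```

Running the script prints the greedy action for each load level; under these illustrative settings it is expected to keep light loads (levels 0 and 1) at the edge and offload heavy loads to the cloud. Approaching the article's setting would require replacing `q_values` with a neural network and `env_step` with a cloud-edge simulator that produces the multi-objective reward.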