Research Article
Resource Allocation Strategy Using Deep Reinforcement Learning in Cloud-Edge Collaborative Computing Environment
Algorithm 1
Pseudocode of the multi-objective DQN algorithm with experience replay.
Initialization:
    Initialize the experience replay memory $D$;
    Initialize the action-value function $Q$ with random weights $\theta$;
    Initialize the target action-value function $\hat{Q}$ with weights $\theta^- = \theta$.
Begin
(1)  For episode $= 1, \dots, M$ do
(2)      Receive the initial observation and preprocess it to obtain the start state $s_1$;
(3)      For $t = 1, \dots, T$ do
(4)          With probability $\varepsilon$, select a random action $a_t$;
(5)          Otherwise, select $a_t = \arg\max_a Q(s_t, a; \theta)$;
(6)          Execute $a_t$ in the system, obtain the reward $r_t$, and observe the next state $s_{t+1}$, which becomes the current state;
(7)          Store the experience $(s_t, a_t, r_t, s_{t+1})$ in $D$;
(8)          Sample a random minibatch of experiences $(s_j, a_j, r_j, s_{j+1})$ from $D$;
(9)          Compute the target Q value with the target DQN: $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^-)$;
(10)         Update the main DQN by minimizing the loss $L(\theta) = \big(y_j - Q(s_j, a_j; \theta)\big)^2$;
(11)         Perform gradient descent on $L(\theta)$ with respect to the network parameters $\theta$;
(12)         Update the target network: $\theta^- \leftarrow \theta$.
(13)     End For
(14) End For
(15) End
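The loop in Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a linear Q-function stands in for the deep network; the names `ReplayMemory`, `LinearQ`, `td_target`, `train_step`, `scalarize`, `toy_costs` and every numeric value are hypothetical; treating a terminal step as $y_j = r_j$ is the usual DQN convention rather than something the algorithm states; and the weighted-sum scalarization is only one possible way to fold several objectives (here made-up latency and energy costs) into a single reward.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity experience replay memory (steps 7-8)."""

    def __init__(self, capacity, rng):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first
        self.rng = rng

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, drawn without replacement
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


class LinearQ:
    """Hypothetical linear stand-in for the Q-network: Q(s, a; theta) = theta[a] . s."""

    def __init__(self, n_features, n_actions, rng):
        self.theta = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                      for _ in range(n_actions)]

    def values(self, state):
        return [sum(w * x for w, x in zip(row, state)) for row in self.theta]

    def copy_from(self, other):
        self.theta = [row[:] for row in other.theta]


def select_action(q, state, epsilon, rng):
    """Steps 4-5: random action with probability epsilon, else argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q.theta))
    vals = q.values(state)
    return max(range(len(vals)), key=vals.__getitem__)


def td_target(target_q, reward, next_state, done, gamma):
    """Step 9: y = r + gamma * max_a' Q_target(s', a'), or y = r if terminal (assumed)."""
    if done:
        return reward
    return reward + gamma * max(target_q.values(next_state))


def train_step(q, target_q, batch, gamma, lr):
    """Steps 10-11: one gradient-descent step on mean((y - Q(s, a))^2) / 2."""
    neg_grad = [[0.0] * len(row) for row in q.theta]
    for s, a, r, s_next, done in batch:
        err = td_target(target_q, r, s_next, done, gamma) - q.values(s)[a]
        for i, x in enumerate(s):
            neg_grad[a][i] += err * x / len(batch)
    for a, row in enumerate(neg_grad):
        for i, g in enumerate(row):
            q.theta[a][i] += lr * g  # theta <- theta - lr * dL/dtheta


def scalarize(costs, weights):
    """Hypothetical weighted-sum scalarization of several costs into one reward."""
    return -sum(w * c for w, c in zip(weights, costs))


def toy_costs(action):
    """Hypothetical task: action 0 = run locally, 1 = offload.
    Returns made-up (latency, energy) costs."""
    return (1.0, 0.8) if action == 0 else (0.4, 0.3)


def run(episodes=300, seed=0):
    rng = random.Random(seed)
    state = [1.0]  # a single bias feature: every toy task looks identical
    q, target_q = LinearQ(1, 2, rng), LinearQ(1, 2, rng)
    target_q.copy_from(q)  # initialize theta^- = theta
    memory = ReplayMemory(100, rng)
    for _ in range(episodes):  # each toy episode is a single step
        a = select_action(q, state, 0.2, rng)
        r = scalarize(toy_costs(a), (0.5, 0.5))
        memory.push(state, a, r, state, True)  # store the transition
        train_step(q, target_q, memory.sample(16), gamma=0.9, lr=0.1)
        target_q.copy_from(q)  # step 12: theta^- <- theta
    return q.values(state)
```

Calling `run()` should leave the offloading action (index 1) with the higher Q value, close to its scalarized reward of -0.35. Because every toy episode ends after one step, `gamma` and the target network have no effect on the result here; in longer episodes `td_target` bootstraps from the separately held `target_q`, which is what keeps the regression target stable between updates.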