Mobile Information Systems

Research Article

Resource Allocation Strategy Using Deep Reinforcement Learning in Cloud-Edge Collaborative Computing Environment

Pseudocode of DQN algorithm based on multi-objective and experience replay.

	Initialization:
	Initialize experience replay memory $D$;
	Initialize the action-value function $Q$ with random weights $\theta$;
	Initialize the target action-value function $\hat{Q}$ with weights $\theta^{-} = \theta$.
	Begin
(1)	For each episode
(2)	do Receive the initial observation and preprocess it to obtain the start state $s_1$
(3)	For each time step $t$
(4)	do With probability $\varepsilon$, select a random action $a_t$;
(5)	Otherwise, select the action $a_t = \arg\max_{a} Q(s_t, a; \theta)$;
(6)	Execute action $a_t$ in the system to obtain reward $r_t$, observe the system at the next moment, and update the state from $s_t$ to $s_{t+1}$;
(7)	Store experience $(s_t, a_t, r_t, s_{t+1})$ in the experience replay memory;
(8)	Sample a random minibatch of experiences from the replay memory;
(9)	Calculate the target Q value of the target DQN;
(10)	Update the main DQN by minimizing the loss function $L(\theta)$;
(11)	For network parameter $\theta$, gradient descent is performed on $L(\theta)$;
(12)	Update target network Q value.
(13)	End For
(14)	End For
(15)	End
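
The loop structure of the pseudocode above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the five-state chain environment, the reward of 1 at the right end, all hyperparameter values, and the names env_step, greedy, and train are assumptions made for illustration. A lookup table stands in for the deep network, so the gradient step on the squared error between target and prediction reduces to a simple update of one table entry. The table starts at zero rather than at random weights so that the run is easy to follow, and the target copy is refreshed every fixed number of steps, which is one common choice that the pseudocode does not specify.

```python
import random
from collections import deque

N_STATES, N_ACTIONS = 5, 2      # hypothetical chain: action 0 = left, 1 = right
GAMMA, EPSILON, LR = 0.9, 0.2, 0.1
BATCH, SYNC_EVERY, MAX_STEPS = 8, 20, 50


def env_step(state, action):
    """Toy environment: reward 1 only when the right end of the chain is reached."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state):
    """Return argmax_a Q(state, a), breaking ties at random."""
    best = max(q[state])
    return random.choice([a for a in range(N_ACTIONS) if q[state][a] == best])


def train(episodes=300, seed=0):
    random.seed(seed)
    memory = deque(maxlen=500)                         # experience replay memory
    # Main Q table (assumption: zeros instead of random weights, for clarity).
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    q_target = [row[:] for row in q]                   # target Q starts as a copy
    steps = 0
    for _ in range(episodes):
        state = 0                                      # start state
        for _ in range(MAX_STEPS):
            # Steps (4)-(5): epsilon-greedy action selection.
            if random.random() < EPSILON:
                action = random.randrange(N_ACTIONS)
            else:
                action = greedy(q, state)
            # Steps (6)-(7): act, observe, and store the experience.
            nxt, reward, done = env_step(state, action)
            memory.append((state, action, reward, nxt, done))
            # Step (8): sample a random minibatch from the replay memory.
            batch = random.sample(list(memory), min(BATCH, len(memory)))
            for s, a, r, s2, d in batch:
                # Step (9): target value computed from the target table.
                y = r if d else r + GAMMA * max(q_target[s2])
                # Steps (10)-(11): move Q(s, a) toward y, i.e. one gradient
                # step on the squared error (y - Q(s, a))^2 for a table.
                q[s][a] += LR * (y - q[s][a])
            # Step (12): periodically copy the main table into the target.
            steps += 1
            if steps % SYNC_EVERY == 0:
                q_target = [row[:] for row in q]
            state = nxt
            if done:
                break
    return q


q_values = train()
policy = [greedy(q_values, s) for s in range(N_STATES - 1)]
print(policy)
```

In this toy setting the learned values in the state next to the goal should favor moving right, since that action is the only one that is ever rewarded. Keeping a separate target table means the targets in step (9) change only at each synchronization, which is the stabilizing role the target DQN plays in the pseudocode.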