Deep Reinforcement Learning for Performance-Aware Adaptive Resource Allocation in Mobile Edge Computing
Algorithm 1: PARA scheme.
BEGIN
01: Initialize the learned actor network $\mu(s|\theta^{\mu})$ and critic network $Q(s,a|\theta^{Q})$ with weights $\theta^{\mu}$ and $\theta^{Q}$;
02: Initialize the target networks $\mu'$ and $Q'$ with weights $\theta^{\mu'} \leftarrow \theta^{\mu}$, $\theta^{Q'} \leftarrow \theta^{Q}$;
03: Initialize the experience replay memory with size $M$ and the minibatch with size $N$;
04: Initialize the task arrival rate, the data buffer's size, the front-end queue, the back-end queue, and the transmission rate between the edge server and the mobile users;
05: for curr_episode = 1, MAX_EPISODES do
06: Initialize a random noise process $\mathcal{N}$ for action exploration;
07: Reset the simulation parameters of the performance-aware resource allocation environment, and observe an initial system state $s_1$;
08: Initialize the long-term reward to zero;
09: for each time slot $t$ = 1, MAX_EP_STEPS do
10: Select an action $a_t = \mu(s_t|\theta^{\mu}) + \mathcal{N}_t$ based on the actor network and the exploration noise;
11: Based on the action $a_t$, the edge server allocates resources to execute the workloads and transmits the computation results; the immediate reward $R_t$ is then calculated, and the next state $s_{t+1}$ is observed;
12: Store the experience transition $(s_t, a_t, R_t, s_{t+1})$ in the replay memory;
13: Update the long-term reward and the system state $s_t \leftarrow s_{t+1}$;
14: Randomly sample a minibatch of $N$ transition experiences $(s_i, a_i, R_i, s_{i+1})$ from the replay memory;
15: Compute the target action $a_{i+1} = \mu'(s_{i+1}|\theta^{\mu'})$ and the target action value $Q'(s_{i+1}, a_{i+1}|\theta^{Q'})$ for the next state $s_{i+1}$;
16: Update the target value for the current state: $y_i = R_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$;
17: Update the critic network by minimizing the loss $L$: $L = \frac{1}{N}\sum_{i}\left(y_i - Q(s_i, a_i|\theta^{Q})\right)^2$;
18: Update the actor network by using the sampled policy gradient: $\nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_{i}\nabla_{a}Q(s,a|\theta^{Q})\big|_{s=s_i,\,a=\mu(s_i)}\,\nabla_{\theta^{\mu}}\mu(s|\theta^{\mu})\big|_{s=s_i}$;
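The noise object initialized in step 06 is often an Ornstein-Uhlenbeck process in DDPG-style schemes, since its temporally correlated samples suit control tasks. The sketch below is a minimal NumPy version under that assumption; the paper does not specify the noise type, and the `theta`, `sigma` values are illustrative defaults, not taken from the source.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise (assumed noise process, not
    confirmed by the paper). Produces a mean-reverting random walk."""
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(dim, mu, dtype=float)
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # Called once per episode (step 06) to restart the process
        self.state[:] = self.mu

    def sample(self):
        # dx = theta * (mu - x) + sigma * N(0, 1): drift toward mu plus noise
        dx = (self.theta * (self.mu - self.state)
              + self.sigma * self.rng.standard_normal(self.state.size))
        self.state += dx
        return self.state.copy()
```

Each call to `sample()` returns one noise vector, which is added to the actor's output in step 10 to form the exploratory action.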
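Steps 14-18 follow the standard DDPG update pattern. As a self-contained sketch (not the paper's implementation), the snippet below uses tiny linear stand-ins for the actor and critic so the target value, the critic loss, and the sampled policy gradient can be written out explicitly; all dimensions, learning rates, the random minibatch, and the soft target-update rate `tau` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, N = 4, 2, 32              # state dim, action dim, minibatch size (illustrative)
gamma, tau, lr = 0.99, 0.005, 1e-2

# Linear stand-ins for the networks (assumed, not the paper's architectures):
#   actor  mu(s)   = W @ s
#   critic Q(s, a) = ws.s + wa.a + b
W  = rng.normal(scale=0.1, size=(A, S)); W_t = W.copy()        # actor, target actor
ws = rng.normal(scale=0.1, size=S)
wa = rng.normal(scale=0.1, size=A)
b  = 0.0
ws_t, wa_t, b_t = ws.copy(), wa.copy(), b                      # target critic

def critic(ws_, wa_, b_, s, a):
    return s @ ws_ + a @ wa_ + b_

# Random minibatch standing in for transitions (s_i, a_i, R_i, s_{i+1})
# sampled from the replay memory (step 14)
s  = rng.normal(size=(N, S)); a  = rng.normal(size=(N, A))
r  = rng.normal(size=N);      s2 = rng.normal(size=(N, S))

# Steps 15-16: target action a' = mu'(s') and target value y_i
a2 = s2 @ W_t.T
y  = r + gamma * critic(ws_t, wa_t, b_t, s2, a2)

# Step 17: critic update by gradient descent on L = (1/N) sum (y_i - Q_i)^2
q   = critic(ws, wa, b, s, a)
err = y - q                                      # TD error
loss_before = np.mean(err ** 2)
ws = ws + lr * (2.0 / N) * err @ s
wa = wa + lr * (2.0 / N) * err @ a
b  = b  + lr * (2.0 / N) * err.sum()
loss_after = np.mean((y - critic(ws, wa, b, s, a)) ** 2)

# Step 18: actor update with the sampled policy gradient
# grad_W J ~ (1/N) sum_i outer(grad_a Q, s_i); grad_a Q = wa for a linear critic
W = W + lr * np.outer(wa, s.mean(axis=0))        # gradient ascent on J

# Usual DDPG soft target updates (assumed; not shown in the listing above):
#   theta' <- tau * theta + (1 - tau) * theta'
W_t  = tau * W  + (1 - tau) * W_t
ws_t = tau * ws + (1 - tau) * ws_t
wa_t = tau * wa + (1 - tau) * wa_t
b_t  = tau * b  + (1 - tau) * b_t
```

Because the stand-in critic is linear, its loss is a convex quadratic in the critic weights, so a single small gradient step provably lowers it; in the actual scheme the same update rules are applied to deep networks via automatic differentiation.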