Abstract

In mobile edge computing, there are usually dependencies between different tasks, and traditional algorithms are inefficient in solving dependent task-offloading problems and neglect the impact of dynamic channel changes on the offloading strategy. To solve the offloading problem of dependent tasks in a dynamic network environment, this paper models the dependent tasks as a directed acyclic graph. A Dependent Task-Offloading Strategy (DTOS) based on deep reinforcement learning is proposed, with minimizing the weighted sum of the delay and energy consumption of network services as the optimization objective. DTOS transforms dependent task offloading into an optimal policy problem under a Markov decision process. Multiple parallel deep neural networks (DNNs) are used to generate offloading decisions, the optimal decision of each round is cached, and the DNN parameters are then optimized with a prioritized experience replay mechanism that extracts valuable experiences. DTOS introduces a penalty mechanism to obtain optimal task-offloading decisions, which is triggered if the service energy consumption or service delay exceeds its threshold. The experimental results show that the algorithm produces better offloading decisions than existing algorithms, effectively reduces the delay and energy consumption of network services, and adapts to the changing network environment.

1. Introduction

With the advent of the Internet era, smart devices are widely used in our lives. Due to the limited computing power of mobile devices, they sometimes cannot satisfy users' needs. In addition, processing massive computation tasks on mobile devices can lead to excessive energy consumption, which degrades the user experience [1]. To solve these problems, mobile cloud computing was proposed: massive computation tasks can be offloaded by mobile devices to the cloud, where the cloud server performs the computing and returns the results to the terminal [2]. Because the cloud server is far from the terminal, the task transmission delay is high. At the same time, an increasing number of terminal devices upload data to the cloud server, which also puts high pressure on the network. Therefore, mobile edge computing (MEC) came into being [3]. MEC offloads computing tasks from end devices to nearby edge servers to reduce network pressure, data transmission delay, and end-device energy consumption [4, 5]. How to make offloading decisions is crucial, and the effectiveness of offloading decisions depends on key indicators such as energy consumption and delay [6]. There are two forms of offloading: partial offloading and full offloading [7].

In recent years, there have been many research results on computation offloading for MEC. Fu and Ye [8] described the offloading problem as a delay minimization problem under energy consumption constraints and proposed an improved firefly swarm optimization algorithm to generate offloading decisions, which significantly reduced the system cost. Zhu and Wen [9] defined the weighted sum of energy consumption and delay as the optimization function of the total overhead and proposed an offloading strategy based on an improved genetic algorithm, which achieved better results on delay and load balance, but not on energy consumption reduction. Wei et al. [10] proposed a maximum energy-saving priority algorithm that used greedy selection to solve the optimization problem, but it only considered energy consumption and was not suitable for delay-sensitive scenarios. Yang et al. [11] and Guo and Liu [12] proposed game-theory-based offloading algorithms that described the overhead minimization problem as a policy game to reduce the energy consumption and delay of each mobile device through joint optimization.

With the development of machine learning, great progress has been made in using machine-learning algorithms to solve the computation offloading problem [13]. Liang et al. [14] proposed a Distributed Deep Learning-based Offloading (DDLO) algorithm that used multiple parallel deep neural networks to generate offloading decisions, with the network parameters continuously updated through an experience replay mechanism. The algorithm generates near-optimal offloading decisions in a short time, but the authors only considered a static network scenario. Li et al. [15] proposed a deep reinforcement learning algorithm for the complex computation offloading problem in collaborative computing with heterogeneous edge servers. The algorithm optimized the offloading decision based on the real-time state of the network and the properties of the task to minimize the task delay, but the authors only considered delay, not energy consumption. Zhou et al. [16] used deep reinforcement learning to study the joint optimization of computation offloading and resource allocation in dynamic multiuser MEC systems, using the DDQN (Double Deep Q Network) algorithm to dynamically generate offloading decisions. In reference [17], a DDQN-based trajectory and phase-shift optimization method was proposed to maximize RIS-UAV network capacity. In reference [18], an incentive-driven and deep Q network-based method (IDQNM) was proposed for designing mobile-node incentive mechanisms and content-caching strategies in D2D offloading. Reference [19] proposed an incentive mechanism based on delay constraints and reverse auction; with the maximization of mobile network operators' revenue as the optimization objective, two optimization methods were proposed: the greedy winner selection method (GWSM) and the dynamic programming winner selection method (DPWSM). T. Yang and J. Yang [20] proposed a joint optimization method for offloading decisions and resource allocation, which improved the DQN algorithm, shortened the finish time of computing tasks, and reduced the terminal energy consumption. Zhu et al. [21] proposed a dynamic resource allocation strategy based on K-means, in which resources were modeled as "fluids" and allocated using an auction algorithm, improving the throughput of the edge server and reducing the transmission delay. None of the above algorithms take task dependency into account, although task dependencies do exist in practical applications.

The representative research results on offloading strategies for dependent tasks are as follows. Dong et al. [22] proposed a computational-offloading strategy based on genetic algorithms. This strategy encoded the offloading location and offloading order of tasks, used delay and energy consumption as evaluation criteria, and continuously optimized the offloading decision through mutation and crossover operations; however, it did not consider the resource allocation of edge servers. The fine-grained offloading problem with multiple users and multiple servers was studied in [23]. The authors treated fine-grained offloading for Internet of Things (IoT) devices as a multiconstrained objective optimization problem and proposed an improved Nondominated Sorting Genetic Algorithm (NSGA-II) with the objective of minimizing the average delay. Mao et al. [24] proposed a delay-acceptance-based offloading strategy for multiuser tasks. The strategy first used a nondominated genetic algorithm to solve the optimal solution in a single-user scenario for each user, then improved the convergence speed with a probabilistic selection mechanism and a nondominated judging scheme, and finally proposed an adjustment strategy based on the idea of delay acceptance in stable matching, solving the multiuser offloading problem in dependent-task scenarios. Liu et al. [25] proposed an energy-efficient collaborative task-offloading algorithm based on semidefinite relaxation and stochastic mapping, which generated offloading decisions for dependent tasks in static network environments and reduced the total energy consumption of IoT devices.

The above studies only consider the offloading of dependent tasks in static network environments, but in practice the network environment is dynamic. In this paper, we use deep reinforcement learning to generate dependent task-offloading decisions, cache the optimal decision of each round, and then optimize the DNN parameters with a prioritized experience replay mechanism to extract valuable experiences. DTOS introduces a penalty mechanism, which is triggered if the service energy consumption or service delay exceeds its threshold. The algorithm can reduce delay and energy consumption. When the network transmission rate changes, the model can generate the corresponding optimal offloading decision and adapt to the changing network environment by only obtaining the current network rate in real time. The main contributions of this paper are as follows:
(1) A dependent task-offloading model is built for the scenario of a dynamic network environment. The optimization objective of DTOS is derived by comprehensively considering the delay and energy consumption of the service.
(2) Dependent task offloading is transformed into an optimal policy problem under a Markov decision process. A deep reinforcement learning-based dependent task-offloading strategy is proposed, which obtains optimal task-offloading decisions using a prioritized experience replay mechanism and a penalty mechanism.
(3) The effectiveness of DTOS is verified through simulation experiments. The simulation results show that DTOS is effective and outperforms the other four algorithms.

2. DTOS Model

2.1. System Model

The model in this paper is built on a scenario with multiple IoT devices and a single edge server; the system model is shown in Figure 1. IoT devices can be smart devices, wireless sensors, and other network-connected devices. An Unmanned Aerial Vehicle (UAV) is adopted as the edge server. Suppose that an IoT service needs to be executed and must be computed collaboratively by the IoT devices. Each IoT device has a certain amount of computing power and can execute computing tasks locally. Each IoT device connects to the edge server through a wireless network, and the wireless transmission rate from each IoT device to the edge server varies and is unstable because the UAV moves at low speed in the area. The fine-grained computational tasks of the IoT service are equally distributed among the IoT devices. There are data dependencies between different computational tasks, and the tasks are indivisible: each computational task is either computed locally or uploaded entirely to the edge server for computation, i.e., binary offloading. The offloading decision at a given time is represented as a list whose length equals the number of tasks, and each element of the list determines whether the corresponding task should be offloaded to the edge server for computation. When the element is 0, the task is executed locally; when it is 1, the task is offloaded to the edge server for execution. In this paper, the optimization goal is to minimize the delay and energy consumption when generating the offloading decision that determines whether each task is executed locally.

2.2. Communication Model

Each IoT device has its own wireless transmission rate to the edge server at each moment, and the rates of different devices differ from one another. Assume that the UAV flies at a constant altitude; given the horizontal position of the UAV at the current moment and the fixed position of each device, the distance between the UAV and an IoT device at that moment is the Euclidean distance between them.

It is assumed that the UAV and the IoT devices are modulated by Orthogonal Frequency Division Multiplexing (OFDM) and accessed by Time Division Multiple Access (TDMA). The wireless channels between the UAV and the IoT devices are line-of-sight channels. Therefore, the channel gain between the UAV and an IoT device is determined by the channel power gain at a reference distance of 1 m and the distance between the UAV and the device.

According to the Shannon formula, the wireless transmission rate of a device is determined by its channel bandwidth, its transmission power, the channel gain, and the variance of the additive white Gaussian noise.
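
The formulas of this subsection were not preserved in the extracted text; a plausible reconstruction in standard form is given below, where the notation (H for the UAV altitude, q(t) and w_i for the UAV and device positions, d_i(t) for the distance, g_0 for the channel power gain at 1 m, B_i and P_i for the bandwidth and transmission power of device i, and sigma^2 for the noise variance) is ours rather than the paper's original symbols:

\[
d_i(t) = \sqrt{H^2 + \lVert q(t) - w_i \rVert^2}, \qquad
g_i(t) = \frac{g_0}{d_i^2(t)}, \qquad
r_i(t) = B_i \log_2\!\left(1 + \frac{P_i\, g_i(t)}{\sigma^2}\right).
\]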

If a task is offloaded to the edge server for calculation, its transmission delay is the ratio of the task's data size to the device's wireless transmission rate. After the task finishes computing at the edge server, the result is transmitted back to the IoT device.

The energy consumed by a device while transmitting its task to the edge server is the product of the device's transmission power and the transmission delay.
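
Using the same assumed notation, with D_i denoting the data size of task i, the transmission delay and transmission energy described above take the standard form (a reconstruction, not the paper's original rendering):

\[
T_i^{\mathrm{tr}}(t) = \frac{D_i}{r_i(t)}, \qquad
E_i^{\mathrm{tr}}(t) = P_i\, T_i^{\mathrm{tr}}(t) = \frac{P_i D_i}{r_i(t)}.
\]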

2.3. Computing Model
2.3.1. Local-Computing Model

Given the computing resources of device i, the local computational delay of a task is the number of CPU cycles the task requires, which depends on its data size and its computation complexity parameter, divided by the device's computing resources.

Defining the local-computing power of the device, the local-computing energy consumption of the device is the product of this power and the local computational delay.
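
A plausible reconstruction of the local-computing formulas, with f_i denoting the computing resources of device i, C_i the computation complexity parameter of the task (CPU cycles per bit, an assumed interpretation), and P_i^{loc} the local-computing power:

\[
T_i^{\mathrm{loc}} = \frac{C_i D_i}{f_i}, \qquad
E_i^{\mathrm{loc}} = P_i^{\mathrm{loc}}\, T_i^{\mathrm{loc}}.
\]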

2.3.2. Edge-Computing Model

Define the total computing resources of the edge server and the computing resources allocated to a task by the edge server; the edge-computing delay of the task is then the number of CPU cycles the task requires divided by the computing resources allocated to it.

When a task is offloaded to the edge server for calculation, the device is in an idle state. During this period, the energy consumed by the device is the product of its idle power and the edge-computing delay.
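
Correspondingly, with f_i^{e} denoting the computing resources allocated to task i by the edge server and P_i^{idle} the idle power of device i, the edge-computing delay and the idle energy consumption can be written as (again a reconstruction in our notation):

\[
T_i^{\mathrm{edge}} = \frac{C_i D_i}{f_i^{e}}, \qquad
E_i^{\mathrm{idle}} = P_i^{\mathrm{idle}}\, T_i^{\mathrm{edge}}.
\]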

2.4. Task Dependency Model

In this paper, task dependency is treated as input dependency, i.e., the execution of some tasks requires the execution results of other tasks as input. The task dependencies can be modeled as a directed acyclic graph, in which each node represents a computational task. A single task can have multiple predecessor and successor tasks. A task without successors is the final task, and the entire service is complete when its execution finishes.

Each successor task must wait for all of its predecessor tasks, which execute in parallel, to finish; therefore, the start time of a successor task equals the maximum of the finish times of all its predecessor tasks. The starting node begins execution at time 0. The finish time of a task equals its start time plus its execution delay, where the execution delay is the local-computing delay if the task is executed locally, and the sum of the transmission delay and the edge-computing delay if the task is offloaded to the edge server.

The start execution time of a task is the maximum of the finish times of its direct predecessor tasks. If the set of direct predecessors is empty, the task is the starting task and its start execution time is 0.
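
The dependency model above can be summarized in a short sketch. The code below computes the start and finish time of every task on a dependency DAG, assuming the per-task execution delays have already been obtained from the local- or edge-computing model; the function and variable names are illustrative and the tasks are assumed to be indexed in topological order.

import numpy as np

def schedule_times(predecessors, exec_delay):
    """predecessors[i]: set of direct predecessors of task i.
    exec_delay[i]: execution delay of task i (local delay if computed locally,
    transmission delay + edge-computing delay if offloaded)."""
    n = len(exec_delay)
    start = [0.0] * n
    finish = [0.0] * n
    for i in range(n):  # tasks assumed indexed so predecessors come first
        if predecessors[i]:
            # A task starts when all of its predecessors have finished.
            start[i] = max(finish[j] for j in predecessors[i])
        else:
            start[i] = 0.0  # starting task begins at time 0
        finish[i] = start[i] + exec_delay[i]
    return start, finish

# The finish time of the entire service is the finish time of the final task,
# e.g. service_delay = max(finish).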

2.5. Problem Description

The optimization objective of the offloading strategy is to minimize the weighted sum of the delay and energy consumption of the service. From the above model, the finish time of the entire service is the finish time of the final task.

The energy consumption generated by a task is its local-computing energy consumption if it is executed locally, or the sum of its transmission energy consumption and the device's idle energy consumption if it is offloaded to the edge server.

The total energy consumption of the service is the sum of the energy consumption generated by all tasks.

The algorithm introduces a penalty mechanism, which is triggered if the service energy consumption or the service delay exceeds its threshold. The cost of completing the service is the weighted sum of the service delay and the service energy consumption plus two penalty terms, where a weight balances delay against energy consumption, a delay threshold and an energy consumption threshold define when the penalty is triggered, and two penalty factors scale the delay penalty and the energy consumption penalty, respectively. When the penalty mechanism is not triggered, the penalty terms are 0.
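
Since the original formula was not preserved, a hedged reconstruction of the cost is given below, with alpha the weight, T and E the service delay and total energy consumption, T_max and E_max the thresholds, and lambda_T, lambda_E the penalty factors; the linear form of the penalty terms is our assumption:

\[
Q = \alpha T + (1-\alpha) E + \lambda_T \max(T - T_{\max},\, 0) + \lambda_E \max(E - E_{\max},\, 0),
\]

so that both penalty terms vanish when neither threshold is exceeded.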

The optimization objective is to minimize this cost value, subject to the following constraints.

The first constraint indicates that each task can only be computed locally or offloaded to the edge server for computing. The second constraint indicates that a task starts execution only when all of its predecessor tasks have finished. The third constraint indicates that the start execution time of the starting task is 0. The fourth constraint indicates that the weight is a number between 0 and 1.

3. DTOS Algorithm

3.1. Model Training Process of DTOS

Offloading optimization of dependent tasks in edge computing is an NP-hard problem [25]. Reinforcement learning can continuously interact with the environment and transform the offloading optimization problem of dependent tasks into an optimal policy problem under a Markov decision process. The model training process of DTOS is illustrated in Figure 2, and the main steps are as follows.

3.1.1. Forward Propagation

First, the experimental parameters are initialized and the DNNs are constructed with randomly generated weight parameters and bias parameters. The number of neurons in both the input and output layers of each DNN equals the number of IoT devices, and each DNN has two hidden layers whose sizes are set as experimental parameters.

The wireless transmission rates from the IoT devices to the edge server at the current moment are used as the input-layer data of the DNN and fed into the hidden layers for calculation; all hidden-layer neurons use ReLU as the activation function. The output-layer neurons have no activation function, and different offloading decisions are produced directly after the data are calculated with the weight parameters and bias parameters.

Each DNN outputs a different offloading decision after one forward propagation is completed. Since each element of an offloading decision must be 0 or 1 while the raw DNN outputs are continuous values, the outputs need to be transformed: each value greater than 0 in the output array is converted to 1, and all other values are converted to 0. In this way, one candidate offloading decision is obtained per DNN.
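
A minimal sketch of this step is given below, assuming the architecture described above (two ReLU hidden layers, linear output layer). The number of parallel DNNs and the hidden-layer sizes are illustrative hyperparameters, not values taken from the paper.

import numpy as np
import tensorflow as tf

N_DEVICES = 10   # number of IoT devices / tasks
K_DNNS = 5       # number of parallel DNNs (assumed)

def build_dnn():
    # Two ReLU hidden layers; linear output layer (no activation), as described.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(120, activation="relu", input_shape=(N_DEVICES,)),
        tf.keras.layers.Dense(80, activation="relu"),
        tf.keras.layers.Dense(N_DEVICES),  # one raw score per task
    ])

dnns = [build_dnn() for _ in range(K_DNNS)]

def candidate_decisions(rates):
    """Forward-propagate the transmission rates through every DNN and
    quantize: values > 0 become 1 (offload), otherwise 0 (local)."""
    rates = np.asarray(rates, dtype=np.float32).reshape(1, -1)
    outputs = [dnn(rates).numpy().flatten() for dnn in dnns]
    return [(out > 0).astype(int) for out in outputs]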

3.1.2. Computational Resource Allocation

In existing studies, the computational resources of the edge server are usually distributed equally among all tasks offloaded to the edge server for execution, which wastes computing resources and increases the edge-computing delay for IoT services with dependent tasks.

Computational resource allocation is a continuous control problem, so in this paper the edge server is used as the agent and DDPG (Deep Deterministic Policy Gradient) is used to allocate the computational resources of the edge server. Reinforcement learning has three key elements: state, action, and reward. The state consists of two components, described below.

The first component is a vector O of the network transmission rates of the devices that need to offload their tasks to the edge server; O is assigned values based on the offloading decision generated by the DNN and the wireless transmission rates from the devices to the edge server. If a task is to be offloaded to the edge server for computation, the wireless transmission rate from its device to the edge server is assigned to the corresponding element of O; otherwise, that element is set to zero, indicating that the task is computed locally. The second component is the remaining computational resources of the edge server.

The action (Act) is the amount of computational resources allocated by the edge server to the task of each device.

At the start of task execution, DDPG selects, based on the current state, a value from the remaining computational resources as the amount of computational resources allocated to the task by the edge server.

The reward is determined by the current state and the current action.

The reward function is associated with the objective function: the optimization objective of resource allocation is to minimize the total edge-computing delay, i.e., the sum of the edge-computing delays of all tasks offloaded to the edge server.

The reward function is defined as follows.

The negative of the edge-computing delay of the task is used as the reward value, which drives the agent to minimize the edge-computing delay. Each state-action pair has a Q value, which represents the expectation of the long-term reward obtained by performing that action in that state. For each state-action pair, the Q value is calculated, stored in a table, and updated with the standard temporal-difference rule, in which the learning rate controls the update step and the discount factor weights future rewards. We design two deep neural networks, the action value (critic) network and the action (actor) network, each with its own parameters. The action network is a mapping from the state space to the action space and can directly generate the desired action according to the state. The action value network is used to approximate the action value function and provides gradients for training the action network. The action value network is trained to minimize the mean squared error between its estimates and the target values produced by a target value network, which synchronizes its weights from the action value network. The action network parameters are updated by the policy gradient algorithm.
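
For reference, the standard forms of the update rules named in this paragraph are reproduced below in our own notation (the paper's original symbols were not preserved): s_t, a_t, and r_t are the state, action, and reward at step t, eta is the learning rate, gamma_d is the discount factor, Q(s, a; omega) is the action value (critic) network, mu(s; theta) is the action (actor) network, and Q'(.; omega') is the target value network.

\[
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta \left[ r_t + \gamma_d \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right],
\]
\[
L(\omega) = \mathbb{E}\!\left[ \left( r_t + \gamma_d\, Q'\!\big(s_{t+1}, \mu(s_{t+1};\theta);\, \omega'\big) - Q(s_t, a_t;\, \omega) \right)^{2} \right],
\]
\[
\nabla_{\theta} J \approx \mathbb{E}\!\left[ \nabla_{a} Q(s, a;\, \omega)\big|_{a=\mu(s;\theta)}\, \nabla_{\theta}\, \mu(s;\, \theta) \right].
\]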

After the current state is input into the action network, the action network generates the action required for that state, thus minimizing the computational delay of the edge server.

3.1.3. Generate Optimal Offloading Decisions

Each candidate offloading decision, together with the wireless transmission rates, is substituted into the model to obtain the cost value of that decision. The offloading decision with the smallest cost value is selected as the output of this round, and this decision together with the device transmission rates is stored in memory as one data entry. A counter is maintained and incremented by 1 whenever a data entry is stored.
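
A minimal sketch of this step follows, reusing candidate_decisions from the earlier sketch; cost_fn stands for the evaluation of the system model (the weighted delay/energy cost) and is assumed to be implemented elsewhere, and the memory capacity is illustrative.

import numpy as np

memory = []          # replay memory of (rates, best_decision, best_cost) entries
MEMORY_SIZE = 1024   # illustrative capacity (not specified in the paper)
counter = 0          # counts stored data entries

def offload_round(rates, cost_fn):
    """Evaluate every candidate decision, keep the best one, and store it."""
    global counter
    candidates = candidate_decisions(rates)
    costs = [cost_fn(d, rates) for d in candidates]
    best_idx = int(np.argmin(costs))
    if len(memory) >= MEMORY_SIZE:
        memory.pop(0)                                # drop the oldest entry
    memory.append((rates, candidates[best_idx], costs[best_idx]))
    counter += 1                                     # one data entry stored
    return candidates[best_idx]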

3.1.4. Backward Propagation

Unlike traditional supervised learning, offloading decisions in a dynamic network environment do not come with a labeled dataset for neural network training. Thus, DTOS uses the data in memory to train the DNNs through an experience replay mechanism. When the counter exceeds the set threshold β, the algorithm starts to perform backward propagation. In this paper, we use a prioritized experience replay mechanism to extract valuable experiences: the lower the cost value of a sample, the higher its priority. Experience extraction is performed probabilistically, with the selection probability of each sample determined by its priority relative to all samples currently in memory. The selected data in the memory cache are used as training data: the stored transmission rates are input again into each DNN to generate offloading decisions, and the cost function is the cross-entropy loss between the DNN output and the stored optimal offloading decision.
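
A sketch of the prioritized sampling and the cross-entropy update described above is given below, reusing the dnns and memory defined in the earlier sketches. The exact priority formula of the paper was not preserved, so the selection probabilities here are derived from inverse cost ranks as an illustrative choice; the batch size is an assumed hyperparameter, and the learning rate 0.01 follows the choice in Section 4.2.

import numpy as np
import tensorflow as tf

BATCH_SIZE = 128     # illustrative batch size
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizers = [tf.keras.optimizers.Adam(0.01) for _ in dnns]

def sample_batch(memory, batch_size=BATCH_SIZE):
    """Prioritized sampling: the lower the stored cost, the higher the priority."""
    costs = np.array([entry[2] for entry in memory])
    ranks = np.argsort(np.argsort(costs))            # 0 = lowest cost
    priorities = 1.0 / (1.0 + ranks)                 # rank-based priority (assumed form)
    probs = priorities / priorities.sum()
    size = min(batch_size, len(memory))
    idx = np.random.choice(len(memory), size=size, replace=False, p=probs)
    return [memory[i] for i in idx]

def train_step(memory):
    batch = sample_batch(memory)
    rates = np.array([b[0] for b in batch], dtype=np.float32)
    labels = np.array([b[1] for b in batch], dtype=np.float32)
    for dnn, opt in zip(dnns, optimizers):
        with tf.GradientTape() as tape:
            logits = dnn(rates)                      # raw (pre-threshold) outputs
            loss = bce(labels, logits)               # cross-entropy vs. stored decision
        grads = tape.gradient(loss, dnn.trainable_variables)
        opt.apply_gradients(zip(grads, dnn.trainable_variables))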

To minimize the cross-entropy loss, a gradient descent algorithm is used to optimize the network parameters of each DNN, performing one backward propagation after every 10 forward propagations. When the memory is full, the oldest data entry is discarded to store a new one. The quality of the training data becomes higher and higher as the DNN parameters are continuously updated, and the deep reinforcement learning network gets closer to the objective function. After the set number of training rounds is reached, the algorithm terminates.

3.2. Usage of DTOS

After the model is trained, when DTOS is used, the edge server detects the wireless transmission rate of each device and inputs these rates into the deep reinforcement learning network to generate the optimal offloading decision.

Using the offloading coordinator in the edge server as the management module for computation offloading, the service is executed as follows:
(1) When an IoT service is started, the service number is transmitted to the edge server via a remote command. The edge server loads the corresponding network model trained in advance according to the service number. The offloading coordinator detects the current network state, obtains the network transmission rate of each IoT device, and inputs it into the network model.
(2) Execute DTOS, the deep reinforcement learning-based dependent task-offloading policy, to obtain the optimal offloading decision.
(3) Transmit the offloading decision to the corresponding IoT device.
(4) The IoT device that receives the offloading decision determines whether the task is to be processed locally or uploaded to the server for processing based on the offloading decision. If it is uploaded to the server for processing, the edge server dynamically allocates computing resources.
(5) The edge server finishes the calculation and transmits the result back to the IoT device.

3.3. Computational Complexity Analysis of DTOS

The wireless transmission rates of the IoT devices are input to the neural network, so the number of neurons in the input layer equals the number of IoT devices. The matrix operation from the input layer to the first hidden layer multiplies the input vector by a weight matrix whose size is the input dimension times the number of neurons in the first hidden layer, so its computational complexity is the product of these two numbers. Similarly, the computational complexity of the operation from the first hidden layer to the second hidden layer is the product of the sizes of the two hidden layers, and the computational complexity from the second hidden layer to the output layer is the product of the size of the second hidden layer and the number of output neurons, which again equals the number of IoT devices. Since multiple DNNs are run in parallel, the computational complexity of one forward propagation of DTOS is the number of DNNs multiplied by the sum of these three products.
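
Writing N for the number of IoT devices, h_1 and h_2 for the sizes of the two hidden layers, and K for the number of parallel DNNs (our notation), the forward-propagation complexity described above is:

\[
O\big(K\,(N h_1 + h_1 h_2 + h_2 N)\big).
\]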

During the model training process of DTOS, the gradient descent algorithm is used to update the network parameters several times until the algorithm converges.

The computational complexity of each parameter update is of the same order as that of forward propagation, so the overall computational complexity of DTOS equals the forward-propagation complexity multiplied by the size of the training batch.

4. Experiments and Analysis

4.1. Experimental Parameters Set

Simulation experiments were performed using Python 3.8 and TensorFlow 2.2.0. The scenario contains 1 edge server and 10 IoT devices, and three different dependency structures among the computing tasks are considered, as shown in Figure 3.

Experimental parameter settings are shown in Table 1.

4.2. Effect of Weighting Factor α and Learning Rate γ

Figure 3(c) is used to investigate the effects of the weighting factor α and the learning rate γ. Figure 4 shows the effects of different weighting factors α on the service delay and energy consumption. When the weighting factor α is small, the DTOS optimization objective focuses more on the service energy consumption: the smaller the service energy consumption is, the larger the reward value is. As the weighting factor α increases, the service delay decreases while the service energy consumption increases. For different IoT services, users can adjust the weighting factor α according to their delay and energy consumption requirements. To balance the effects of delay and energy consumption on the cost value, the weighting factor α is set to 0.5 in the following experiments.

To choose an appropriate learning rate, the optimal offloading decision generated by the exhaustive method is introduced as a comparison benchmark. The exhaustive method selects the optimal solution by enumerating all offloading decisions. Figure 5 shows the ratio of the cost values of DTOS and the exhaustive method, i.e., the gain ratio. As indicated in Figure 5, DTOS performs best when the learning rate is 0.01: the gain ratio increases gradually and approaches 1.0 after 3000 rounds, which indicates that the offloading decision generated by DTOS is close to the optimal solution of the exhaustive method. When the learning rate is 0.1, the algorithm cannot converge, because the learning rate and hence the parameter updates are too large, so the network fails to converge to the optimal solution. When the learning rate is 0.001, the learning rate and the parameter updates are too small, so the network cannot converge to the optimal solution in a short time. The optimal value of the learning rate is therefore 0.01, and the learning rate is set to 0.01 in the following experiments.

4.3. Analysis of Experimental Results
4.3.1. Convergence of DTOS

The convergence of DTOS is verified first, and the experimental results are shown in Figure 6. Networks with the three dependency structures shown in Figure 3 are trained. The structures of networks A and B are simpler, and they converge at rounds 2153 and 1769, respectively. The structure of network C is the most complex, which affects the convergence speed of the algorithm; network C converges at round 2808.

4.3.2. Cost Values of Each Algorithm with Different Dependencies

DTOS is compared with four other algorithms: the local computing-only algorithm (LOCAL), the edge computing-only algorithm (EDGE), the random selection algorithm (RANDOM), and the DDQN algorithm [16]. The local computing-only algorithm computes all tasks on the IoT devices. The edge computing-only algorithm uploads all tasks to the edge server for computation. The random selection algorithm randomly decides for each task whether it is computed by the edge server or by the IoT device. The experiments are conducted 50 times and the results are averaged. The experimental results are shown in Figure 7. The cost of DTOS is the lowest for all dependency structures. Since the structure of network C is the most complex, its cost value is higher than those of networks A and B. Since the LOCAL algorithm schedules all tasks to be computed locally, there is no transmission delay, but it causes excessive energy consumption. The EDGE algorithm offloads all tasks to the edge server for computation, which results in less energy consumption but causes a long transmission delay when the transmission rate is low. Both DTOS and DDQN train DNNs through experience replay mechanisms, but DDQN rewards the DNN based on the cost value of a single offloading decision, whereas DTOS selects the optimal offloading decision from multiple parallel candidate decisions and applies the prioritized experience replay mechanism and the penalty mechanism according to their cost values; thus, DTOS performs better. To further verify the effectiveness of DTOS, Figure 3(c) is used for the subsequent experiments.

4.3.3. Cost Values of Each Algorithm with Different Task Data Sizes

The variation of the cost value with the average data size of the computational tasks for the different algorithms is shown in Figure 8. The cost values generated by each algorithm are positively correlated with the average data size as it grows from 40 MB to 120 MB, because a larger data size generates greater delay and energy consumption, whether for local or edge computing. Compared with the other algorithms, DTOS has the slowest growth in cost value and the lowest cost value, showing that DTOS outperforms the other four algorithms for different task data sizes.

4.3.4. Cost Values of Each Algorithm with Different Task Computational Complexity

Figure 9 shows the variation of the cost values with different task computational complexities. The cost value is positively related to the task computational complexity for DTOS, the local computing-only algorithm, and DDQN, because an increase in the computational complexity of tasks leads to an increase in the computation delay and energy consumption of the IoT devices. However, for the edge computing-only and random selection algorithms, the cost value does not always increase when the computational complexity increases: since the computational resources of the edge server are much larger than the local computational resources, the computation delay is not the main factor affecting the cost values of these two algorithms. Overall, the cost value of DTOS is lower than those of the other four algorithms, indicating that DTOS performs best for different task computational complexities.

4.3.5. Cost Values of Each Algorithm at Different Network Sizes

When the number of IoT devices in the network is 10, 20, or 30, the variation of the cost value for each algorithm is shown in Table 2. DTOS has the lowest cost value at all network sizes, indicating that DTOS can adapt to different network sizes and perform well. DTOS outperforms DDQN by 13.9%, 27.1%, and 11.9% when the number of IoT devices is 10, 20, and 30, respectively.

5. Conclusions

In this paper, we adopt deep reinforcement learning to study the adaptive offloading of dependent tasks in dynamic network environments in edge computing. A deep reinforcement learning-based dependent task-offloading strategy is proposed, which transforms dependent task offloading into an optimal policy problem under a Markov decision process and obtains the optimal task-offloading decision through a prioritized experience replay mechanism and a penalty mechanism. The experimental results show that DTOS outperforms the local computing-only algorithm, the edge computing-only algorithm, the random selection algorithm, and the DDQN algorithm in all cases, across different task data sizes, task computational complexities, and network sizes.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 62172255).