Abstract

The low deployment cost and high mobility of unmanned aerial vehicles (UAVs) have drawn widespread attention to UAVs in mobile edge computing (MEC). Aiming at the dynamically changing computing tasks generated by an urban transportation network, this paper proposes a UAV-assisted edge computing offloading model: when the number of tasks is within the capacity of the edge servers (ESs), most tasks are processed by the ESs, but when it exceeds that capacity, some tasks are offloaded to UAVs and processed by their on-board servers. To minimize the task delay and the cost of computing tasks, this paper formulates a mathematical model according to the task amount generated by the urban transportation network, in which the task delay is computed using queueing theory. A deep reinforcement learning algorithm with two deep neural networks (DNNs) is used to optimize the model and obtain the resource allocation strategy for UAVs and on-board servers and the offloading strategy for edge computing. Simulation experiments verify the effectiveness of these strategies.

1. Introduction

With the development of autonomous driving and artificial intelligence technologies, emerging automotive applications and accompanying services cause a rapid increase in the number of computing tasks generated by vehicles [1]. Limited vehicle computing capacity is an obstacle to the development of in-vehicle applications and services. On the other hand, due to the increasing number of vehicles in urban areas and traffic congestion during morning and evening rush hours, information about road conditions and congested vehicles is exchanged frequently among vehicles and the surrounding infrastructure, which generates a large number of tasks [2, 3]. At rush hours, waiting drivers and passengers use their mobile devices more frequently, which also yields extensive computing tasks in the congested area. Offloading tasks to the cloud requires massive long-distance communication, which causes more transmission delay and energy consumption and cannot satisfy the latency-sensitive Internet of Vehicles (IoV).

MEC deploys computing resources on ESs close to devices, such as access points (APs) and base stations (BSs). Sinking computing capacity to the edge of the network enables ESs to share the computing pressure of nearby users, improving network efficiency and saving device energy [4, 5]. In traditional MEC, the placement of ESs is fixed and static: ESs can change neither their location nor their number as the computing tasks change. During morning and evening rush hours, the surge in the number of tasks overloads ESs in some congested areas, which leads to longer task delay, including queuing time and computing time, and some tasks may even be lost, while ESs in noncongested areas or noncongested periods are relatively idle. Deploying dense ESs would solve the delay problem but would steeply increase the deployment cost and greatly waste the resources of idle ESs. How to dynamically deploy ESs according to the real-time changing tasks in IoV is an issue worthy of research.

Using UAVs is a practical choice for commercial applications due to their convenience of deployment, low acquisition and maintenance costs, high maneuverability, and hovering capabilities [6]. To reduce the task delay and the computing cost, this paper proposes to deploy UAVs with on-board servers around crowded road sections during morning and evening rush hours. On-board servers act as computing resources that give UAVs computing capacity [6]. The UAVs can assist the original fixed ESs in computing and meet the latency-sensitive requirements of IoV. The main contributions of this paper are as follows:
(i) This paper proposes a task offloading strategy and an on-board server scheduling strategy with UAV assistance. These strategies fit scenarios in IoV where the computing task amount at ESs changes in real time. By dynamically adjusting the number of on-board servers on the UAVs and the proportion of offloaded tasks, IoV meets the latency-sensitive requirement and reduces the task computing cost.
(ii) This paper establishes a UAV-assisted edge computing offloading mathematical model. The model uses queuing theory to calculate the system delay and is formulated as a joint optimization of system delay and cost. The optimization variables are the schedule of on-board servers on the UAVs and the proportion of offloaded tasks.
(iii) This paper proposes an algorithm that uses two different DNNs and experience replay for continuous learning of policies. The learned policies are stored in the corresponding replay memories. In our algorithm, the memories and training parameters are continuously updated and iterated to obtain the optimized UAV deployment strategy and task offloading strategy.

The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 introduces the system model and notations. Section 4 establishes the joint optimal function and introduces the solution. Section 5 shows the simulation. Section 6 is the conclusion.

2. Related Work

To assist MEC, many papers have investigated partial task offloading and communication resource allocation. Wu et al. [7] used a mobile computing cloud, designed a cost-driven scheduling strategy between communication and computation, and proposed an algorithm based on Lyapunov optimization. Qu et al. [8] designed a deep meta-reinforcement learning-based offloading (DMRO) framework that combined multiple parallel DNNs and deep Q-learning algorithms to make offloading decisions and proposed an initial parameter training algorithm based on meta-learning for changing MEC environments. The work in [9] established a multidevice and multiserver MEC system model for delay-sensitive and computation-intensive (DSCI-) type tasks and applied energy harvesting (EH) technology to IoT devices to make full use of green energy; it formally defined the stochastic optimization problem and designed the green parallel online offloading algorithm (GPOOA) based on the Lyapunov optimization framework to offload tasks in a parallel manner. The excellent mobility of UAVs has aroused great research interest in academia. Using UAVs to assist the MEC system offers many new opportunities for solving conventional problems such as optimizing communication and computing, and some research results have been obtained [10–20].

For optimizing communication, Wu et al. [10] considered a multi-UAV-enabled wireless communication system, whose objective was to maximize the minimum throughput over all ground users in the downlink, and proposed an efficient iterative algorithm based on block coordinate descent and successive convex optimization techniques. Wan et al. [11] proposed a three-layer online data processing network based on the MEC technique: the bottom layer consists of data generated by distributed sensors with local information, UAV-BSs are deployed as moving ESs to collect the data, and a central cloud receives the data and conducts further evaluation; by allocating bandwidth and planning paths to expand UAV coverage, the objective is to ensure the freshness of the data. Mozaffari et al. [12] investigated UAVs used as aerial BSs to collect data from ground devices and proposed a novel framework for jointly optimizing the 3D placement and mobility of the UAVs, device-UAV association, and uplink power control to enable reliable uplink communications for the devices with minimum total transmit power. Zeng and Zhang [13] assumed that the UAVs fly horizontally at a fixed altitude and formulated a theoretical energy consumption model in terms of the flying speed, direction, and acceleration of the UAVs; they used a simple circular UAV trajectory and simultaneously considered both the communication throughput and the UAV's energy consumption to minimize the communication energy with ground devices. Liu et al. [14] designed a fully distributed control solution to navigate UAVs as mobile BSs flying around a target area. To provide long-term communication coverage for ground mobile users, they proposed a decentralized deep reinforcement learning-based framework to maximize the geographical fairness of all considered points of interest and minimize the total energy consumption, while keeping UAVs connected and within the area border.

As for assisting MEC computing, Zhan et al. [15] considered a UAV-assisted edge computing scenario for Internet of Things computation offloading and studied the joint design of computation offloading, resource allocation, and UAV trajectory to minimize the energy consumption and completion time of the UAV; they used the path discretization technique, decoupled the problem into two subproblems, and addressed the two subproblems iteratively with successive convex approximation-based algorithms. In [16], the UAV played the role of an aerial cloudlet that collects and processes the computation tasks offloaded by ground users; the goal was to minimize the UAV's energy consumption by optimizing computation offloading, and the Dinkelbach algorithm and the successive convex approximation technique were adopted to solve it. Yu et al. [17] proposed an innovative UAV-enabled MEC system aimed at minimizing the weighted sum of the service delay of all IoT devices and the UAV energy consumption by jointly optimizing the UAV position, communication and computing resource allocation, and task splitting decisions. In [18], a computation efficiency maximization problem was formulated in a multi-UAV-assisted MEC system, where both computation bits and energy consumption were considered; the paper proposed an iterative optimization algorithm with a double-loop structure to find the best user association, allocation of CPU cycle frequency, power, and spectrum resources, as well as trajectory scheduling of UAVs. Yang et al. [19] used multiple UAVs acting as edge nodes to provide computing offloading services for ground IoT nodes with limited local computing capabilities and proposed a differential evolution-based multi-UAV deployment mechanism that uses a deep reinforcement learning algorithm to balance the efficiency of task execution and the load of UAVs. In [20], a moving UAV equipped with computing resources was used to serve users, who can offload part of their tasks to the UAV for computing; that paper aimed to minimize the sum of the maximum delays by jointly optimizing the UAV trajectory and the ratio of offloaded tasks and proposed a novel penalty dual decomposition-based algorithm.

All the above papers took the UAV as an independent ES; they neither used UAVs to cooperate with existing static ESs that are already deployed nor exploited the characteristics of user tasks in assisting computing, i.e., task type and size, required computing resources, and maximum tolerable completion time. Different from the above papers, this paper investigates using UAVs to assist edge computing and offloading, mainly targeting the real-time variation of tasks and taking UAVs as adjuncts to ESs: according to the traffic task amount, UAVs dynamically offer assistant computing for ESs to minimize the task delay and computing cost.

3. System Model

The system model is shown in Figure 1. There are $N$ ESs in the traffic system, and the set of ESs is denoted by $\mathcal{N} = \{1, 2, \ldots, N\}$. Every ES has the same task computing rate $\mu_e$, and the cost for an ES to compute a unit task is $c_e$. The number of UAVs for assisting is $N$: one UAV assists one ES; i.e., if ES $n$ needs assistance, there is a UAV to assist it. UAVs take on-board servers as computing resources; placing on-board servers on UAVs enables the UAVs to compute tasks. The total number of on-board servers is $K$, and the task processing rate of each on-board server is $\mu_u$. UAVs change the number of on-board servers they carry based on the requirements of the assisted ESs. Each on-board server's usage cost and unit task computing cost are $c_s$ and $c_u$, respectively. UAVs assist ESs with computing according to the task load, and the cost per trip of a UAV is $c_f$.

Let $\mathcal{T} = \{1, 2, \ldots, T\}$ denote the set of time slots. In slot $t$, all tasks are offloaded to the nearest ES. The system decides whether to use UAVs to assist computing and how to allocate on-board servers among the UAVs.

Assume that in slot $t$, devices offload all their tasks to the nearest ES. The average task arrival rate of ES $n$ in this slot is $\lambda_{n,t}$, which follows a Poisson distribution. In different slots, different ESs have different $\lambda_{n,t}$. For the same ES $n$, the task arrival rate during a traffic jam period may be higher than during a light-traffic period; and in the same slot $t$, the task arrival rate in a congested area may be higher than in a light-traffic area.

Part of the tasks at ES $n$ can be offloaded to the corresponding UAV for computing, and the rest are computed locally by ES $n$. The probability of local ES computing is $\alpha_{n,t}$, and the probability of UAV computing is $1 - \alpha_{n,t}$. Thus, the task arrival rates of ES $n$ and the corresponding UAV are $\alpha_{n,t}\lambda_{n,t}$ and $(1-\alpha_{n,t})\lambda_{n,t}$, respectively. The schedule of on-board servers is denoted by $\mathbf{k}_t = (k_{1,t}, \ldots, k_{N,t})$, where $k_{n,t}$ is the number of on-board servers on UAV $n$ in time slot $t$. If $k_{n,t}$ is 0, ES $n$ has no UAV assistance. The constraint $\sum_{n \in \mathcal{N}} k_{n,t} \le K$ ensures that the number of allocated on-board servers does not exceed the total number. In time slots in which ES $n$ has no UAV assistance, $\alpha_{n,t}$ is set to 1.
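As a concrete illustration of this arrival-splitting assumption, thinning a Poisson stream with probability $\alpha_{n,t}$ yields two independent Poisson streams with rates $\alpha_{n,t}\lambda_{n,t}$ and $(1-\alpha_{n,t})\lambda_{n,t}$. The following minimal Python sketch (with hypothetical rate and probability values, not those of Table 2) simulates one slot:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
lam = 200.0    # example arrival rate of ES n in slot t (tasks per unit time)
alpha = 0.7    # example probability that a task is computed locally

arrivals = rng.poisson(lam)             # tasks arriving at ES n in this slot
local = rng.binomial(arrivals, alpha)   # Bernoulli thinning: tasks kept at the ES
offloaded = arrivals - local            # complementary stream sent to the UAV
# By the Poisson thinning property, "local" and "offloaded" are themselves
# Poisson-distributed with means alpha*lam and (1 - alpha)*lam, respectively.
```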

3.1. System Delay Model

The process of tasks offloaded from ES $n$ to its UAV can be regarded as a queuing model in which two service windows are connected in series [21], as in Figure 2. Though the two windows are in series, each window has only one server, i.e., the ES or the UAV. The task arrival rate follows a Poisson distribution. We assume that the average size of every arriving task is the same and that the service time follows an exponential distribution, so that each window can be seen as an M/M/1 queue. The first service window is ES transmission. The service rate of the first window is the transmission rate of ES $n$, given by $r_n = B \log_2\!\left(1 + \frac{p_n g_n}{\sigma^2}\right)$, where $B$ is the bandwidth, $p_n$ is the transmission power of ES $n$, $g_n$ is the channel gain between ES $n$ and its UAV, and $\sigma^2$ is the noise power. The second service window is UAV computing. For UAV $n$, the service rate is the sum of the computing rates of its on-board servers, calculated as $k_{n,t}\mu_u$.
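For reference, the first window's service rate can be computed directly from the rate formula above. A minimal sketch with placeholder parameter values (not the values of Table 2):

```python
import math

def transmission_rate(bandwidth, tx_power, channel_gain, noise_power):
    # Shannon rate of the ES-to-UAV link: r_n = B * log2(1 + p_n * g_n / sigma^2).
    return bandwidth * math.log2(1.0 + tx_power * channel_gain / noise_power)

# Placeholder values for illustration only.
r_n = transmission_rate(bandwidth=1e6, tx_power=0.5,
                        channel_gain=1e-5, noise_power=1e-9)
```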

Figure 3 shows the state transition probabilities between the two windows. The state of the two service windows is denoted by the integer pair $(i, j)$, meaning there are $i$ tasks in the ES transmission queue and $j$ tasks in the UAV computing queue. The equilibrium equations are shown in Table 1.

Both service windows are M/M/1 queues, and the task departure process of an M/M/1 queue is a Poisson process with the same rate as the arrival process, so the probabilities of $i$ tasks in the ES and $j$ tasks in the UAV are given by

$$P(i) = (1 - \rho_1)\rho_1^{\,i}, \qquad P(j) = (1 - \rho_2)\rho_2^{\,j}, \tag{1}$$

where $\rho_1 = (1-\alpha_{n,t})\lambda_{n,t}/r_n$ and $\rho_2 = (1-\alpha_{n,t})\lambda_{n,t}/(k_{n,t}\mu_u)$.

By the time reversibility of the M/M/1 queue, the number of tasks in the ES is independent of the departure process. The tasks leaving the ES constitute the task arrivals of the UAV, so the numbers of tasks in the ES and in the UAV are independent; that is, the processing time and waiting time of a task at the two windows are independent random variables. The probability of the integer pair $(i, j)$ is

$$P(i, j) = P(i)\,P(j) = (1-\rho_1)\rho_1^{\,i}\,(1-\rho_2)\rho_2^{\,j}. \tag{2}$$

Substituting (2) into the equilibrium equations above verifies that (2) is the unique solution and the limiting probability [21]. The total queue length of tasks offloaded from ES $n$ to its UAV is $L^{u}_{n,t}$, calculated as

$$L^{u}_{n,t} = \frac{\rho_1}{1-\rho_1} + \frac{\rho_2}{1-\rho_2}. \tag{3}$$

The task delay of offloading to the UAV, which includes the queueing delay and processing delay in the two windows, follows from Little's law:

$$T^{u}_{n,t} = \frac{L^{u}_{n,t}}{(1-\alpha_{n,t})\lambda_{n,t}} = \frac{1}{r_n - (1-\alpha_{n,t})\lambda_{n,t}} + \frac{1}{k_{n,t}\mu_u - (1-\alpha_{n,t})\lambda_{n,t}}. \tag{4}$$

The processing of tasks computed locally at the ES also satisfies the M/M/1 queue model. The local computing rate is the computing rate of ES $n$, i.e., $\mu_e$. According to queueing theory, the task queue length of local ES $n$ in slot $t$ is $L^{e}_{n,t} = \frac{\alpha_{n,t}\lambda_{n,t}}{\mu_e - \alpha_{n,t}\lambda_{n,t}}$. The task delay for local computing, which includes the queueing time and the computing time at ES $n$, is

$$T^{e}_{n,t} = \frac{1}{\mu_e - \alpha_{n,t}\lambda_{n,t}}. \tag{5}$$

The computing processes at the UAV and the ES are parallel; thus, the delay at ES $n$ in slot $t$ is the larger of the two:

$$T_{n,t} = \max\left(T^{e}_{n,t},\; T^{u}_{n,t}\right). \tag{6}$$
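The per-slot delay of one ES-UAV pair then follows directly from (4)-(6). The sketch below implements these closed forms under the notation above; the assertions mirror the stability constraints (13)-(15) introduced in the next subsection:

```python
def offload_delay(lam_off, r_n, k_n, mu_u):
    # Tandem M/M/1 sojourn time, Eq. (4): one term per service window.
    assert lam_off < r_n and lam_off < k_n * mu_u, "stability violated"
    return 1.0 / (r_n - lam_off) + 1.0 / (k_n * mu_u - lam_off)

def local_delay(lam_loc, mu_e):
    # M/M/1 sojourn time at the ES, Eq. (5).
    assert lam_loc < mu_e, "stability violated"
    return 1.0 / (mu_e - lam_loc)

def slot_delay(lam, alpha, r_n, k_n, mu_u, mu_e):
    # Eq. (6): ES and UAV compute in parallel, so the slot delay is the max.
    t_es = local_delay(alpha * lam, mu_e)
    if k_n == 0 or alpha >= 1.0:    # no UAV assistance for this ES
        return t_es
    return max(t_es, offload_delay((1.0 - alpha) * lam, r_n, k_n, mu_u))
```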

3.2. Problem Formulation

This paper intends to deploy UAVs (including on-board servers) properly and adopt a suitable offloading strategy to optimize the performance of the system. To satisfy the delay-sensitive requirement and save expenses, both the system delay and the cost of computing tasks and using UAVs should be kept small. The set of task arrival rates is denoted by $\boldsymbol{\lambda}_t = (\lambda_{1,t}, \ldots, \lambda_{N,t})$. $\mathbf{k}_t$ and $\boldsymbol{\alpha}_t = (\alpha_{1,t}, \ldots, \alpha_{N,t})$ denote the schedule of on-board servers and the offloading strategies of the ESs, respectively. The indicator function is

$$\mathbb{1}(k_{n,t}) = \begin{cases} 1, & k_{n,t} > 0, \\ 0, & k_{n,t} = 0. \end{cases} \tag{7}$$

The optimization objective is to minimize the weighted joint function of system delay and cost, where the cost includes the task computing cost of ESs and UAVs and the usage cost of UAVs and on-board servers:

$$Q(\boldsymbol{\lambda}_t, \mathbf{k}_t, \boldsymbol{\alpha}_t) = \omega_1 \sum_{n\in\mathcal{N}} T_{n,t} + \omega_2 \sum_{n\in\mathcal{N}} \alpha_{n,t}\lambda_{n,t}c_e + \omega_3 \sum_{n\in\mathcal{N}} (1-\alpha_{n,t})\lambda_{n,t}c_u + \omega_4 \sum_{n\in\mathcal{N}} \mathbb{1}(k_{n,t})c_f + \omega_5 \sum_{n\in\mathcal{N}} k_{n,t}c_s, \tag{8}$$

where $\omega_1$, $\omega_2$, $\omega_3$, $\omega_4$, and $\omega_5$ are weighting parameters. The formulated optimization problem is the mathematical model (P1):

$$\begin{aligned}
\text{(P1):}\quad &\min_{\mathbf{k}_t,\,\boldsymbol{\alpha}_t}\; Q(\boldsymbol{\lambda}_t, \mathbf{k}_t, \boldsymbol{\alpha}_t) &\\
\text{s.t.}\quad &\textstyle\sum_{n\in\mathcal{N}} k_{n,t} \le K, &(11)\\
&0 \le \alpha_{n,t} \le 1, \quad \forall n\in\mathcal{N}, &(12)\\
&(1-\alpha_{n,t})\lambda_{n,t} < k_{n,t}\mu_u, \quad \forall n\in\mathcal{N}, &(13)\\
&(1-\alpha_{n,t})\lambda_{n,t} < r_n, \quad \forall n\in\mathcal{N}, &(14)\\
&\alpha_{n,t}\lambda_{n,t} < \mu_e, \quad \forall n\in\mathcal{N}, &(15)\\
&k_{n,t} \le k_{\max}, \quad \forall n\in\mathcal{N}. &(16)
\end{aligned}$$

Constraints (13), (14), and (15) state that the task arrival rate must be smaller than the corresponding service rate to ensure that tasks can be completed and the system remains stable. Constraint (16) means that the number of on-board servers on a UAV does not exceed its maximum capacity $k_{\max}$.
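Combining (7) and (8), the value $Q$ of one slot can be evaluated as in the sketch below, which reuses `slot_delay` from Section 3.1; the argument names follow the reconstructed notation and are illustrative only:

```python
def q_value(lams, ks, alphas, r, mu_u, mu_e, c_e, c_u, c_f, c_s, w):
    # Weighted delay-plus-cost objective of (P1), Eq. (8).
    # lams, ks, alphas, r: per-ES arrival rates, server counts,
    # local-computing probabilities, and transmission rates; w: (w1..w5).
    delay = sum(slot_delay(l, a, r[n], k, mu_u, mu_e)
                for n, (l, k, a) in enumerate(zip(lams, ks, alphas)))
    es_cost = sum(a * l * c_e for l, a in zip(lams, alphas))
    uav_cost = sum((1.0 - a) * l * c_u for l, a in zip(lams, alphas))
    trip_cost = sum(c_f for k in ks if k > 0)   # indicator term, Eq. (7)
    server_cost = sum(k * c_s for k in ks)
    return (w[0] * delay + w[1] * es_cost + w[2] * uav_cost
            + w[3] * trip_cost + w[4] * server_cost)
```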

Based on the proposed model, Theorem 1 shows that (P1) is an NP-hard problem.

Theorem 1. The proposed (P1) is NP-hard.

Proof. We prove the NP-hardness of (P1) by reduction from a classic NP-hard problem, the Capacitated Facility Location Problem (CFLP), which is defined as follows. Given a facility set, where each facility has a capacity and a cost of opening, and given a user set, where the users have different demands and the cost of using different facilities also differs, CFLP finds a solution of opening facilities and serving every user such that the total cost is minimized, subject to the facility capacities, where each user can select only one facility so that its demand is fulfilled.
We show that an instance of CFLP can be reduced to an instance of minimizing (P1). Since $\mathbf{k}_t$ is independent of $\boldsymbol{\alpha}_t$, in slot $t$, when $\boldsymbol{\alpha}_t$ is given, each ES and its corresponding UAV can be seen as a whole and correspond to the facility set. The task computing cost of the ESs corresponds to the cost of opening a facility, which is a constant value. The facility capacity constraint is converted to (13) and (14). Users correspond to UAVs and on-board servers; the cost of using a facility is the cost of using and computing on UAVs. One UAV and one on-board server can only assist one ES. A solution to this case is a solution to CFLP. Similarly, in slot $t$, when $\mathbf{k}_t$ is given, each UAV and its corresponding on-board servers can be seen as a whole, correspond to the facility set, and serve only one user. The usage cost of UAVs and on-board servers corresponds to the cost of opening a facility, which is a constant value. The capacity constraint is converted to (13) and (14). Users correspond to the ESs; the cost of using a facility is the computing cost of ESs and UAVs. A solution to the case of given $\mathbf{k}_t$ is a solution to CFLP. (P1) is the combination of the above two CFLPs, so a solution to (P1) is a solution to CFLP. Thus, the theorem holds.

4. Problem Solution

Due to (P1)'s huge search space and high complexity, this paper proposes a deep reinforcement learning-based UAV assignment (DRUA) algorithm to optimize the strategies of UAV scheduling and task offloading. Compared with supervised DNN learning methods, which are widely used in dynamic wireless network scenarios, deep reinforcement learning does not need manually labeled training samples and has better robustness. After training, DRUA can immediately generate the optimal UAV scheduling and task offloading strategies once the task arrival rate is updated.

4.1. Algorithm Overview

The framework of the algorithm is presented in Figure 4. There are two DNNs. The first DNN takes as input every ES's task arrival rate in time slot $t$, i.e., $\boldsymbol{\lambda}_t$, and generates a relaxed UAV scheduling action $\hat{\mathbf{k}}_t$ (i.e., the scheduling strategy) based on the current policy (determined by network parameter $\theta^1_t$). Then, the first DNN's output is fed into the second DNN as input, which generates a relaxed offloading action $\hat{\boldsymbol{\alpha}}_t$ (i.e., the offloading strategy; every probability is relaxed continuously between 0 and 1) based on the current policy (determined by network parameter $\theta^2_t$). Through the following Algorithm 1, the elements of $\boldsymbol{\lambda}_t$ traverse the candidate values derived from $\hat{\mathbf{k}}_t$ and $\hat{\boldsymbol{\alpha}}_t$ until the best match, i.e., the pair with the smallest value $Q$, is found, yielding the optimal UAV scheduling strategy $\mathbf{k}^*_t$ and offloading strategy $\boldsymbol{\alpha}^*_t$. Both DNNs take the value $Q$ as a reward and store the current optimal state-action pairs $(\boldsymbol{\lambda}_t, \mathbf{k}^*_t)$ and $(\hat{\mathbf{k}}_t, \boldsymbol{\alpha}^*_t)$ into the corresponding replay memories.

In slot $t + \delta$, to update the policies, the two DNNs extract a batch of samples from their replay memories for training. DRUA updates the training parameters $\theta^1_t$ and $\theta^2_t$ into $\theta^1_{t+\delta}$ and $\theta^2_{t+\delta}$. Then, the correspondingly updated policies generate new actions $\hat{\mathbf{k}}_{t+\delta}$ and $\hat{\boldsymbol{\alpha}}_{t+\delta}$. Once the task arrival rate changes, the above iteration repeats and improves the two policies. The two stages are introduced in detail next.

4.2. Action Generation

In action generation, the network parameters $\theta^1$ and $\theta^2$ are initialized following a zero-mean normal distribution at the beginning. The first DNN outputs the relaxed UAV scheduling action, which can be represented by the mapping function $\hat{\mathbf{k}}_t = f_{\theta^1_t}(\boldsymbol{\lambda}_t)$, where $\hat{\mathbf{k}}_t$ needs to satisfy constraint (11).

Then, DRUA feeds the relaxed $\hat{\mathbf{k}}_t$ into the second DNN as input to generate the relaxed offloading action, which can be represented by the mapping function $\hat{\boldsymbol{\alpha}}_t = f_{\theta^2_t}(\hat{\mathbf{k}}_t)$, where $\hat{\boldsymbol{\alpha}}_t$ needs to satisfy constraints (12)–(15).

The universal approximation theorem states that as long as a hidden layer is equipped with enough neurons and the neurons have a suitable activation function, the network has the capacity to approximate arbitrary continuous functions [22]. Common activation functions are ReLU, sigmoid, tanh, softmax, etc. In DRUA, the hidden layers of both DNNs use the ReLU function as the activation function. The difference between them is that the first DNN uses the softmax function in the output layer, so its outputs sum to 1 and can be regarded as the proportions of the total on-board servers, satisfying constraint (11). The second DNN uses the sigmoid function in the output layer, so each output $\hat{\alpha}_{n,t}$ is relaxed within (0, 1); but when $k_{n,t} = 0$, $\alpha_{n,t}$ should be 1. After obtaining $\hat{\mathbf{k}}_t$ and $\hat{\boldsymbol{\alpha}}_t$, these relaxed actions may not yet be the best match for the optimal $Q$. DRUA uses the combinations of the relaxed variables to find the optimal actions, i.e., traverses them to find the match with the minimum value $Q$, as shown in Algorithm 1.

Input: Parameters $\hat{\mathbf{k}}_t$, $\hat{\boldsymbol{\alpha}}_t$, and $\boldsymbol{\lambda}_t$
Output: $\mathbf{k}^*_t$, $\boldsymbol{\alpha}^*_t$
1: Rearrange the elements of $\hat{\mathbf{k}}_t$ and $\hat{\boldsymbol{\alpha}}_t$ in ascending order to get $\mathbf{k}'_t$ and $\boldsymbol{\alpha}'_t$;
2: Initialize $Q_{\min}$ as a very large number and $\mathbf{k}^*_t$ and $\boldsymbol{\alpha}^*_t$ as empty vectors;
3: for each $n \in \mathcal{N}$ do
4:  for each $k \in \mathbf{k}'_t$ do
5:   for each $\alpha \in \boldsymbol{\alpha}'_t$ do
6:    Compute $Q(\lambda_{n,t}, k, \alpha)$ from (8);
7:    if $Q(\lambda_{n,t}, k, \alpha) < Q_{\min}$ then
8:     $Q_{\min} = Q(\lambda_{n,t}, k, \alpha)$;
9:     $\mathbf{k}^*_t[n] = k$;
10:    $\boldsymbol{\alpha}^*_t[n] = \alpha$;
11:   end if
12:   end for
13:  end for
14: end for
15: return $\mathbf{k}^*_t$, $\boldsymbol{\alpha}^*_t$.
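The following Python sketch mirrors Algorithm 1 under one plausible per-ES reading of the listing: the relaxed DNN outputs are quantized and sorted into candidate lists, and for each ES the $(k, \alpha)$ pair minimizing its contribution to (8) is kept. The helper `per_es_q` (the single-ES term of `q_value` above) and the infeasibility handling are assumptions of this sketch:

```python
def per_es_q(lam, k, alpha, r_n, mu_u, mu_e, c_e, c_u, c_f, c_s, w):
    # Single-ES contribution to Q in Eq. (8).
    delay = slot_delay(lam, alpha, r_n, k, mu_u, mu_e)
    return (w[0] * delay + w[1] * alpha * lam * c_e
            + w[2] * (1.0 - alpha) * lam * c_u
            + w[3] * (c_f if k > 0 else 0.0) + w[4] * k * c_s)

def algorithm1(k_hat, alpha_hat, lams, r, mu_u, mu_e, c_e, c_u, c_f, c_s, w):
    k_cand = sorted(round(k) for k in k_hat)   # line 1: ascending candidates
    a_cand = sorted(alpha_hat)
    k_star, a_star = [], []
    for n, lam in enumerate(lams):             # lines 3-14: exhaustive traversal
        q_min, best_k, best_a = float("inf"), 0, 1.0
        for k in k_cand:
            for a in a_cand:
                try:
                    q = per_es_q(lam, k, a, r[n], mu_u, mu_e,
                                 c_e, c_u, c_f, c_s, w)
                except AssertionError:         # (13)-(15) infeasible, skip
                    continue
                if q < q_min:                  # lines 7-10: keep the best pair
                    q_min, best_k, best_a = q, k, a
        k_star.append(best_k)
        a_star.append(best_a)
    return k_star, a_star
```

Here `k_hat` is assumed to be already scaled to server counts (i.e., the softmax proportions multiplied by $K$) before being passed in.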
4.3. Policy Update

The optimal UAV scheduling strategy $\mathbf{k}^*_t$ and offloading strategy $\boldsymbol{\alpha}^*_t$ obtained from Algorithm 1 are used to train the two DNNs. First, DRUA initializes the limited replay memories and stores every slot's optimal state-action pair into the memories as training samples. When a memory is full, the newest optimal pairs replace the oldest. DRUA stores $(\boldsymbol{\lambda}_t, \mathbf{k}^*_t)$ into the first DNN's replay memory and $(\hat{\mathbf{k}}_t, \boldsymbol{\alpha}^*_t)$ into the second DNN's replay memory.

DRUA uses the memory replay method [23, 24] to train the two DNNs with the stored samples. Compared with using all samples, memory replay reduces the complexity of training and reduces the variance of the training parameters during updates. Randomly extracting samples after certain time intervals also reduces the correlation among similar samples and speeds up convergence. The batches of samples extracted from the two memories are $\{(\boldsymbol{\lambda}_\tau, \mathbf{k}^*_\tau) \mid \tau \in \mathcal{S}_t\}$ and $\{(\hat{\mathbf{k}}_\tau, \boldsymbol{\alpha}^*_\tau) \mid \tau \in \mathcal{S}_t\}$, where $\mathcal{S}_t$ is a group of time indexes. The training parameters $\theta^1_t$ and $\theta^2_t$ are updated by the Adam algorithm to reduce the average cross-entropy loss, which for a stored state-action pair $(\mathbf{s}_\tau, \mathbf{a}_\tau)$ and DNN mapping $f_{\theta_t}$ is

$$L(\theta_t) = -\frac{1}{|\mathcal{S}_t|}\sum_{\tau\in\mathcal{S}_t}\Big[\mathbf{a}_\tau^{\top}\log f_{\theta_t}(\mathbf{s}_\tau) + (1-\mathbf{a}_\tau)^{\top}\log\big(1-f_{\theta_t}(\mathbf{s}_\tau)\big)\Big],$$

where $|\mathcal{S}_t|$ is the size of $\mathcal{S}_t$, the superscript $\top$ is the transpose operator, and $\log$ denotes the element-wise logarithm of a vector. This paper omits the details of Adam [25]. Once the memory collects enough new samples, the DNNs begin training every $\delta$ slots, where $\delta$ is the training interval.
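A minimal PyTorch-style training step for this policy-update stage might look as follows. This is a sketch, not the authors' code: the replay memory is assumed to hold (state, optimal-action) tensor pairs with entries in [0, 1] (the first DNN's targets stored as proportions of $K$), so element-wise binary cross-entropy stands in for the average cross-entropy loss above:

```python
import random
import torch
import torch.nn.functional as F

def train_step(dnn, optimizer, memory, batch_size=128):
    # Randomly sample a batch of stored (state, optimal-action) pairs.
    batch = random.sample(memory, min(batch_size, len(memory)))
    states = torch.stack([s for s, _ in batch])
    actions = torch.stack([a for _, a in batch])
    preds = dnn(states)                           # relaxed actions in (0, 1)
    # Average cross-entropy between the DNN output and the stored actions.
    loss = F.binary_cross_entropy(preds, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```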

Both DNNs learn from the historical optimal state-action pairs $(\boldsymbol{\lambda}_\tau, \mathbf{k}^*_\tau)$ and $(\hat{\mathbf{k}}_\tau, \boldsymbol{\alpha}^*_\tau)$ and generate better actions. The finite but timely updated memory offers the current optimal policy for DNN learning. This closed-loop reinforcement learning mechanism can optimize action generation. Algorithm 2 shows how the two DNNs generate actions and update policies.

Input: $\boldsymbol{\lambda}_t$
Output: $\mathbf{k}^*_t$, $\boldsymbol{\alpha}^*_t$
1: Initialize the two DNNs with random parameters $\theta^1$ and $\theta^2$, respectively;
2: Set the number of iterations $M$ and the training interval $\delta$;
3: for $t = 1, 2, \ldots, M$ do
4:  Generate relaxed actions $\hat{\mathbf{k}}_t$ and $\hat{\boldsymbol{\alpha}}_t$;
5:  Get $\mathbf{k}^*_t$ and $\boldsymbol{\alpha}^*_t$ from Algorithm 1;
6:  Update the memories by adding $(\boldsymbol{\lambda}_t, \mathbf{k}^*_t)$ and $(\hat{\mathbf{k}}_t, \boldsymbol{\alpha}^*_t)$;
7:  if $t \bmod \delta = 0$ then
8:   Extract a batch of samples from each memory;
9:   Use these samples to train the two DNNs, respectively, and update $\theta^1_t$ and $\theta^2_t$ by Adam;
10:  end if
11: end for
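Expressed in Python with the helper sketches above (`algorithm1`, `train_step`), the main loop of Algorithm 2 could be organized as follows; `env_arrival_rates`, `sys_params`, and the scaling constant `K_total` are hypothetical stand-ins for the environment interface and the Table 2 parameters:

```python
import torch

def drua(dnn1, dnn2, opt1, opt2, env_arrival_rates, sys_params,
         M=3000, delta=10, mem_cap=1024, K_total=20):
    # Main DRUA loop: generate relaxed actions, refine them with Algorithm 1,
    # store the optimal pairs, and periodically retrain both DNNs.
    mem1, mem2 = [], []
    for t in range(1, M + 1):
        lam_t = env_arrival_rates(t)          # per-ES arrival rates (tensor)
        with torch.no_grad():
            k_hat = dnn1(lam_t)               # relaxed scheduling proportions
            a_hat = dnn2(k_hat)               # relaxed offloading probabilities
        k_star, a_star = algorithm1((k_hat * K_total).tolist(), a_hat.tolist(),
                                    lam_t.tolist(), **sys_params)
        k_prop = torch.tensor(k_star, dtype=torch.float)
        k_prop = k_prop / k_prop.sum().clamp(min=1.0)  # proportions, match softmax
        mem1.append((lam_t, k_prop))
        mem2.append((k_hat, torch.tensor(a_star, dtype=torch.float)))
        if len(mem1) > mem_cap:               # bounded memories: drop the oldest
            mem1.pop(0)
            mem2.pop(0)
        if t % delta == 0 and len(mem1) >= 128:  # policy update every delta slots
            train_step(dnn1, opt1, mem1)
            train_step(dnn2, opt2, mem2)
```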

5. Simulations

In this section, simulation experiments evaluate the performance of the DRUA algorithm against two comparison algorithms.

5.1. Simulation Setup

There are 10 ESs in the scenario. The total number of on-board servers is 20. When computing the objective function $Q$, the task delay is very small compared with the cost; thus, the weighting factor of the task delay is set greater than the weighting factors of the cost terms: $\omega_1$, $\omega_2$, $\omega_3$, $\omega_4$, and $\omega_5$ are 0.8, 0.05, 0.05, 0.05, and 0.05, respectively. The detailed parameter settings of the scenario are listed in Table 2; the parameters refer to other papers, e.g., the computing rate of an ES refers to [26]. The computing rate of an on-board server is considered to be smaller than that of an ES. The transmission power of the ES, the bandwidth, and the noise between UAV and ES refer to [15, 16]. The cost of computing by an ES or a UAV refers to [27].

In the proposed DRUA algorithm, both DNNs have one input layer, two hidden layers, and one output layer. The first and second hidden layers have 120 and 80 neurons, respectively. These two DNNs can be replaced by other network structures with different numbers of hidden layers and neurons, such as a CNN or RNN, to fit different problems. The DRUA algorithm runs in Python 3.7 with PyTorch, and the training batch size of both DNNs is 128. The learning rate of the Adam optimizer is 0.01 for the first DNN and 0.1 for the second. The replay memory size and training interval of both DNNs are 1024 and 10, respectively.
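For concreteness, the two networks described here (120- and 80-neuron ReLU hidden layers; softmax vs. sigmoid output) can be sketched in PyTorch as follows; the dimension `n_es = 10` follows the scenario setup, and the optimizer settings mirror the hyperparameters above:

```python
import torch
import torch.nn as nn

n_es = 10  # number of ESs (and UAVs); one output per ES

# First DNN: arrival rates -> relaxed server-allocation proportions.
# The softmax output sums to 1, matching constraint (11) after scaling by K.
dnn1 = nn.Sequential(
    nn.Linear(n_es, 120), nn.ReLU(),
    nn.Linear(120, 80), nn.ReLU(),
    nn.Linear(80, n_es), nn.Softmax(dim=-1),
)

# Second DNN: relaxed schedule -> relaxed offloading probabilities in (0, 1).
dnn2 = nn.Sequential(
    nn.Linear(n_es, 120), nn.ReLU(),
    nn.Linear(120, 80), nn.ReLU(),
    nn.Linear(80, n_es), nn.Sigmoid(),
)

opt1 = torch.optim.Adam(dnn1.parameters(), lr=0.01)
opt2 = torch.optim.Adam(dnn2.parameters(), lr=0.1)
```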

To evaluate the performance of DRUA, two comparison mechanisms are used. One is all-server computing (ASC), in which all tasks arriving at the ESs are computed by the ESs. The other is average UAV computing (AUC), in which every ES has a UAV, every UAV has the same number of on-board servers, and the offloading probability is 0.5 everywhere.

5.2. Simulation Results

To verify the effect of the proposed algorithm, this paper compares the algorithms under the urban transportation system's requirements on delay and computing cost. The arrival rate of every ES is random in [150, 270). Figure 5 shows the three algorithms' performance on task delay. The blue line indicates ASC, and the red and yellow lines indicate AUC and DRUA, respectively. The horizontal and vertical coordinates are the 1000 time slots after training and the largest per-ES delay $T_{n,t}$, respectively. It can be observed that the proposed DRUA has the lowest delay, because it can offload part of the tasks to UAVs to accelerate computing when the task load is relatively heavy. AUC also offloads part of the tasks to reduce the task delay. Figure 6 shows the algorithms' performance on the value $Q$ in the same slots. All three lines are stable. ASC (the blue line) is not only the highest in delay but also incurs expensive computing cost, resulting in the highest value $Q$. DRUA (the lowest, yellow line) shows that making decisions according to the actual task arrival rate achieves better performance in both delay and computing cost. Compared with AUC, DRUA has more appropriate and efficient task offloading and computing resource allocation strategies.

To verify the appropriateness of the chosen number of on-board servers, this paper compares performance under different numbers of on-board servers. The task arrival rate is set at random in [150, 270). Using the value $Q$ as the metric, the number of on-board servers increases from 0 to 50. As shown in Figure 7, for ASC without UAV assistance, the value $Q$ is constant (the blue line). The number of on-board servers affects AUC and DRUA (the red and yellow lines). When the number of on-board servers is 0, AUC cannot be used, and its value $Q$ does not exist. As the number of on-board servers increases, both AUC and DRUA reduce their values $Q$. However, when the number exceeds 20, the increased cost of on-board servers raises the value $Q$ of AUC, even above that of ASC, while DRUA becomes stable: DRUA still allocates about 20 on-board servers even if more are available. Twenty on-board servers are good for reducing both cost and latency, so setting the number of on-board servers to 20 is suitable.

To test the capacity of these algorithms with respect to the task arrival rate, this paper uses the value $Q$ as the metric. The task arrival rate varies from 100 to 500. The results are shown in Figure 8. The red line indicates AUC, and the blue and yellow lines indicate ASC and DRUA, respectively. When the task arrival rate is around [100, 200], DRUA chooses all-local computing, and its value $Q$ is the same as ASC's. The value $Q$ of ASC increases steeply when the task arrival rate exceeds 270, because constraint (15) is no longer satisfied and the stability of the system is broken. Similarly for AUC, when the arrival rate violates constraint (13), computing tasks also fail. But DRUA still works effectively by optimizing resource allocation and offloading until constraint (15) or (13) can no longer be satisfied.

Finally, consider the effect of DRUA's network parameters on convergence. The task arrival rate is random in [150, 270). Two evaluations compare the effects of the training interval and the replay memory size, measured by the moving average of the value $Q$. In Figure 9, apart from the memory size, the other network parameters are set as described above. It can be observed that convergence is faster with a bigger memory size, as with the green line (memory size 2048) and the purple line (memory size 1024). A bigger memory stores more samples, and the correlation among randomly extracted samples is smaller, which is conducive to the convergence of the network. However, if the size is too small, there are large fluctuations before convergence. Keeping the batch size at 128 (the blue line) also gives a similar convergence rate with small fluctuation. In the evaluation of Figure 10, the other network parameters are set as before; it shows that a smaller training interval speeds up the convergence of DRUA, since the policies are updated more frequently, as with the blue line (interval 5) and the red line (interval 10). But a too-small training interval increases the frequency of training and causes excessive training. Setting the training interval to 10 is suitable for speeding up the convergence of DRUA.

6. Conclusion

This paper proposes a model that uses UAVs to assist ESs in MEC to decrease the task computing cost and the task delay in IoV, and it uses queueing theory to calculate the delay. The delay of computing at UAVs is calculated from the average queue length of tasks obtained by combining two M/M/1 models in series. Then, using the obtained delay combined with the computing cost, a mathematical model that minimizes the joint delay and cost is established, with the deployment of the on-board servers and the offloading probability as the optimization variables. The proposed DRUA, based on deep reinforcement learning, consists of two DNNs that minimize the weighted function of task computing delay and computing cost. DRUA learns from historical experience and optimizes the policies. The evaluation results show that DRUA is effective in solving the proposed problem and that the chosen network parameters can speed up convergence.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 61672465) and Natural Science Foundation of Zhejiang Province (Grant No. LZ22F020004).