Abstract
Benefiting from the progress of microelectromechanical system (MEMS) technology, wireless sensor networks (WSNs) can run a large number of complex applications. One of the most critical challenges for complex WSN applications is the huge computing demand coupled with limited, nonreplenishable battery energy. The recent development of UAV-assisted cooperative computing technology provides a promising solution to overcome these shortcomings. This paper addresses a three-tier WSN model for UAV-assisted cooperative computing, which includes several sensor nodes, a moving UAV equipped with computing resources, and a sink node (SN). Computation tasks arrive randomly at each sensor node, and the UAV moves around above the sensor nodes and provides computing services. The sensor nodes can process the computation tasks locally or cooperate with the UAV or SN for computing. Over a life cycle of the UAV, we aim to maximize the energy efficiency of cooperative computing by optimizing the UAV path planning under the constraints of node energy consumption and task deadlines. To adapt to the time-varying indeterminate environment, a deep Q network- (DQN-) based path planning algorithm is proposed. Simulation studies show that the proposed algorithm outperforms the competing algorithms, significantly improves the energy efficiency of cooperative computing, and achieves energy consumption balance.
1. Introduction
Recently, with technical advances in microelectromechanical systems (MEMS), sensors have gained greater computing capacity [1]. A large number of complex applications, like image identification and intelligent control, can be applied in wireless sensor networks (WSNs) [2]. However, most of these applications are full of computation-intensive tasks [3]. It is a challenge for a single sensor node to process these computation tasks independently due to its limited computing capacity and battery energy. Fortunately, the emerging paradigms of edge computing [4–6] and cooperative computing [7–9] have opened up new ways to address this challenge. By utilizing edge computing technology, sensors can offload some complex computation-intensive tasks onto edge servers with powerful computing capacity [10]. Different from edge computing, in which computation tasks are offloaded to edge servers, cooperative computing pays more attention to mining the computing resources of edge devices to complete computation tasks. Through cooperative computing between sensor nodes, some lightweight computation tasks can be handled locally [11]. However, offloading computation tasks to servers or peer nodes brings new challenges. On one hand, since edge servers are usually fixed and deployed far away from WSNs, the nodes will consume a lot of energy if they offload tasks directly to edge servers. On the other hand, limited by computing capacity and battery energy, cooperative computing between nodes can hardly meet the demands of computation-intensive tasks.
With the continuous breakthrough and improvement of unmanned aerial vehicle (UAV) technology, UAV assistance has been studied in several scenarios, such as data acquisition [12–14], wireless charging [15], relay transmissions [16], channel estimation [17], and node localization [18]. UAV-assisted computing can alleviate the challenges posed by insufficient computing capacity [19]. Compared with traditional architectures with fixed servers, a UAV equipped with an edge server can provide computation offloading services for devices more efficiently by virtue of its fast deployment, strong scalability, and flexibility [20]. In [21], an end-edge structure of UAV and ground terminals for cooperative computing was proposed. In [22], a UAV was deployed to provide edge computing services for terminal devices to minimize latency and energy consumption by optimizing the task split ratio, UAV deployment, and resource allocation. Furthermore, path planning is an important area of UAV-assisted cooperative computing research. In [23], the UAV trajectory and bit allocation were optimized under the constraints of delay and UAV energy. Qian et al. [24] proposed a path planning algorithm based on convex optimization, maximizing the sum of bits offloaded from all terminals to the UAV subject to the energy constraint of the UAV. In [25], a penalty dual decomposition-based algorithm was proposed to minimize the sum of the maximum delay among all the users in each time slot by jointly optimizing the UAV trajectory, the ratio of offloading tasks, and the user scheduling variables. The above studies mainly focus on the constraints of UAV energy or task delay. However, in WSNs, sensor nodes are equipped with energy-constrained batteries that are difficult to replenish, and the death of nodes may result in coverage holes in the WSN [26].
Therefore, node energy consumption and energy consumption balance should be considered a priority when using UAV-assisted cooperative computing in WSNs.
By reviewing the relevant literature, only a few studies have focused on UAV-assisted cooperative computing in WSNs. In [27], an air-ground integrated cooperative computing framework for vehicle vision sensor networks was proposed, in which computation tasks are implemented locally and also partially offloaded to UAVs. The UAV trajectory and offloading rate were optimized by an iterative algorithm to minimize the total delay of the network. In [28], the authors designed a three-tier cooperative computing architecture in which computation tasks can be handled in parallel at the local device, UAV, and access point. The UAV trajectory, task allocation ratio, and power allocation were optimized by classical convex optimization techniques to minimize the weighted sum of the total energy consumption of devices and UAV. However, in the above literature, the computation tasks were continuously generated, and the two important parameters characterizing the tasks, the number of bits and the deadline, remained constant. In practical applications, the generation of computation tasks in WSNs is usually event driven, which means that each node receives tasks randomly [29]. Furthermore, the received tasks have different numbers of bits and deadlines according to the emergency degree of the events in practice. In this case, a WSN can be described as a time-varying indeterminate system. It is difficult to solve optimization problems in time-varying systems with traditional convex optimization algorithms, because the time-varying parameters make the optimization problems nonconvex and nonlinear, which are generally difficult to solve. Even though the problem can be solved by utilizing convex optimization techniques like convex relaxation and continuous convex approximation, the resulting algorithm is difficult to apply in engineering practice due to the rapid increase in computational complexity [30].
In response to the above challenges, in this paper, we introduce a UAV cooperative computing mechanism into the classical WSN architecture. The classical WSN architecture is typically hierarchical, including sensor nodes and a sink node (SN) [31]. The sensor nodes are mainly responsible for data collection and have limited energy and weak computing capacity. The SN, which is responsible for data fusion, is usually connected to the base station and has strong computing capacity and sufficient energy. The tasks that cannot be handled by the sensor nodes are sent directly to the SN for processing. Since the distance between a sensor node and the SN is usually large and there may be obstructions, it consumes a lot of energy to offload computation tasks directly to the SN. Different from the traditional architecture that relies only on the SN for cooperative computing, the UAV-assisted cooperative computing introduced in this paper can provide close-range computing services to the sensor nodes, which helps them save energy and prolongs the lifetime of the WSN. In addition, in order to get closer to practical applications, time-varying environments are simulated in our model. First, to simulate the event-driven task generation mechanism, each sensor node receives tasks with a certain probability rather than continuously. Second, the number of bits and the deadline of the tasks also vary randomly, which means that the tasks differ in both amount and urgency. Third, the UAV can take off from any location within the monitored area, which means that the initial state of the network differs across runs. The objective of this paper is to maximize the energy efficiency of cooperative computing by optimizing the UAV path planning under the constraints of node energy consumption and task deadlines. It is very challenging to solve this problem with classical optimization algorithms because the environmental parameters are constantly changing.
In order to address this dynamic optimization problem, we propose a path planning algorithm based on a deep Q network (DQN). The DQN algorithm, proposed by Google DeepMind [32], is a reinforcement learning algorithm based on a deep neural network. It has been proven capable of handling complex dynamic optimization problems [33, 34]. In DQN, a Q-function based on a deep neural network replaces the Q-table of classical Q-learning, which makes it capable of dealing with high-dimensional continuous states. Techniques such as experience replay and fixed Q-targets improve the training stability of the DQN algorithm [35, 36]. Another advantage of the DQN algorithm is that, compared with traditional optimization algorithms, the trained DQN model can be deployed offline and has a relatively fast inference speed, especially for high-dimensional variables [30]. Although DQN has had notable success in game-playing scenarios, its application to UAV-assisted cooperative computing in WSNs is still an open area of research.
The contribution of this paper mainly includes the following aspects:
(i) A three-tier WSN model for UAV-assisted cooperative computing is constructed, in which computation tasks can be processed by the nodes, the UAV, and the SN. In order to cope with randomly generated tasks, the UAV needs to dynamically adjust its trajectory to provide computing services for the nodes.
(ii) A DQN-based path planning algorithm is proposed, which helps the UAV choose a better flight path according to the time-varying environment, improving energy efficiency and achieving energy consumption balance.
(iii) Simulation studies show that, by optimizing the UAV trajectory, the sensor nodes offload more of the computation tasks that they cannot handle themselves to the UAV rather than to the SN. The proposed algorithm outperforms the competing algorithms, significantly improves the energy efficiency of the WSN, and achieves energy consumption balance.
The rest of this paper is organized as follows: Section 2 introduces the network model and problem formulation. Section 3 introduces the DQN-based UAV path planning algorithm. Section 4 describes the simulation method and results. Section 5 gives the conclusion of our work.
2. Network Model and Problem Formulation
2.1. The UAV-Assisted WSN Model
As shown in Figure 1, we consider sensor nodes, indexed by the set , distributed randomly in the monitored area; their horizontal position coordinates are denoted by . All nodes have the same initial energy. After deployment, the energy of the nodes cannot be replenished and their positions cannot be moved. A sink node (SN) is located at the center of the area. The SN is connected to the computing center through a wired network, so it has a sufficient energy supply and strong computing capacity. All sensor nodes can communicate with the SN and offload computation tasks to it. A UAV with limited energy moves around above the sensor nodes and provides computing services for them; it flies at a constant velocity and a fixed altitude . The three-dimensional coordinates of the UAV are denoted as . The operation period of the UAV is discretized into several nonuniform time slots, indexed by . Within each time slot , the action of the UAV is to fly to a node and hover over it to perform cooperative computing. The action set of the UAV represents the flight strategy of the UAV.

In each unit of time, the sensor node randomly generates a computation task with probability , where denotes the number of bits of the task and is the task deadline for local computing. Each task is first stored in the node’s cache after arriving, and we consider a task model for partial offloading [5] in which each task can be divided into any number of bits. Computation tasks can be partially or completely performed locally using the CPUs of the nodes. Due to the limited computing capacity of the nodes, computation-intensive tasks cannot be completed locally, so these tasks need to be processed by cooperative computing with the UAV or SN. (Different from wireless mobile networks, WSNs usually contain only a few instant-communication tasks with extremely strict delay constraints [37]. Therefore, the nodes can store the computation-intensive tasks which cannot be handled locally in the cache and wait for the UAV to come; the delay-sensitive tasks can be offloaded to the SN.) If the UAV can reach the node within the of the tasks to provide computing services nearby, the node offloads the computation tasks in its current cache to the UAV; otherwise, the node can only offload the tasks which meet the to the SN. Since the transmission distance between the sensor nodes and the SN is relatively long, the nodes consume more energy when offloading tasks directly to the SN than when offloading them to the UAV. In an operation period , the computation tasks generated by node are represented by the set , the task set of local computing is denoted by , the task set offloaded to the UAV is denoted by , and the task set offloaded to the SN is denoted by , expressed by the following formula: where is the deadline of the th task and denotes a timer of the th task, which records the elapsed time since the task’s generation. The timer is cleared when the task is completed locally or transmitted to the UAV or SN.
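Since the inline symbols above were lost in extraction, the event-driven task model can be illustrated with a small sketch; the parameter names (`prob`, `bits`, `deadline`, `timer`) and the default ranges are illustrative assumptions, not the paper's notation:

```python
import random

def generate_tasks(num_nodes, prob, bit_range=(2e6, 10e6), deadline_range=(10.0, 40.0)):
    """Event-driven task generation: in each unit of time, every node
    receives a task with probability `prob`; the number of bits and the
    deadline are drawn uniformly at random (ranges are assumptions)."""
    tasks = {}
    for n in range(num_nodes):
        if random.random() < prob:
            tasks[n] = {
                "bits": random.uniform(*bit_range),          # number of bits of the task
                "deadline": random.uniform(*deadline_range),  # local-computing deadline
                "timer": 0.0,  # records elapsed time since generation
            }
    return tasks
```

Each call models one unit of time; the per-task `timer` would be incremented each step and cleared once the task is completed locally or transmitted, as described above.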
2.2. Sensor Node Model
2.2.1. Sensor Node Delay Model
(1) Node Transmission Delay. Since the nodes adopt orthogonal frequency-division multiplexing channels, there is no interference between nodes [38]. We assume that a node communicates with the UAV and the SN in the same channel state. Therefore, the uplink transmission rate of node to the UAV can be expressed as and the uplink transmission rate of node to the SN can be expressed as
where is the communication channel bandwidth, is the transmission power of the nodes, and are the channel gains at the reference distance of 1 m, and is the noise power spectral density. is the flight altitude of the UAV, and is the horizontal distance between node n and the SN. Therefore, the transmission delay of -bit data to the UAV can be expressed as The transmission delay of -bit data to the SN can be expressed as
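The rate expressions themselves are stripped from this copy; the following sketch shows the standard Shannon-capacity form such models typically take, assuming free-space path loss relative to the 1 m reference gain (all parameter names here are assumptions):

```python
import math

def uplink_rate(bandwidth, p_tx, g0, distance, noise_psd):
    """Shannon-style uplink rate in bit/s. `g0` is the channel gain at
    the 1 m reference distance; free-space (distance-squared) path loss
    is assumed, as is common in UAV offloading models."""
    snr = p_tx * g0 / (distance ** 2 * noise_psd * bandwidth)
    return bandwidth * math.log2(1 + snr)

def transmission_delay(bits, rate):
    """Delay of uploading `bits` at `rate` bit/s."""
    return bits / rate
```

With the UAV hovering overhead, `distance` would be the flight altitude; for the SN it would combine the horizontal distance and antenna geometry, so the SN rate is lower at long range.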
(2) Node Computing Delay. Let denote the CPU frequency of node . We assume that all nodes have the same CPU, and represents the number of CPU cycles required to compute 1-bit data. The node computing delay is then given as follows.
2.2.2. Sensor Node Energy Model
(1) Node Computing Energy Model. The CPU architecture of the node adopts advanced dynamic voltage and frequency scaling (DVFS) technology. Thus, the energy consumption of computing a -bit task can be given as follows [39], where is a constant determined by the CPU architecture, is the CPU frequency, and is the execution time of completing the -bit task, .
(2) Node Transmission Energy Model. We refer to the classical transmission energy consumption model in WSNs [40]. The energy consumption of transmitting l-bit data can be expressed as follows, where denotes the transmission distance, is the circuit energy consumption factor, which denotes the energy consumed by encoding and modulating 1-bit data, and is the amplifier energy consumption factor, related to transmission loss.
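The two node energy models can be sketched as follows. The DVFS form (energy proportional to the square of the CPU frequency per cycle) and the circuit-plus-amplifier radio model follow the conventions of the cited literature; all parameter names and the path-loss exponent default are assumptions, since the displayed equations are missing from this copy:

```python
def computing_energy(bits, cycles_per_bit, cpu_freq, kappa):
    """DVFS computing energy: kappa * f^2 joules per CPU cycle,
    with bits * cycles_per_bit cycles in total."""
    return kappa * (cpu_freq ** 2) * bits * cycles_per_bit

def transmission_energy(bits, distance, e_elec, e_amp, alpha=2):
    """Classical WSN radio model: per-bit circuit energy plus an
    amplifier term that grows as distance**alpha."""
    return bits * e_elec + bits * e_amp * (distance ** alpha)
```

The amplifier term is what makes direct offloading to the distant SN far costlier than offloading to a nearby hovering UAV, which is the premise of the path planning problem below.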
2.3. UAV Model
We mainly consider the UAV’s flying model, computing model, and hovering model. The delay and energy consumption of receiving data and transmitting results are ignored [35].
2.3.1. Flying Model
The UAV flies at a constant velocity and a fixed altitude ; let denote the UAV flying power. In the th time slot, the flying distance of the UAV can be expressed as The flying delay of the UAV can be expressed as The flying energy consumption of the UAV can be expressed as
2.3.2. Computing Model
Similar to the node computing delay (6) and computing energy model (7), the computing delay of a -bit task for the UAV can be given as follows. The energy consumption of a -bit computation task for the UAV can be given as follows, where is the CPU frequency of the UAV.
2.3.3. Hovering Model
If the UAV hovers over node , node offloads all computation tasks in its current cache to the UAV, and the UAV returns the results after completing the cooperative computing. Therefore, the UAV hovering delay is related to the node transmission delay (4) and the UAV computing delay (12). Let denote the number of bits of the computation tasks in the current cache; the hovering delay of the UAV is given as follows.
Let denote the hovering power; the UAV hovering energy is given as follows.
Therefore, the total energy consumption of the UAV for one operation period is given as follows.
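Putting the flying, hovering, and computing pieces together, the per-slot UAV energy can be sketched as below. Since the displayed equations are missing from this copy, the decomposition (flying energy, hovering energy during upload and computation, and DVFS computing energy) is an assumption consistent with the surrounding text, and all names are illustrative:

```python
def uav_slot_energy(fly_dist, velocity, p_fly, hover_bits, cycles_per_bit,
                    f_uav, kappa_uav, p_hover, uplink_rate):
    """Energy of one UAV time slot: fly to the chosen node, then hover
    while the node uploads its cached bits and the UAV computes them."""
    t_fly = fly_dist / velocity                      # flying delay
    t_upload = hover_bits / uplink_rate              # node transmission delay
    t_compute = hover_bits * cycles_per_bit / f_uav  # UAV computing delay
    e_fly = p_fly * t_fly
    e_hover = p_hover * (t_upload + t_compute)       # hovering spans upload + compute
    e_compute = kappa_uav * (f_uav ** 2) * hover_bits * cycles_per_bit
    return e_fly + e_hover + e_compute
```

Summing this quantity over all time slots of an operation period yields the UAV's total energy consumption, which bounds its life cycle.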
2.4. Problem Formulation
According to the previous analysis, when a node cannot complete its computation tasks locally in time, it can cooperate with the UAV or the SN. Moreover, there is a fact that cannot be ignored in WSNs: transmission energy consumption is much higher than computing energy consumption, and transmission energy grows rapidly with distance [40]. The energy consumed by a node to transmit data to the UAV or SN is much higher than the energy it consumes computing locally; therefore, cooperative computing accounts for the majority of node energy consumption. The problem discussed in this paper is to find a policy that maximizes the energy efficiency of cooperative computing within the life cycle of the UAV under the constraints of node energy and task deadlines. Within , the sum of the bits of cooperative computing for the nodes is defined as and the total energy consumption of cooperative computing for the nodes within can be expressed as
Then, we define the energy efficiency of cooperative computing for nodes within as
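As a minimal illustration of this definition, the energy efficiency is simply the cooperatively computed bits divided by the nodes' cooperative-computing energy over the period (the function name is an assumption):

```python
def energy_efficiency(coop_bits, coop_energy):
    """Energy efficiency of cooperative computing: total bits processed
    cooperatively divided by the nodes' cooperative-computing energy
    over the operation period (bit/J). Returns 0 for an idle period."""
    return coop_bits / coop_energy if coop_energy > 0 else 0.0
```

Maximizing this ratio pushes the policy to complete more bits cooperatively while steering nodes toward the cheap UAV link rather than the expensive SN link.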
Through the above analysis, the optimization problem can be formulated as follows: where is the action policy of the UAV, and and represent the initial energy of the UAV and of a node, respectively. represents the energy constraint of the UAV. indicates that computation tasks can be performed locally or offloaded to the UAV or SN. represents the energy constraints of the nodes. The problem is a nonconvex, nonlinear, mixed discrete optimization problem, which is difficult to handle with classical optimization algorithms [41]. In this paper, the problem is formulated as a Markov decision process (MDP), and a DQN-based path planning algorithm is proposed to obtain a suboptimal solution.
We can analyze the problem from two aspects. First, in each unit of time, the sensor node randomly generates computation tasks with probability , and are indeterminate, which means that both the amount and the deadlines of the tasks to be processed by the node are time varying. Second, due to the limited computing capacity of the node, a task usually cannot be completed locally within its deadline. In this case, it is more energy efficient to offload the unfinished tasks to the nearby UAV than to transmit them directly to the SN. Therefore, in order to improve the energy efficiency of cooperative computing, more unfinished tasks need to be offloaded to the UAV, which requires the UAV to adjust its action policy in real time according to the computing capacity of the nodes, the number of bits of the tasks, the task deadlines, and the flight distance.
3. DQN-Based UAV Path Planning
According to the previous analysis, the UAV faces a time-varying indeterminate environment, and different action decisions need to be made according to different node states. In this section, the optimization problem of UAV path planning is abstracted as an MDP, and then the model framework and training process of the proposed DQN algorithm are introduced in detail. In the proposed DQN algorithm, the UAV continually tries different actions and uses feedback from the environment to improve its decisions until the actions lead to better results, thereby maximizing the long-term reward.
3.1. Markov Decision Process
The UAV can be considered an agent, and its action decision problem can be modeled as an MDP. The MDP consists of three parts: state space , action space , and reward function . At time slot , the UAV performs an action, either flying or hovering, which is considered one step; each step consists of observing the state, performing an action, transitioning to the next state, and receiving a reward.
3.1.1. State Space
We define the state space as follows: where describes states of the UAV, and describes the states of the nodes.
For , it has two components : denotes the remaining energy of the UAV and denotes the position coordinates of the UAV.
For , it denotes the states of all nodes, specifically . Each has three components : indicates the distance between node and the UAV; since the UAV flies at a uniform velocity, this state can also represent the flight delay of the UAV to node . indicates the total number of bits of all tasks in the cache of node n, and is a time parameter that represents the most urgent task at node , given as follows: where and denote the deadline and the timer value of the th task, and denotes the task set in the cache of node at time slot . Note that when , the th task needs to be transmitted to the SN, so it will not be stored in the cache.
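A plausible flattened layout of this state vector, assuming the UAV state contributes three entries (remaining energy and 2D position) and each node contributes three entries (distance to the UAV, cached bits, most-urgent slack), is sketched below; the exact dimension in the paper is stripped, so this layout is an assumption:

```python
def build_state(uav_energy, uav_xy, node_dists, node_bits, node_slack):
    """Flatten the MDP state: [E_uav, x_uav, y_uav] followed by
    (distance, cached_bits, slack) for each of the N nodes,
    giving a vector of dimension 3 + 3N."""
    state = [uav_energy, uav_xy[0], uav_xy[1]]
    for d, b, s in zip(node_dists, node_bits, node_slack):
        state.extend([d, b, s])
    return state
```

This flat vector is what would be fed to the Q network's input layer; in practice each entry would also be normalized to comparable scales before training.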
3.1.2. Action Space
In each step, the UAV needs to take an action to decide which node to provide computing services for. We define the action space as follows:
Each element denotes the index of a node.
3.1.3. Reward Function
The objective of the optimization problem is to maximize the energy efficiency. From the previous analysis, we know that the energy efficiency of cooperative computing can be improved by offloading unfinished tasks to the UAV instead of the SN. Therefore, the UAV should be encouraged to help nodes complete urgent and bit-heavy tasks in time, and we define the reward as follows:
where is the action of the UAV at time slot , representing the node served by the UAV; denotes the tasks offloaded to the UAV by node n in the duration from to ; and describes the urgency degree of the task in (22). Furthermore, offloading tasks to the SN is discouraged, so we define the negative reward function as follows: where denotes the task set offloaded to the SN by node in the duration from to .
The reward function is a trade-off between and , given by the following formula:
In each step, the UAV takes an action, and the environment automatically returns a reward. The total reward is , and the goal of our DRL algorithm is to maximize the total reward.
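A hedged sketch of the reward computation described above: the positive term grows with task bits and urgency (bits offloaded to the UAV, weighted by the inverse of their remaining slack), and the negative term penalizes bits forced to the SN. The exact weighting in the stripped equations is unknown, so this form and the field names are assumptions:

```python
def step_reward(uav_tasks, sn_tasks, weight=1.0):
    """Per-step reward sketch. `uav_tasks` / `sn_tasks` are the tasks
    offloaded to the UAV / SN during the step; each task carries its
    bits and remaining slack (time left before its deadline)."""
    # urgent, bit-heavy tasks served by the UAV earn more reward
    r_pos = sum(t["bits"] / max(t["slack"], 1e-6) for t in uav_tasks)
    # every bit sent to the distant SN is penalized
    r_neg = sum(t["bits"] for t in sn_tasks)
    return r_pos - weight * r_neg
```

The `weight` parameter plays the role of the trade-off factor between the positive and negative reward terms.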
3.2. DQN Algorithm for UAV Path Planning
As shown in Figure 2, the DQN algorithm contains two deep neural networks: a predicted Q network with parameters and a target Q network with parameters . The two neural networks have the same structure. Every few training steps, the soft update method is adopted to update . The predicted Q network is used to estimate the value, which is used to choose an optimal action; a reward is returned based on this action. The target Q network is created to help the DQN converge, and it is updated slowly from the parameters of the predicted Q network. The pseudocode of the algorithm is shown in Algorithm 1. We define a training period as the life cycle of the UAV or the life cycle of a node, so episode_end_flag indicates that either the UAV or any node has run out of energy. (In order to achieve energy consumption balance and prevent coverage holes, we take the death of any node as one of the end flags of a training episode; this prevents the situation where a single node runs out of energy but the cumulative reward is still large.) At each step, the UAV first observes the current state , and the predicted Q network selects action based on according to the -greedy algorithm as follows:

Then, a record about is stored in the experience pool. When the experience pool is full, the newest records replace the oldest ones in the buffer. When updating the parameters of the predicted network, the classical gradient descent algorithm is used to minimize the loss function, which is the mean squared error: where denotes the size of a batch and is the target value, updated as follows: where denotes the discount factor, which indicates the trade-off between future and current rewards. We adopt the Adam optimizer to minimize the loss function.
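The core update rules of this training loop (ε-greedy selection, the TD target, the mean squared error loss, and the soft update of the target parameters) can be sketched in plain Python as follows. This illustrates the standard DQN recipe the section describes, not the authors' exact implementation; the real networks would be PyTorch modules:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the argmax
    of the predicted Q-values (one value per node in the action space)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def td_target(reward, next_q_values, gamma, done):
    """Target value: r for a terminal step, else r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

def mse_loss(targets, predictions):
    """Mean squared error over a minibatch, as in the DQN loss."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)

def soft_update(target_params, predicted_params, tau):
    """Soft update of the target network: theta' <- tau*theta + (1-tau)*theta'."""
    return [tau * p + (1 - tau) * t for p, t in zip(predicted_params, target_params)]
```

In training, a minibatch is sampled from the experience pool, `td_target` is computed with the target network, `mse_loss` is minimized with Adam over the predicted network, and `soft_update` is applied every few steps.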
Furthermore, some specific details are discussed below. For the neural network, the dimension of the input layer of the Q network is , which describes the UAV state and node states in Equation (21), where is the number of nodes. The dimension of the output layer of the Q network is . The hidden layers of the Q network consist of fully connected layers. Meanwhile, we use layer normalization to standardize the input data in each neural network. The ReLU function is used as the activation function for each neural network.
At the beginning of training, the decisions made by the agent are close to those of a random algorithm. With continuous iteration, the reward of the path planning policy gradually rises and finally converges to a near-optimal policy. At the end of the learning iterations, the learned parameters of the DQN neural network are obtained.
4. Performance Evaluation and Simulation Results
In this section, we evaluate the performance of DQN-based path planning for UAV-assisted WSNs. The simulations are carried out on the Python platform, where PyTorch is used to build the neural network model and Gym is used to construct the environment.
4.1. Simulation Environment Setting
The parameters used in the simulation environment are shown in Table 1, which follows the settings of the simulation experiments in [35, 38, 40]. We randomly deploy nodes in the monitored area, and the position coordinate of the SN is (50, 50). In a training episode, the nodes receive computation tasks in every unit of time with probability . The task parameters and are discussed in the simulation results. The predicted Q network and the target Q network each have four hidden layers with 256, 128, 64, and 32 neurons, respectively. The other hyperparameters of the neural networks are shown in Table 2.
4.2. Comparative Algorithms
To better evaluate the performance of the proposed algorithm, five other methods are compared with it.
(1) Traveling Salesman Problem (TSP) Algorithm. TSP is a classical path planning algorithm that finds the shortest path traversing all nodes once [42]. In each episode, the UAV first calculates the shortest path according to the TSP algorithm and traverses each node in turn until the end of the episode.
(2) Min_Deadline. In each step, the UAV selects the node that has the most urgent task; that is, .
(3) Max_Data. In each step, the UAV selects the node with the largest number of bits among all tasks; that is, .
(4) Max_V. In each step, the UAV selects actions based on a combination of task volume and task urgency; that is, .
(5) Q-Learning. Q-learning is a classic reinforcement learning algorithm [43], but it cannot deal with continuous state spaces, so we discretize the state space and select actions by training the Q table.
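The three heuristic baselines can be sketched as simple selection rules over per-node statistics. The Max_V combination rule is stripped from this copy, so a bits-to-slack ratio is assumed here purely for illustration:

```python
def min_deadline_action(node_slack):
    """Min_Deadline: serve the node whose most urgent task has the least slack."""
    return min(range(len(node_slack)), key=lambda n: node_slack[n])

def max_data_action(node_bits):
    """Max_Data: serve the node with the most cached bits."""
    return max(range(len(node_bits)), key=lambda n: node_bits[n])

def max_v_action(node_bits, node_slack):
    """Max_V: combine task volume and urgency; the exact rule in the
    paper is unknown, so a bits/slack ratio is assumed."""
    return max(range(len(node_bits)),
               key=lambda n: node_bits[n] / max(node_slack[n], 1e-6))
```

Unlike the DQN policy, all three rules are greedy single-step decisions, which is exactly the limitation the energy-efficiency comparison below exposes.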
4.3. Simulation Results
In this section, we compare the proposed algorithm with the other algorithms at different parameter test points. At each test point, we conducted 10 groups of comparative experiments; within the same group, all algorithms are compared under the same environment and initial state.
(1) The Convergence Performance of the Proposed DQN. Figure 3 shows the convergence performance of the proposed algorithm over 8000 episodes. In each episode, the sensor nodes are randomly deployed, and the UAV takes off from a random location in the monitored area. The value of for each task is randomly drawn from 2 to 10 Mbit, and the value of is randomly drawn from 10 to 40 s. It can be seen that the energy efficiency converges after about 3000 episodes of training. Due to the multiple time-varying environmental parameters, the algorithm still fluctuates to a certain degree after convergence, but the mean value remains stable.
(2) Comparison of the Amount of Bits of the Offloaded Tasks. We prefer to let the UAV take on more computation tasks, so we first compare the amount of bits of the offloaded tasks across the algorithms.

Figure 4 shows how the amount of tasks completed by the UAV changes in one episode. The -axis represents the total bits of tasks completed by the UAV, and the -axis represents the value of the parameter per task. The value of the parameter for each task is randomly drawn from 10 to 40 s. As increases, the amount of tasks completed by the UAV under the proposed DQN algorithm is significantly higher than under the other algorithms. When the value of is small, the nodes can complete most of the tasks independently, so fewer tasks are offloaded to the UAV. The TSP algorithm has the lowest performance because its path planning is static: the UAV always traverses the nodes along the shortest path and cannot adapt to the constantly changing node states.

Figure 5 compares the amount of data offloaded to the UAV with the amount offloaded to the SN. The value of is randomly drawn from 2 to 5 Mbit, and . Similar to Figure 4, the amount of tasks offloaded to the UAV under the proposed DQN algorithm is significantly higher than under the other algorithms, while the amount offloaded to the SN is the lowest of all. This means that the UAV can adapt to dynamically changing tasks and provide computing services to each node in a timely and efficient manner.
(3) Comparison of Energy Efficiency. Maximizing the energy efficiency of cooperative computing is the objective of the optimization problem.

Figure 6 shows how the energy efficiency changes as the amount of tasks increases. The -axis represents the energy efficiency of cooperative computing, and the -axis represents the value of the parameter per task. The value of for each task is randomly drawn from 10 to 40 s. The proposed algorithm clearly has the highest energy efficiency at every test point. At the same time, the performance of the three algorithms Min_Deadline, Max_Data, and Max_V is similar, which indicates that considering only the single-step state is not enough; the cumulative reward over multiple steps and states must be considered.

Figure 7 shows how the energy efficiency changes as the task deadline increases. The -axis represents the energy efficiency of cooperative computing, and the -axis represents the value of the parameter per task. The value of for each task is randomly drawn from 2 to 5 Mbit. We can see that when the value of is small, i.e., the tasks are urgent, the energy efficiency of cooperative computing is low. This is because, limited by its flying speed and computing capacity, the UAV cannot participate in cooperative computing in time, so more tasks are offloaded to the SN. As the value of increases, the UAV has sufficient time to participate in cooperative computing, the amount of data offloaded to the SN gradually decreases and approaches zero, and the energy efficiency gradually increases and stabilizes.
(4) Comparison of Node Energy Balance. In WSNs, balancing the energy consumption of nodes prevents the rapid death of a single node and helps improve the network's life cycle.

Figure 8 shows the energy consumption of the nodes in one period. The value of for each task is randomly drawn from 2 to 5 Mbit, and the value of is randomly drawn from 10 to 30 s. The energy consumption of local computing is relatively stable, but if a node transmits a large amount of uncompleted tasks to the SN, its energy consumption is huge. Among the compared algorithms, the proposed DQN algorithm achieves the lowest and most balanced node energy consumption. This shows that the UAV can continuously assist the nodes in completing their computations within instead of letting them transmit unfinished tasks to the SN.

5. Conclusion
In this paper, in order to address the challenges of the huge computing demands and limited battery energy in complex WSN applications, the UAV cooperative computing mechanism is introduced into classical WSN architecture to establish a three-tier WSN computing model, in which the time-varying computation-sensitive tasks which cannot be processed by sensor nodes can be offloaded to UAV or SN. In order to adapt to the time-varying indeterminate environment, a DQN-based path planning algorithm is proposed to maximize the energy efficiency of cooperative computing. Simulation results show that compared to competitive algorithms, the proposed algorithm can significantly improve the energy efficiency of cooperative computing and achieve energy consumption balance.
This work discusses a new direction for UAV-assisted cooperative computing in WSNs, and there are many promising research opportunities along this direction. In particular, this work considers a single UAV, which can be extended to multi-UAV cooperative computing in the future. Multi-UAV cooperative computing can provide more timely computing services and stronger computing capacity, but it also brings greater challenges to the collaboration and scheduling between UAVs. Therefore, new multiagent machine learning algorithms need to be designed to handle the more complex state and action spaces.
Data Availability
The data used to support the findings of this study are available from the corresponding authors upon request.
Conflicts of Interest
The authors declare that they have no competing interests.
Acknowledgments
This research was supported in part by the National Natural Science Foundation of China under grant 62061009, in part by the Fund of Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education (Guilin University of Electronic Technology) under grant CRKL190110, and in part by the Basic Ability Improvement Project of Guangxi University young and middle-aged teachers (Scientific Research) (2020KY05029).