Abstract
The fifth-generation mobile communication technology is broadly characterised by extremely high data rates, low latency, massive network capacity, and ultrahigh reliability. However, owing to the explosive increase in mobile devices and data, it faces challenges such as heavy data traffic, high energy consumption, and communication delays. In this study, multiaccess edge computing (previously known as mobile edge computing) is investigated to reduce energy consumption and delay. A mathematical model of multidimensional variable programming is established by combining the offloading scheme and bandwidth allocation to ensure that the computing tasks of wireless devices (WDs) can be reasonably offloaded to an edge server. However, traditional analysis tools are limited by computational dimensionality, which makes it difficult to solve the problem efficiently, especially for large numbers of WDs. In this study, a novel offloading algorithm known as energy-efficient deep learning-based offloading (EDLO) is proposed. The proposed algorithm uses a new type of deep learning model: the multiple-parallel deep neural network. The generated offloading schemes are stored in a shared memory, and the optimal scheme is obtained by continuous training. Experiments show that the proposed algorithm can generate near-optimal offloading schemes efficiently and accurately.
1. Introduction
The rapid development of fifth-generation (5G) mobile communication services in recent years has prompted the emergence of compute-intensive applications, such as intelligent driving, ultra-high-definition video, and mobile crowdsensing [1]. 5G technology is largely characterised by extremely high data rates, low latency, massive network capacity, and ultrahigh reliability; hence, it requires an appropriate architecture to function efficiently. However, the traditional centralised cloud computing network architecture cannot meet the requirements of the 5G software architecture because it is limited by excessive link load and delays in real-time response [2, 3]. Consequently, the European Telecommunications Standards Institute (ETSI) has proposed the concept of mobile edge computing (MEC) [4–6]. Users can migrate compute-intensive and delay-sensitive applications from local devices to an edge server to solve the problems of limited computing resources and battery energy [7]. Concurrently, the edge servers can precache some of the content required by users to reduce access delay and improve the user experience [8].

Current research on MEC offloading is mainly divided into two categories: offloading methods that reduce time delay and offloading methods that reduce energy consumption. To reduce time delay, some researchers proposed the IHRA scheme for computation offloading in multiuser situations [9]. It combined the rich computing resources of cloud computing with the low transmission delay of MEC; the offloading technique enabled part of the computing tasks to be executed on the user terminal device, thereby reducing the execution delay of delay-sensitive applications by 30%. Other studies have proposed the LODCO algorithm [10], a dynamic computation offloading algorithm based on Lyapunov optimisation theory. This method optimises the offloading decision with respect to two aspects, task running delay and task running faults, to minimise the processing delay of the offloaded task and maximise the success rate of data transmission; consequently, the probability of offloading failure is reduced. The simulation results showed that the algorithm offers clear advantages in reducing time delay and could shorten the execution time of offloading tasks by 64%. However, this method focused only on delay and failed to consider the energy consumed by the mobile terminal device during offloading. As a result, the terminal equipment may not operate properly owing to insufficient power, which can negatively affect the user experience. Therefore, further study is required to discover an offloading technique that also minimises energy consumption.

Several studies have addressed the optimisation of energy consumption in the offloading problem under different scenarios. Some studies adopted the artificial fish swarm algorithm to design an offloading scheme for energy consumption optimisation under time delay constraints [11]. Although this method effectively reduces the energy consumption of the task data transmission network by considering the link status in the network, it suffers from high complexity. In a previous study, a particle swarm task-scheduling algorithm was designed for multiresource matching to minimise the energy consumption of edge terminal devices [12]. Furthermore, some studies have investigated partial offloading of computing tasks to minimise the energy consumption of mobile devices.
For wireless devices (WDs) with a separable task [13], an energy-saving optimisation problem was formulated and solved using a greedy algorithm. However, this method only reduced the energy consumption of the mobile terminal and could not reduce the task execution delay.
To minimise the computation delay and energy consumption simultaneously, especially in environments with multiple MEC servers and multiple terminal users, it is appropriate to offload the computational tasks of wireless mobile devices [14]. Software-defined networking (SDN) has been extensively exploited in MEC to deal with the data offloading problem efficiently and effectively. Specifically, the SDN paradigm transforms communication networks into a programmable world, where a centralised entity, namely the SDN controller, acquires a global view of the communication links and manages the network traffic efficiently and dynamically [15, 16]. However, the centralised control mode and openness of SDN pose a potential risk to the security of the controller.
Owing to these limitations, this study avoids the network risk induced by the centralised control mode while minimising computation delay and energy consumption. It designs a deep learning offloading algorithm that focuses on reducing energy consumption and time delay. The algorithm is composed of two components, the offloading scheme and deep learning, and aims to solve the problem of selectively offloading mobile application components. The innovation of the algorithm is mainly reflected in the following:

(1) When we establish the system utility model in the MEC network, we weight the two operators of communication delay and residual energy and then obtain the cost of offloading.

(2) The other innovation of this study is a new type of deep learning model, which uses multiple-parallel deep neural networks [17–19], stores several offloading schemes of computing tasks in a shared memory, and substitutes them into the new deep learning model. After repeated iterations, the optimal offloading scheme is finally obtained. Compared with a traditional single deep neural network, this novel deep learning model facilitates finding the optimal edge offloading scheme; thus, the convergence of edge offloading is greatly improved.
Experiments show that the scheme has higher accuracy, lower energy consumption, and lower communication delay. The MEC architecture based on the new deep learning model proposed in this study is shown in Figure 1.

2. System Model and Problem Formulation
2.1. System Model
In this study, an efficient mobile edge offloading framework is designed. Suppose that, in the MEC framework, only one edge server, one small cell, and N WDs exist, where the N WDs are represented by a set N = {1, 2, …, N}. We assume that the computational task of each wireless terminal device contains C independent subtasks, which are recorded as A = {Application_1, Application_2, …, Application_C}. The work queue of these subtasks is stored in FIFO order. For the subtasks in any WD, the offloading scheme is obtained through the offloading algorithm designed in this study. The offloading scheme is expressed in binary; that is, Pt ∈ {0, 1}. Pt = 0 indicates that the WD subtask is executed locally, and Pt = 1 indicates that the WD subtask is offloaded to the edge server for execution. P is the offloading scheme of all subtasks in the WD, which is recorded as

P = {Pt | t = 1, 2, …, C}. (1)
The edge computing model of the WDs network is shown in Figure 2.

2.2. Local Execution Model
Assuming that there is a subtask t in a WD, to obtain the cost function of its local execution, the computational delay and energy consumption must first be determined; the two following operators are then used to compute it. The data size of any WD subtask is denoted dnt. Wt represents the computing resources occupied by subtask t, V represents the number of clock cycles the CPU requires to execute one byte, and fu represents the CPU operating frequency of the WD [20]. Therefore, the computational delay is

Tl(t) = Wt/fu, (2)
where Wt is determined by dnt and V, which denote the data size of the subtask and the number of clock cycles the CPU requires to execute one byte, respectively:

Wt = dnt · V. (3)
During local execution, the energy consumed by executing each byte is recorded as el, and the energy consumed by executing a subtask is obtained using the following equation:

El(t) = el · dnt. (4)
The cost function of local execution is obtained from the computational delay and energy consumption of local execution, which is recorded as Cl(s):

Cl(s) = γ1 · Tl(t) + γ2 · El(t). (5)
In equation (5), γ1 and γ2 are weighting coefficients, which are linear with the maximum execution time (Tmax) and maximum energy consumption (Emax) in the task, respectively.
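To make the local cost model concrete, the following minimal Python sketch evaluates equations (2)–(5). The normalisation γ1 = 1/Tmax and γ2 = 1/Emax is an assumption made here for illustration; the paper states only that the weights are linear in Tmax and Emax, and all numerical values below are hypothetical.

def local_cost(d_nt, V, f_u, e_l, T_max, E_max):
    """Cost of executing one subtask of d_nt bytes locally on the WD."""
    W_t = d_nt * V                    # CPU clock cycles needed, equation (3)
    T_l = W_t / f_u                   # computational delay, equation (2)
    E_l = e_l * d_nt                  # local energy consumption, equation (4)
    return (1.0 / T_max) * T_l + (1.0 / E_max) * E_l   # cost, equation (5)

# Example: a 15 M subtask, 5000 clocks/byte, 1 GHz CPU, 1e-6 J/byte.
print(local_cost(d_nt=15e6, V=5000, f_u=1e9, e_l=1e-6,
                 T_max=100.0, E_max=1500.0))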
2.3. Mobile Edge Computing
To solve the problems of limited computing resources and battery energy, compute-intensive and delay-sensitive computing tasks in WDs are offloaded from the local device to the edge server for execution to improve the efficiency of the system. Similar to local execution, the computational delay (Tr(c)) and energy consumption (Er(c)) are the crucial operators that affect edge offloading. This study considers the impact of computing resources and communication resources on Tr(c) and Er(c). When the influence of all variables is fully estimated, the most reasonable cost function can be formulated and the optimal offloading scheme can be obtained. The process of offloading the subtasks in WDs to the mobile edge server (MES) involves the following steps. First, the data is uploaded to the MES. Second, the task is executed in the MES. The third step is the downlink transmission back to the WD, and the last step is to complete decoding in the WD.
Consequently, four time components are generated: the upload time of the subtask, Tup; the execution time of the subtask in the MES, Tex; the downlink time of the subtask back to the WD, Tdown; and the decoding time, Td:

Tr(c) = Tup + Tex + Tdown + Td. (6)
Equation (6) shows that, in the MES, the allocation of communication resources, namely bandwidth, directly affects Tup and Tdown of the subtask, and the allocation of computational resources, that is, the allocated CPUs, directly affects Tex of the subtask. They are denoted by the following equations:

Tup = dnt/rul,  Tex = Wt/(M · fs),  Tdown = dnt/rdl. (7)
In equation (7), rul and rdl represent the transmission rates of the subtask on the uplink to the MES and the downlink to the WD, respectively. These two factors are closely related to the allocation of communication resources. Wt represents the number of clock cycles required to execute subtask t, M represents the number of CPUs allocated to the subtask (the value of M depends on the data structure of the subtask), and fs represents the CPU working frequency of the MES. Bandwidth allocation directly affects the uplink time and downlink time of subtasks [21, 22].
The effects of bandwidth and number of CPUs on the computational delay and transmission delay in offloading are shown in Figure 3.

In this study, emphasis is placed on the method for solving rul and rdl. Orthogonal frequency-division multiple access (OFDMA) technology is used to allocate the communication resources. This technology divides the channel into several orthogonal subchannels and converts the high-speed data signal into parallel low-speed subchannel signals. Because the orthogonal signals can be separated using correlation techniques at the receiving end, mutual interference among subchannels is reduced and the allocation of communication resources is completed. The total network bandwidth B is decomposed into K subcarriers (k ∈ {1, 2, 3, …, K}), and each subtask t is allocated several subcarriers. In signal processing, considering that additive white Gaussian noise (AWGN) is easy to analyse and approximate, the actual noise signal can be approximated by Gaussian white noise within a certain frequency band when analysing the noise performance of a signal processing system [23, 24]. Therefore, the maximum data transmission rates of the uplink and downlink are

rul = (k/K) · B · log2(1 + Pu · hul · D^(−β)/(Γ · N0)), (8)

rdl = (k/K) · B · log2(1 + Ps · hdl · D^(−β)/(Γ · N0)). (9)
Here, B is the network bandwidth, D is the distance from the WD to the MES, N0 is the noise power, Pu is the transmission power of the WD, Ps is the transmission power of the MES, and hul and hdl are the attenuation coefficients of the uplink and downlink channels, respectively. β is the path loss index, ε is the bit error rate of data transmission, and Γ represents the stability margin of the signal-to-noise ratio, which is set so that the target bit error rate is met.
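As an illustration, the sketch below evaluates the reconstructed uplink rate of equation (8) and the resulting upload delay from equation (7). The exact form of equations (8) and (9) is inferred from the symbol definitions above, and all parameter values are hypothetical.

import math

def uplink_rate(k, K, B, P_u, h_ul, D, beta, Gamma, N0):
    """Maximum uplink rate (bits/s) over k of the K subcarriers, equation (8)."""
    snr = (P_u * h_ul * D ** (-beta)) / (Gamma * N0)  # SNR after the stability margin
    return (k / K) * B * math.log2(1.0 + snr)

def upload_delay(d_nt_bits, r_ul):
    """Tup = dnt/rul from equation (7), with the subtask size in bits."""
    return d_nt_bits / r_ul

r = uplink_rate(k=64, K=512, B=100e6, P_u=0.5, h_ul=1e-3,
                D=100.0, beta=3.0, Gamma=1.5, N0=1e-13)
print(upload_delay(d_nt_bits=15e6 * 8, r_ul=r))  # upload delay in seconds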
Similar to the analysis of the computational delay, the energy consumption generated by offloading a subtask to the MES mainly comprises the energy consumed by executing the task in the MES and the energy consumed by decoding in the WD; the energy consumed by the data uplink and downlink is ignored.
The cost function of remote offloading is obtained from the computational delay and energy consumption of edge offloading, which is denoted as Cr(s):

Cr(s) = γ3 · Tr(c)/TD + γ4 · Er(c)/EM. (10)
In equation (10), γ3 and γ4 are linearly related to Tmax and Emax, respectively. TD represents the maximum execution time of the subtask in the MES, and EM represents the maximum energy consumption in the task.
2.4. Cost Function
WDs require a certain amount of time and energy to perform computing tasks. In this paper, a system utility model S is introduced (equation (11)), and the delay and energy are minimised by analysing the model. The model contains four key parameters that affect the utility of the system: the number of bytes D of the subtask; the allocated communication resources, namely K subcarriers; the allocated computing resources, namely m CPUs; and the energy consumption Ent employed to execute the task:

S = {D, K, m, Ent}. (11)
Based on the established system utility model and the cost analysis of local execution and edge offloading, the cost function of performing the computing tasks is obtained as

C(S, P) = Σt [(1 − Pt) · Cl(t) + Pt · Cr(t)], t = 1, 2, …, C. (12)
In equation (12), Pt represents the binary offloading scheme of each subtask in WDs. Pt = 0 indicates that the subtask of WD is executed locally, and Pt = 1 indicates that the subtask of WD is to be offloaded to the edge server for execution.
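A direct reading of equation (12) in code: each subtask contributes its local cost when Pt = 0 and its remote cost when Pt = 1. The per-subtask cost values below are hypothetical.

def total_cost(P, local_costs, remote_costs):
    """Total cost C(S, P) of a binary offloading scheme P, equation (12)."""
    return sum(c_r if p else c_l
               for p, c_l, c_r in zip(P, local_costs, remote_costs))

# Example with five subtasks: offload subtasks 1, 3, and 5 to the edge server.
print(total_cost([1, 0, 1, 0, 1],
                 local_costs=[0.8, 0.3, 0.9, 0.2, 0.7],
                 remote_costs=[0.4, 0.5, 0.3, 0.6, 0.2]))  # prints 1.4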
2.5. Problem Formulation
This study uses the system model S and the cost function to find the optimal offloading scheme Po, which minimises the cost of executing tasks and obtains the optimal solution for MEC.
To obtain the optimal scheme, we input the system utility parameters of the computing task into the DNN [25]. After processing by the hidden layers, we obtain the offloading scheme at the output layer. The offloading scheme is refined through the back-propagation and iteration of the neural network [26], namely the gradient descent algorithm, and we obtain the optimisation scheme Po that is closest to the actual value. The mathematical expression is as follows:

P = arg min C(S, P′), P′ ∈ {0, 1}^C. (13)
In equation (13), C(S, P′) represents the total cost of performing the computing task, and P represents the optimal offloading scheme. We use the DNN to obtain the approximate value Po, making Po as close as possible to P.
3. EDLO Algorithm Design
A DNN achieves high accuracy when trained on big data [27]. The second aim of this paper is to use multiple-parallel deep neural networks to train on samples and obtain the optimal offloading scheme [28, 29]. The input layer of the model is the state vector of the several subtasks in the computing task.
The output layer is the offloading scheme P. This study divides the computing task of a WD into several subtasks. The scheme for executing the task is denoted as P = {Pt}, t = 1, 2, 3, …, C. Each subtask has two offloading possibilities; that is, it can be executed locally (Pt = 0) or on the MES (Pt = 1). Therefore, there are 2^t possible offloading schemes for t subtasks.
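For small t, the optimum can still be found by brute force, as in the sketch below; the exponential growth of this 2^t search space is what motivates the DNN-based approach for larger tasks. The cost function here reuses the hypothetical per-subtask costs from the earlier sketch.

from itertools import product

def brute_force_optimal(t, cost_fn):
    """Enumerate all 2^t binary offloading schemes and return the cheapest."""
    return min(product((0, 1), repeat=t), key=cost_fn)

local_c = [0.8, 0.3, 0.9, 0.2, 0.7]
remote_c = [0.4, 0.5, 0.3, 0.6, 0.2]
best = brute_force_optimal(5, lambda P: sum(
    r if p else l for p, l, r in zip(P, local_c, remote_c)))
print(best)  # (1, 0, 1, 0, 1): each subtask takes whichever side is cheaper here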
The detailed steps of energy-efficient deep learning-based offloading (EDLO) designed in this study are listed in Table 1.
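The following schematic sketch summarises one EDLO iteration as described in the text (the authors' exact steps are in Table 1): each parallel DNN proposes an offloading scheme, the scheme with the lowest cost C(S, P) is kept and stored in the shared memory together with the state, and every DNN is periodically retrained on a batch sampled from that memory. All function and variable names here are illustrative assumptions.

import random

def edlo_step(state, dnns, memory, cost_fn, batch_size=32):
    # Each DNN outputs relaxed values in [0, 1]; quantise them to binary schemes.
    candidates = [tuple(int(v > 0.5) for v in dnn.predict(state))
                  for dnn in dnns]
    # Keep the candidate offloading scheme with the minimum system cost.
    best = min(candidates, key=lambda P: cost_fn(state, P))
    # Store the (state, best scheme) pair in the shared replay memory.
    memory.append((state, best))
    # Retrain every DNN on a random batch drawn from the shared memory.
    if len(memory) >= batch_size:
        batch = random.sample(memory, batch_size)
        for dnn in dnns:
            dnn.train_on_batch(batch)
    return best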
3.1. EDLO Algorithm Analysis
The multiple-parallel deep neural network designed in this study increases the number of hidden layers and neurons in the network. The number of hidden layers is set to 2, and the number of neurons in each layer is 256 [35]. Based on the corresponding numerical analysis, an increase in the number of hidden layers and neurons produced a corresponding increase in computation accuracy [30]. The sigmoid function is used as the activation function, so the outputs of the hidden-layer neurons lie in the range [0, 1]. Finally, the cross-entropy loss function is used as the loss function [31]. This pairing is chosen because, with sigmoid activations, the cross-entropy loss avoids the slowdown that the mean-square-error loss suffers during gradient descent when saturated units effectively reduce the learning rate [32, 33]; in other words, the learning speed is controlled by the output error. The learning process of the DNN is shown in Figure 4.
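A minimal PyTorch sketch of one of the parallel DNNs under the stated configuration: two hidden layers of 256 neurons, sigmoid activations so that neuron outputs lie in [0, 1], and a binary cross-entropy loss. The input and output dimensions and the optimiser are illustrative assumptions (the learning rate of 0.01 is the value used later in Section 4.2).

import torch
import torch.nn as nn

class OffloadingDNN(nn.Module):
    """One DNN of the multiple-parallel model: 2 hidden layers x 256 neurons."""
    def __init__(self, state_dim: int, num_subtasks: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.Sigmoid(),
            nn.Linear(256, 256), nn.Sigmoid(),
            nn.Linear(256, num_subtasks), nn.Sigmoid(),  # relaxed Pt in [0, 1]
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

model = OffloadingDNN(state_dim=4, num_subtasks=5)
loss_fn = nn.BCELoss()                                   # cross-entropy loss
optimiser = torch.optim.Adam(model.parameters(), lr=0.01)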

4. Experiment and Comparative Analysis
4.1. Experiment
In this study, the EDLO algorithm is designed for the optimal offloading scheme problem, and the following experiments and simulations were performed to verify its performance. We assume that the edge offloading framework includes three WDs and that the computing task of each WD consists of five subtasks, each of which can be executed by either the edge server or the wireless terminal. The data size of each subtask is randomly distributed between 10 M and 20 M [34]; that is, D ∈ [10 M, 20 M]. The number of clock cycles occupied by the CPU to execute one byte is V ∈ [2000, 10000]. The available energy Et in the WD is 1500 J [35]. The energy consumption of transmitting and receiving data by WDs is 11.3 × 10^6 J/bit [36]. The number of CPUs in the MES is m ∈ [0, 10], and the number of subcarriers is n ∈ [1, 512].
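For reproducibility, the sketch below samples one random subtask configuration from the parameter ranges stated above; the dictionary keys are illustrative names, not the authors'.

import random

def sample_task():
    """Draw one subtask configuration from the simulation ranges in Section 4.1."""
    return {
        "D_bytes": random.uniform(10e6, 20e6),   # subtask size, D in [10 M, 20 M]
        "V_clocks_per_byte": random.randint(2000, 10000),
        "m_cpus": random.randint(0, 10),         # CPUs allocated in the MES
        "n_subcarriers": random.randint(1, 512),
        "E_available_J": 1500.0,                 # available WD energy Et
    }

print(sample_task())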
Figure 5 shows how the number of DNNs affects the convergence of the EDLO algorithm. Because different state spaces S are input, the evaluation function of the system produces different outputs; this study therefore uses the EDLO algorithm as the benchmark to ensure that the computation standard of the input is unified. As shown in Figure 5, as the learning steps increase, the EDLO algorithm gradually converges to one. When ten DNNs are used, the convergence rate is approximately 98% after 1000 learning steps. When one DNN is used, the computation accuracy rate is 75% and the algorithm does not converge. When three DNNs are used, the computation accuracy rate is approximately 97%; that is, the computation accuracy rate is increased by 21% compared with using one DNN, and the algorithm converges. As the number of DNNs increases, the convergence speed increases accordingly.

The EDLO algorithm can generate an optimal offloading decision in less than 0.2 s, and its computation time remains nearly the same across different numbers of DNNs. The computational delay statistics for different numbers of DNNs are shown in Figure 6.

Different learning rates and different data storage spaces affect the convergence speed of the EDLO algorithm differently. The higher the learning rate, the faster the EDLO algorithm converges. However, as shown in Figure 7, increasing the learning rate also increases the probability of converging to a local optimum rather than the global optimum. Therefore, the learning rate should be chosen according to the actual situation.

Figure 8 shows that different memory space sizes affect the convergence speed of the EDLO algorithm. Based on the experiments conducted, local convergence is often established first. Therefore, to balance the convergence speed and overall performance of the algorithm in the MES network, we use a memory space of 2048 entries, which achieves the fastest convergence speed.

4.2. Comparative Analysis
To accurately evaluate the performance of the EDLO algorithm, this study compares it with four other task processing schemes. Ten DNNs are used in the EDLO algorithm, the learning rate is set to 0.01, and the system memory is 2048; under these settings, the EDLO algorithm achieves optimal performance. The four schemes used for comparison are as follows:

(1) Total local process (TLP): all computing tasks of the WDs are performed locally.

(2) Total edge process (TEP): all computing tasks of the WDs are executed by the edge server (MES) [37, 38].

(3) Random offloading scheme (ROS): regardless of the state space S of the computing task, the subtasks are randomly assigned to the edge server or the local WDs for processing.

(4) Deep learning-based offloading (DLO) scheme: the input state of the algorithm does not include the remaining energy of the WDs or the energy consumption of executing the program [39]. The neural network of the algorithm contains two hidden layers, each with 128 neurons.
In this paper, the EDLO algorithm and the other four algorithms are compared and tested with respect to the following: (1) the impact of communication resources on offloading delay, that is, the impact of the bandwidth B on Tud = Tup + Tdown; (2) the impact of computing resources on edge computing latency, that is, the impact of the number of CPUs m on Tex; (3) the total computational delay generated during task execution; (4) computational accuracy; (5) energy consumption; and (6) computational cost. The results show that the EDLO algorithm provides the best performance. The network parameters used in this study are listed in Table 2.
Under the TLP scheme, data are processed locally and the computing and communication resources of the MES are not used. In this study, the network bandwidth is 100 Mbps, the number of cores in the MES is 10, and the computing task size is 15 M. The communication and computational delays of the four algorithms are compared with those of the EDLO algorithm, which are 93 ms and 102 ms, respectively. The delay of the EDLO algorithm is substantially lower than those of the TLP and ROS algorithms and moderately lower than that of the traditional deep learning algorithm, DLO: compared with DLO, the communication delay and computational delay of EDLO are 30.76% and 31.58% lower, respectively. The computational efficiency of the proposed algorithm is thus significantly improved because the impact of bandwidth and CPUs on delay is considered. The communication delay Tud and computational delay Tex of all the algorithms in the offloading process are shown in Figures 9 and 10, respectively.


Figure 11 presents a comparison of the total computational delay of EDLO and the other algorithms. Based on the experimental results, the total computational delay of EDLO is 860 ms, which is 44.19% lower than that of the DLO algorithm; thus, the real-time performance of the computation is greatly improved.

Figure 12 shows a comparison of the offloading accuracy of EDLO and the other algorithms. The offloading accuracy of EDLO is significantly higher than those of TLP and TEP and slightly higher than that of DLO, whereas the offloading accuracy of the ROS algorithm is approximately zero. Consistent with Figure 11, Table 3 shows that the EDLO algorithm is reliable.

Figure 13 shows a comparison of the energy consumption of the different algorithms. The energy consumption of EDLO is 9.82% lower than that of DLO and significantly lower than those of the other three traditional data processing methods.

Figure 14 presents a comparison of the cost of EDLO and the other algorithms. Based on the experimental results, the final cost of the EDLO algorithm is 38.32% lower than that of DLO and significantly lower than those of the other three traditional data processing methods.

Overall, the EDLO algorithm exhibits superior performance compared with the TLP, TEP, and ROS schemes, and it also outperforms the DLO algorithm.
5. Conclusion
In this study, we propose a deep learning offloading algorithm, EDLO, to reduce energy consumption and time delay. To realise this, we consider the energy consumption, computing delay, computing resources, and communication resources of the applied system. Distributed management avoids the hidden network security dangers caused by centralised control. The algorithm introduces multiple-parallel DNNs, which can generate the optimal solution without manually labelled data, and the numerical results verify the accuracy and performance of the algorithm. We envisage that the proposed framework can be applied in subsequent advances of MEC networks to achieve optimised real-time offloading.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare no conflicts of interest.
Authors’ Contributions
Xiaoliang Cheng and Jingchun Liu have contributed equally to this work and share first authorship.
Acknowledgments
This study was supported by the National Natural Science Foundation of China (81601467, 81601472, and 81871327).