Abstract

The quality of service (QoS) in 5G/6G communication depends heavily on the mobility and agility of the network architecture. As the range of possible uses of 5G vehicular networks grows, so does the scope of the QoS that the network must deliver. Safety-critical real-time operation has therefore become one of the most demanding requirements for vehicular networks. Although various mathematical and computational methods have traditionally been used to optimize resource allocation, the nonconvexity of the optimization problems creates unique challenges. In recent years, machine learning (ML) has emerged as a valuable tool for dealing with the computational complexity that arises from large amounts of data in heterogeneous vehicular networks. This article reviews how optimization and state-of-the-art machine learning techniques can be used to allocate 5G vehicular network resources and strengthen network communication. Furthermore, a federated deep reinforcement learning- (FDRL-) based vehicle communication method is presented. Finally, a UAV-aided vehicular communication system based on FDRL is proposed as a novel resource management technique to optimize 5G and 6G QoS.

1. Introduction

The exponential increase in cellular mobile devices and automobiles makes it imperative to develop a robust 5G new radio (NR) system [1]. On the one hand, the wide range of applications improves people's lives in many ways; on the other hand, the required quality of service (QoS) must also be ensured. Optimization problem formulations have therefore focused on resources and objectives such as computing power, sum-rate maximization, and delay minimization [2, 3]. In the most basic scenarios, simple convex optimization suffices to meet these aims. In general, however, wireless resource management problems tend to be nonconvex and NP-hard, which creates unique challenges. Because of the complexity of the mathematical calculations, it is difficult to find algorithms that are efficient enough to reach even suboptimal solutions. Moreover, although the vehicular network widens the range of new services and mobility options, it also produces a massive volume of data that is difficult to process. New and more powerful computational methods are required to address these problems. With federated deep reinforcement learning (FDRL), a DRL algorithm can be trained without sharing each vehicle's dataset, which also helps to mitigate delay. Drones serve as flying base stations (BSs) in vehicular networks to ensure that all vehicles remain continually connected. An FL technique, specifically an FDRL approach for a UAV-aided vehicular network, is studied to improve connectivity and minimize latency, and an FDRL-based vehicle communication system is also proposed. Vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), and other links in 5G heterogeneous vehicular networks are collectively referred to as vehicle-to-everything (V2X), as shown in Figure 1.

Vehicle-to-vehicle (V2V) communication takes place over dedicated short-range communication (DSRC) channels. The variety of services available to vehicular user equipment (VUEs) can be expanded by using macro-BSs and roadside units (RSUs) in conjunction with a cellular vehicular network. In the space and aerial layers, data and information are exchanged over satellite and air-to-air links. Unmanned aerial vehicles (UAVs) fly at lower altitudes, and their primary function is to communicate with each other, with aircraft, and with the ground. For heterogeneous 5G and 6G vehicular networks, the QoS requirements have increased. 5G NR supports a wide variety of new QoS criteria for ultrareliable and low-latency communications (URLLC), massive machine-type communications (mMTC), in which data are sent and received by machines, and mobile broadband (MBB). In terms of dependability, the URLLC service requires an end-to-end (E2E) latency of one millisecond (ms), and up to one million devices per square kilometre (km2) can be supported.

2. Historical Background

Deep reinforcement learning (DRL) is a machine learning technique that combines reinforcement learning (RL) and deep learning (DL).

2.1. Reinforcement Learning

RL addresses sequential decision-making in an unfamiliar environment by maximizing the reward. Because it does not require large datasets for training, the method is well suited to 5G and 6G vehicular networks, which have highly dynamic environments [4, 33]. In this setting, an agent is the entity that performs a task in return for a reward, and its actions take place in an environment. Whenever the agent interacts with the environment, it is presented with a representation of the environment's current state and selects an action from the list of available actions. After completing the action, the agent receives a reward. Q-learning is a popular algorithm in the field of reinforcement learning. Kisacanin describes how the Q-value is updated, which is $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$, where $\alpha$ is the learning rate and $\gamma$ is the discount factor [5]. The letter "r" denotes the reward.
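
The update rule of tabular Q-learning can be sketched in a few lines of Python. This is only an illustration of the standard algorithm, not the implementation used in the cited work: the state names, the two sub-channel actions, and the default values of `alpha` and `gamma` are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Best value reachable from the next state under the current estimates.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical toy example: a vehicle chooses one of two sub-channels.
actions = ["channel_0", "channel_1"]
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
new_value = q_update(q_table, "congested", "channel_1", reward=1.0,
                     next_state="clear", actions=actions)
print(new_value)
```

Starting from an all-zero table, a single reward of 1.0 moves the selected entry a fraction `alpha` of the way towards the target; repeated interaction with the environment refines the estimates further.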

2.2. Deep Learning

Deep learning (DL) is based on artificial neural networks (ANNs), which are also known as "deep networks"; a fully connected deep network is shown in Figure 2.

In this paradigm, the neuron cells in the deeper layers of a densely connected network can be LSTM cells. Deep Q-learning uses DL networks (e.g., RNN models) to estimate the Q-value. The inputs to the DL network can include, for example, the states, and its outputs the values of all potential actions. Fully connected neural networks (FCNNs) are artificial neural networks in which every node (neuron) in one layer is connected to every node in the next layer. CNNs are trained to find and extract the best features from images; their main strength is this feature extraction, while the classification strength comes from the fully connected last layers. Because CNNs integrate FC layers, the two topologies are not competitors. Convolutional layers, which are not fully connected, are substantially more specialised and efficient. In a fully connected layer, each neuron is connected to all neurons of the preceding layer, and each connection has its own weight; such layers are simply feed-forward neural networks. Fully connected layers are the network's final layers: the output of the final pooling or convolutional layer is flattened and fed into the fully connected layer.
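
The last step, flattening the final pooling output and feeding it into a fully connected layer, can be sketched as follows. This is a minimal pure-Python illustration; the feature-map values, the weights, and the helper names `flatten`, `fully_connected`, and `relu` are hypothetical and not taken from the cited works.

```python
def flatten(feature_maps):
    """Turn a list of 2-D feature maps into one flat list of values."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def fully_connected(inputs, weights, biases):
    """Each output neuron has its own weight for every input value."""
    return [sum(w * x for w, x in zip(w_row, inputs)) + b
            for w_row, b in zip(weights, biases)]


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


# Hypothetical 2x2 output of a final pooling layer with two channels.
pooled = [[[1.0, 0.0], [0.0, 1.0]],
          [[0.5, 0.5], [0.5, 0.5]]]
flat = flatten(pooled)                  # 8 values
weights = [[1.0] * 8, [-1.0] * 8]       # two output neurons, 8 weights each
biases = [0.0, 0.5]
outputs = relu(fully_connected(flat, weights, biases))
print(outputs)
```

Every output neuron sees all eight flattened values, which is what distinguishes the fully connected layer from the locally connected convolutional layers before it.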

3. QoS Requirements in 5G and 6G Mobile Networks

3.1. Quality of Service Requirements

Massive machine-type communications (mMTC) is one of the three service categories that 5G is planned to include, along with ultrareliable low-latency communications (URLLC) and mobile broadband. The V2X application scenarios are defined in the new 5G V2X service standard; the more advanced uses include vehicle platooning and remote driving. The 5G network is built on top of the 4G network. The notion of a packet data unit (PDU) session was introduced in information-centric networks.

Each PDU session has multiple QoS flows, and the QoS flow is the finest granularity at which QoS is differentiated within a PDU session. In most situations, QoS metrics are specified by a set of parameters such as PER (%) and GMB (kbps), where PER stands for the percentage of packets that fail to arrive at their destination. Vehicle platooning services must meet an E2E latency bound of 10 ms with 99.99% reliability. Advanced driving services necessitate larger bandwidth and reduced E2E latency, whereas ordinary driving services necessitate higher dependability and a bit rate of up to one gigabit per second. Figure 3 shows the edge computing model, in which fog computing also plays an important role by performing computation close to the vehicles.
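
As a small illustration of how such per-flow requirements can be checked, the sketch below tests a measured flow against the platooning figures quoted above (10 ms E2E latency, 99.99% reliability). The class and field names are hypothetical, and reliability is taken here, as an assumption, to be one minus the packet error rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QosRequirement:
    max_e2e_latency_ms: float
    min_reliability: float  # fraction of packets delivered, between 0 and 1


@dataclass(frozen=True)
class FlowMeasurement:
    e2e_latency_ms: float
    packets_sent: int
    packets_lost: int

    @property
    def packet_error_rate(self) -> float:
        # Share of packets that failed to arrive at their destination.
        return self.packets_lost / self.packets_sent if self.packets_sent else 0.0


def meets_requirement(flow: FlowMeasurement, req: QosRequirement) -> bool:
    """Return True only if both the latency and the reliability targets hold."""
    reliability = 1.0 - flow.packet_error_rate
    return (flow.e2e_latency_ms <= req.max_e2e_latency_ms
            and reliability >= req.min_reliability)


# Platooning targets as quoted in the text; the measurements are made up.
PLATOONING = QosRequirement(max_e2e_latency_ms=10.0, min_reliability=0.9999)
good = FlowMeasurement(e2e_latency_ms=8.0, packets_sent=100_000, packets_lost=5)
slow = FlowMeasurement(e2e_latency_ms=12.0, packets_sent=100_000, packets_lost=5)
lossy = FlowMeasurement(e2e_latency_ms=8.0, packets_sent=100_000, packets_lost=50)
print(meets_requirement(good, PLATOONING),
      meets_requirement(slow, PLATOONING),
      meets_requirement(lossy, PLATOONING))
```

Only the first flow satisfies both targets; the second exceeds the latency bound and the third loses too many packets.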

Mobile virtual network operators (MVNOs) use the same physical infrastructure as the mobile network operator (MNO). Queue length distributions are modelled using extreme value theory (EVT), and the parameters of the distribution are obtained by maximum likelihood estimation (MLE). A distributed FL scheme is utilized to reduce signalling overheads. Transmitting at an optimal power level clears the queue backlog, which yields a vehicle-to-vehicle communication system that lowers signalling costs while maintaining high reliability and low latency [6]. The allocation of resources with respect to the age of information (AoI) has also been discussed. If the volume of data grows too large, it might create privacy issues. This research examines the trade-off between increasing the network's knowledge and keeping the probability that the AoI exceeds a particular threshold low, and a Gaussian process regression (GPR) is applied to predict the future AoI. Unsupervised learning is observed to perform well in the dynamic vehicular network, but training data for ML models are challenging to collect, especially in dynamic situations.
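
The AoI metric, read here as the age of information named later in the text, can be illustrated with a short sketch. It assumes the usual definition: the time elapsed since the freshest update received so far was generated. The timestamps, the threshold, and the function names are hypothetical.

```python
def age_of_information(now, received_updates):
    """Return the AoI at time `now`, or None if nothing has arrived yet.

    `received_updates` is a list of (generation_time, reception_time) tuples.
    """
    delivered = [gen for gen, rx in received_updates if rx <= now]
    if not delivered:
        return None
    # Age is measured from the generation time of the freshest delivered update.
    return now - max(delivered)


def violates_threshold(now, received_updates, max_age):
    """True if no update has arrived or the freshest one is older than max_age."""
    age = age_of_information(now, received_updates)
    return age is None or age > max_age


# Hypothetical status updates from one VUE: (generated at, received at), in seconds.
updates = [(0.0, 0.2), (1.0, 1.3), (2.0, 2.9)]
print(age_of_information(2.5, updates), age_of_information(3.0, updates))
```

At 2.5 s the third update is still in flight, so the age is measured from the second update; once the third update arrives, the age drops again.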

A DRL approach is used to obtain an optimal resource allocation that provides safe and secure vehicle communication. These challenges are interwoven with the allocation of spectrum and computation power in the vehicular network, and the optimum solution is found using a combination of single-agent and multiagent RL. Low-latency communication over the cloud can be problematic because of long delays as well as security and privacy problems. A fog computing network helps to lower the latency that cloud computing networks impose on vehicles.

Fog computing is a subset of fog networking. Fog servers execute computations and store resources in place of cloud servers, and they serve a wide range of different and scattered devices. A fog computing network is shown in Figure 4. RSUs and BSs can also be connected to fog computing servers through wired links. Fog servers are closer to end users (e.g., VUEs) than cloud servers, which speeds up data transmission compared with cloud computing. This strategy reduces the response time for end VUEs, especially during busy periods [7].

3.2. Theories and Applications of Optimization

A joint optimization examines user association, radio resource allocation, and power consumption. It describes a cross-layer cloud and fog computing network as a way of assigning computing resources to the cloud and the fog, and it controls traffic signals and traffic management on a global scale. Contract-based incentives and matching-based computation task assignment [8] are also considered. According to the proposed design, fog computing vehicular networks may be established with nonorthogonal multiple access (NOMA) [9, 10]. RL can also be used to address the issue of user mobility. Two methods are used to optimize the subchannel and power allocation: chemical-reaction optimization (CRO) and real-coded chemical-reaction optimization (RCCRO). Researchers have integrated user association with resource allocation [11]; the joint optimization problem is solved using a mixed-integer nonlinear algorithm, and the Perron-Frobenius theory helps to minimize transmission delays. Another study incorporates resource allocation and distributed computation offloading for vehicular networks [12]. The joint optimization is a nonconvex and NP-hard problem that can be addressed by collaborative computation offloading and resource allocation optimization (CCORAO), in which computing tasks are offloaded to distributed computers and resources are allocated accordingly. As a result, both the communication time and the utility of the system are improved [13]. The DNN method, on the other hand, is limited to short-term predictions of traffic flow. Additional information about traffic patterns helps the network to improve the distribution of resources. The LSTM algorithm has been used to develop a time-series traffic flow prediction technique [14] that can capture both short-term and long-term traffic flow. However, gathering data to train the ML model is a huge undertaking, and missing data make it impossible for the machine learning system to predict traffic flow accurately. As a result, resources are preallocated incorrectly, and radio resources are used inefficiently by both vehicles and network infrastructure.

For traffic flow prediction with missing data, an LSTM-based approach has been proposed in which the missing data are handled by multiscale temporal smoothing. An LSTM-DNN algorithm [15] predicts traffic flow and parking conditions. The traffic data are used to allocate resources for vehicular fog communication (VFC) over both the short and the long term. RSUs analyse the forecast data and use them as a guide for allocating spectrum across vehicles, which reduces data transmission and computation times. The RL-based radio resource allocation algorithm proposed in [16] takes the network's future state into consideration: the allocation decision is influenced by predicted network conditions, and the agent's reward is maximized based on predicted outcomes. The reported throughput is higher than that of the compared vehicular network, and packet loss is reduced to a minimum. As previously stated, both supervised and unsupervised learning methods can be used, and latency can be reduced in various ways. However, obtaining training data in a diverse vehicular network is difficult, and DRL is therefore used to deal with this problem.

The optional QoS level measurement equals the number of mapped 5QIs. This statistic indicates the proportion of unconstrained DL data volume for UEs in the cell, i.e., when all data can be transported in one slot and no UE throughput sample can be determined. To calculate the UE data volume, multiply the number of primary carriers by the number of supplementary carriers. The measurement can be subdivided by QoS level. Wireless transmission bandwidth and the predicted vehicle power contribution are explored in depth in [17]. Markov decision process (MDP) models have been used to incorporate processing and storage capacity in a comparable situation [18]. Perception-reaction time refers to the length of time it takes for a driver to react in a safe manner. This research examines the integration of fog resource virtualization (FRV) with information-centric networking (ICN). Deep neural networks (DNNs) are employed in combination with the asynchronous advantage actor-critic (A3C) algorithm to maximize the utilization of computing resources.

Similarly, fog nodes are helpful for mobile users because they support different operations in varied scenarios [19]. Mobility is a defining feature of today's vehicles, so choosing the optimal fog node for each client is an important consideration. The study offers an effective allocation of resources to vehicles so that they can better serve their users, and the resulting problem is solved with the nondominated sorting genetic algorithm. An MDP was first proposed as a tool for making resource decisions [20]. To better understand fog computing, researchers are looking at software-defined vehicular-based fog computing (SDV-F). A DRL method is used to shorten the time that fog servers take to complete operations, which maximizes the processing capacity of the fog layer. BSs employ a wide range of mission-oriented strategies, and each BS has its own edge computing server. The vehicular network may benefit from edge computing since it is both long-term and cost-effective.

3.3. Resource Allocation

An adaptive online resource allocation scheme has been developed to improve the user experience [21]. It reduces communication loss in a vehicular edge computing network and allocates radio and computing resources while the network state is unknown. A mobility-aware greedy algorithm has also been studied [22].

These methods are effective in reducing latency and maximizing energy efficiency. Nonconvex and NP-hard optimization problems, on the other hand, are extremely challenging to solve [23], and machine learning methods have been used to tackle them while keeping the computational complexity low. A distributed user association algorithm has been evaluated for the case in which the QoS criterion of the vehicular network is not met. By allocating radio resources intelligently, the network load can be balanced while latency is decreased, and two game-theoretic models [24] were used to test the effectiveness of the load balancing scheme. Reducing the processing time of vehicles within the maximum allowable delay has also been examined, and an SDN-based task offloading system for fiber-wireless (FiWi) networks has been developed; as a result, network performance is improved while latency is kept to a minimum. The radio resource management problem for the 5G vehicular network has been formulated with age of information (AoI) awareness [25]. An LSTM and a DRL algorithm are used for online decentralised operation at the VUE pairs, which gives RSUs the ability to allocate bandwidth and make packet scheduling decisions. Even though only a portion of the network state can be observed, this method makes efficient use of the available resources without requiring any prior knowledge of the network dynamics. In another work, the DNN model incorporates a convolutional neural network (CNN) [26], which is used to approximate the offloading scheduling strategy and the value function, and a DRL algorithm is then added.

DRLOSM's goal is to reduce energy consumption while also limiting the number of retransmitted tasks and the associated costs. Researchers are also examining the alternating direction method of multipliers (ADMM), which is a distributed algorithm. Content caching and computation are made possible by an information-centric heterogeneous network infrastructure, in which users with diverse virtual services can share communication, processing, and caching resources. In [27], a mobility-aware approach is used to offload tasks to VEC servers.

Entry points are also being investigated. When a server is overloaded, the excess tasks can be assigned to a second server. In this manner, the processing and computing delays are decreased, while the vehicle's performance is improved. However, because the training data set is difficult to obtain, the DRL technique has been adopted.

Figure 5 compares centralised and federated learning. In the centralised model, machine learning runs on the server side: data from VUEs are first processed and analysed by the servers. In a distributed federated system, machine learning methods are used at every level: VUEs run their own local ML algorithms, and the only information sent to the server is the information specific to those local models. As a result, privacy can be preserved, and the distributed machine learning model reduces latency and improves accuracy. FL is thus a distributed machine learning approach in which a shared model is trained by several vehicles. Instead of transmitting all the raw data to the central server, each vehicle trains on its own local data and communicates only the updated parameters of the common model. This approach has been used to reduce congestion in the transportation sector, where UAVs are being deployed [28]. To acquire information about their surroundings, the vehicles are equipped with cameras and GPS systems. RSUs [29] receive the sensing data from the vehicles and relay them to the servers. The vehicular network can withstand jamming attempts thanks to a hill-climbing UAV relay scheme, which increases the utility of vehicle communication by lowering the bit error rate. Moreover, energy-aware dynamic power optimization of each vehicle's energy usage has been developed [30], in which the optimal dynamic power is found by examining cooperative and noncooperative vehicle behaviour while maintaining the privacy of the vehicle's information through FL techniques [31, 32].

4. Methodology

Despite limited resources, multiaccess edge computing and software-based network services are used in this approach to handle an increasing diversity of traffic patterns. The QoS standards of 5G and 6G mobile networks are diverse and need further research to ensure that they can be met.

1. At the start of the decision period, the main server initialises the global DRL model Qf with random weights f0.
2. The local vehicles initialise their local DRL models.
3. Each vehicle obtains a copy of f0 from the central server and sets its learning parameter to a value between 0 and 1.
4. The replay memory D is initialised. In each decision period, the following steps are repeated:
5. For each vehicle in parallel: get ft from the controller and set the local model accordingly.
6. Train the DRL agent locally on the present service requests Qnt using nnt.
7. Upload the trained weights to the central server.
8. The central server receives all weight updates (not just the most recent ones), executes federated averaging, and broadcasts the averaged weights.
9. Repeat until the decision periods are over.
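
The steps above can be sketched as a small federated averaging loop. This is a simplified illustration under stated assumptions, not the authors' implementation: `local_train` is a hypothetical placeholder that replaces real DRL training with a single step towards the mean of the vehicle's local observations, the vehicles are processed sequentially rather than in parallel, and the averaging is weighted by the number of local samples, which the text does not specify.

```python
import random


def local_train(weights, local_requests, lr=0.1):
    """Placeholder for local DRL training (step 6): move each weight a small
    step towards the mean of the vehicle's local observations."""
    target = sum(local_requests) / len(local_requests)
    return [w + lr * (target - w) for w in weights]


def federated_average(weight_sets, sample_counts):
    """Average the uploaded weights, weighted by local sample count (step 8)."""
    total = sum(sample_counts)
    n_params = len(weight_sets[0])
    return [sum(ws[i] * n for ws, n in zip(weight_sets, sample_counts)) / total
            for i in range(n_params)]


def run_fdrl(vehicle_data, n_params=3, rounds=5, seed=0):
    """Run the server loop and return the final global weights."""
    rng = random.Random(seed)
    global_w = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]  # step 1
    for _ in range(rounds):                                       # step 4
        uploads, counts = [], []
        for requests in vehicle_data:                             # step 5
            local_w = list(global_w)                # copy of the global model
            local_w = local_train(local_w, requests)              # step 6
            uploads.append(local_w)                               # step 7
            counts.append(len(requests))
        global_w = federated_average(uploads, counts)             # step 8
    return global_w


# Hypothetical local observations held by two vehicles; raw data never leave them.
fleet = [[1.0, 1.0], [3.0]]
print(run_fdrl(fleet, rounds=200))
```

Only the weight vectors travel between the vehicles and the server. In this toy setting the global weights converge to the sample-weighted mean of the local targets, which makes the behaviour of the averaging step easy to check.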

4.1. Challenges, Unanswered Questions, and Future Directions

Software-defined networking (SDN) and network function virtualization (NFV) are two closely related enabling technologies. In 5G NR, the vehicular ad hoc network is large-scale and diverse, and because of these features ML algorithms cannot easily be applied successfully. Network slicing and SDN have lately been proposed as solutions for the 5G vehicular network, since they can accommodate diverse QoS services and heterogeneous networks. With technologies such as SDN and NFV, it is possible to meet the QoS requirements of 5G NR through a multiaccess edge computing solution that addresses the demand for processing capacity, resource allocation, and storage capacity. A wide range of QoS demands may be met by the 5G vehicular network [31]. The 6G vehicular network additionally features ultralow latency and high data transfer speeds.

4.1.1. Unmanned Aerial Vehicle Assistive Vehicle Cargo Network

In today's more complex automotive environment, with its growing computing requirements, the mathematical optimization methods that have been in use for a long time are no longer up to the task. For machine learning models, obtaining training data is a major challenge because the vehicular network is always evolving. In the absence of such data, a DRL approach is required. Using a DRL algorithm in the local training models of the end vehicles of 5G and 6G vehicular networks is regarded as a promising option for reducing latency and meeting privacy needs.

4.1.2. Caching and FL Communication Technique

End-user vehicles benefit from reduced computation and processing time when MEC servers at the BSs and RSUs use suitable FL techniques. The MEC servers, and eventually the vehicles, use DRL algorithms that have been trained to perform a specific task. The communication, processing, and caching strategies of the FL model must be thoroughly examined to increase network efficiency while meeting the heterogeneous QoS standards.

The third aspect is the ability to share information with others. Cars, RSUs, BSs, drones, edge servers, and so on are all part of the vehicular network. An effective FL-based resource allocation method for these heterogeneous devices must be investigated for 5G and 6G vehicular networks. One of the most important characteristics of the vehicular network is its high mobility, and the movement of UAVs has a significant impact on their effectiveness. With the FDRL approach, it is possible to avoid the problem of UAV servers requiring different kinds of data from vehicles that have limited resources: the vehicles learn locally and report back to central servers, such as MEC servers placed on UAVs. As flying BSs, the UAVs are responsible for allocating radio frequencies so that VUEs receive the bandwidth they need, which makes the use of UAVs essential. A fully centralised method may cause latency, whereas an FDRL strategy aims to ensure that VUEs always have access to enough spectrum resources. To construct an accurate prediction model, UAVs can collaborate with each other and use data from previous spectrum allocations.

Table 1 indicates that the FDRL algorithm performs well against the theoretical lower bounds on optimality. Two distinct bounds can be employed. Both assume that each UFB can support a set of MTC devices with a total utilization of at most 1; the minimum bandwidth required for the allocation of an MTC device is defined as the inverse of its period in the first bound and as the inverse of its jitter in the second, which ensures that the MTC devices are allocated in an equitable manner. The two bounds are effective for implicit-deadline and synchronous device situations, respectively. It is important to keep in mind that these theoretical limits are lower bounds on the optimality of the associated situations. Simulation results are shown for a range of cluster sizes, from 11 to 22. The proposed model uses a low-complexity iterative convex optimization technique, and the maximum energy efficiency algorithm gives the strategy that maximizes energy efficiency [26]. The energy efficiency is defined as the ratio of the total sum rate to the overall power used by all D2D connections.
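
The energy efficiency definition above can be written as a short function. This is a minimal sketch of the stated ratio only; the function name, the units (bit/s and W), and the example values are hypothetical and are not taken from Table 1.

```python
def energy_efficiency(link_rates_bps, link_powers_w):
    """Ratio of the total sum rate to the total power used by all D2D links."""
    if len(link_rates_bps) != len(link_powers_w):
        raise ValueError("one power value is needed per link")
    total_power = sum(link_powers_w)
    if total_power <= 0:
        raise ValueError("total power must be positive")
    return sum(link_rates_bps) / total_power


# Two hypothetical D2D links: 2 Mbit/s and 3 Mbit/s, each using 0.25 W.
ee = energy_efficiency([2e6, 3e6], [0.25, 0.25])
print(ee)  # bits per joule
```

Because the metric is a ratio, raising the transmit power improves it only as long as the sum rate grows faster than the power, which is why maximizing it is formulated as an optimization problem.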

Table 2 shows the relationship between the objective function value and the bit rate requirement. As the D2D link bit rate increases, the objective function value increases from 14.1227 to 33.2101. For the federated deep reinforcement learning model, the data ratios vary from 48.25 to 51.68, and the admission gain increases from 11.25 to 36.35, which indicates good performance. Because UAVs have limited computing, storage, and processing capabilities and limited battery energy, UAV trajectory planning must be carried out carefully. In air-to-air (i.e., UAV-to-UAV) communication, the mobility and energy information of the UAVs must be shared in order to maximize the resources available to all VUEs. Because of privacy issues, a decentralised learning technique such as FDRL may be used to learn about local energy consumption and estimate future demand. Using this method, UAVs can choose their own paths.

4.1.3. Assistive UAV-Based Vehicular Network

MEC servers at macro-BSs, RSUs, and UAVs reduce the time needed for computation and processing tasks. Several specific conditions must be met when a distributed FDRL is used for vehicular communication. First, for a variety of technological reasons, edge devices (e.g., VUEs) cannot send their data to the cloud. Second, training must be fast enough, since the global model and its local models (e.g., in VUEs) must frequently exchange parameters; it is therefore imperative that all the models can communicate with each other in a timely manner. Third, the data from edge devices must be labelled quickly and accurately on the devices themselves. Finally, to train their local models efficiently, edge devices must have enough processing power and storage capacity to handle the workload.

5. Conclusions

This research examines the most advanced techniques in traditional optimization theory, machine learning, and specifically DRL-based resource management. A wide variety of quality of service (QoS) criteria are examined in the cloud, fog, and edge layers. An FL technique, specifically an FDRL approach for a UAV-aided vehicular network, is examined to improve connectivity and minimize latency, and an FDRL-based vehicle communication system is also proposed. The article further explains the existing difficulties of 5G and possible future directions in vehicular networks, starting with a multiaccess edge computing method as a basis for further study. This study provides new opportunities for future researchers to work on FDRL-based UAV-assisted 5G and 6G vehicular communication, so that 5G and 6G vehicular networks can meet a wide range of QoS standards. Open research areas include FDRL-based vehicular networks and FDRL-based UAVs. With FDRL-based UAVs, possible delays or performance reductions in vehicular communication can be handled, and ML algorithms can help to manage communication challenges that were previously difficult to handle. As future work for 6G networks, integrated aerial-terrestrial communication can be extended to channel modeling and routing. To improve deployment tactics and UAV payloads, additional information is needed on the signal transmission between the user on the ground and the flying base station. A multitude of channel models currently handle wireless propagation in an urban context, but none of them considers UAV-to-vehicle connectivity, which is crucial for UAV-enabled ITS. Moreover, the available models are limited to a restricted frequency range and to fixed base stations or nonmobile end users. A model that corrects these flaws can lead to a better understanding of the fading effects between buildings and to better planning of drone deployment. Accounting for mobile BSs and mobile vehicles can also help to determine the ideal drone load-out. Lastly, the loss over multihop infrastructure links may be calculated more precisely. In this way, DRL and FDRL techniques help to meet a wide range of QoS standards in 5G and 6G vehicular networks.

Data Availability

Data will be available on request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.