Abstract

With the coming proliferation of 5G wireless network services, a large number of wireless virtual network resources will emerge, and the wireless communication networks that serve them will become increasingly dense and heterogeneous. To this end, a wireless virtual network resource scheduling model based on user satisfaction is constructed, and a reinforcement learning-based wireless virtual network resource scheduling mechanism, IRSUP, is proposed. IRSUP includes an intelligent optimization module for user service preferences to address the personalized needs of user service customization, and a reinforcement learning-based intelligent scheduling module to address the challenge of jointly optimizing multiple resources. Simulation results show that IRSUP can effectively improve resource scheduling rationality, link resource utilization, and user satisfaction: service capacity is improved by 30% to 60%, and user satisfaction is more than doubled.

1. Introduction

Network virtualization is currently considered one of the most promising ways to achieve network innovation. Using network virtualization technology, a variety of network regulation and control functions can be stripped from the hardware, breaking the original one-to-one correspondence between InPs (Infrastructure Providers) and SPs (Service Providers) [1]. The purpose is to unify the management and control of the network and to improve the efficiency of coordination and management: an SP can rent resources from an InP to provide the required services to users, crossing the barrier of physical resources, making the network more flexible, and helping to solve the rigidity caused by the Internet architecture's need for backward compatibility [2, 3].

With the rapid growth of wireless communications and services, the natural combination of network virtualization technology and wireless communications technology has led to the creation of wireless network virtualization [4]. By applying wireless network virtualization technology, the expensive wireless physical underlying resources are leased and shared by multiple virtual network operators (VNOs), improving resource utilization and reducing expenditure. Wireless virtualization and the capabilities of VNOs are described in detail in the next section. At the same time, wireless network virtualization has good potential to solve the coexistence and interoperability problems caused by the heterogeneity of wireless networks; in addition, wireless network virtualization technology can be ported more easily to new products or technologies [5]. Although wireless network virtualization has many advantages, there are still many technical challenges in research and implementation, such as the definition of interfaces between different layers of the wireless virtual network, signalling control, resource discovery and allocation, mobility management, network management and operation, and network security [6].

In particular, when it comes to resource sharing, wireless virtualization differs from wired virtualization, which can accomplish the abstraction and isolation of resources at the hardware level. Unlike wired networks, which can establish tunnels on physical devices to realize virtual nodes and virtual links, the inherent broadcast nature of wireless links and the volatility of the channel mean that wireless resources cannot be abstracted directly, and resource isolation is difficult to achieve [7]. In addition, the spectrum resources of wireless networks are very valuable, and due to the limitations of current wireless network technology, spectrum is often allocated unreasonably, resulting in resource shortages [8]. Wireless network virtualization technology centralizes spectrum resources and provides a means for their effective management; similarly, it enables multiple wireless virtual networks to share expensive wireless physical devices and provides a resource management platform for the effective use of those devices. In short, wireless network virtualization addresses the shared coexistence and efficient reuse of resources, so the management and allocation of resources are the most important technical issues: they determine how the wireless virtual network is embedded into the physical underlying network and how resources are updated during operation [9].

Research on wireless network virtualization has only just begun, the results are still immature, and its implementation technology differs considerably from that of wired network virtualization. In recent years, the implementation of wireless network virtualization has become a research hotspot and is of significant research interest.

In recent years, research on wireless network virtualization has been carried out worldwide. To promote in-depth research, [10] proposes a holistic implementation architecture for wireless sensor networks with the ability to provide advanced services. This solution decouples applications from wireless sensor network deployments, allowing sensor nodes to collaborate dynamically to deliver new services or add value to applications beyond the original deployment [11]. A wireless LTE vRAN virtualization solution based on a common platform is presented in [12]; it is built on a common virtualized processing platform that enables resource sharing and is cost-effective, highly reliable, and easy to deploy. The idea of base station virtualization is introduced in [13], which studies the virtualization of base stations for 3GPP LTE networks. That study builds on the base station virtualization concept of [14], but instead of introducing a hypervisor, it uses a resource management system to virtualize the radio access network. In [15], a set of management strategies is proposed, covering the extraction of wireless resource information, the formulation of reasonable decisions, and the final implementation, with a complete system and well-defined functions. Only by collecting information from all aspects of the network, such as users, underlying devices, and network conditions, can the information be analyzed to produce a good solution strategy, and each strategy rests on rigorous logic supported by mathematical theory. The implementation process is also multifaceted and involves close collaboration among all members of the network [16]. The paper [17] proposes a new heterogeneous network convergence platform based on a centralized access network architecture, which allows processing and wireless resources in heterogeneous networks to be shared and managed uniformly on the basis of a centralized resource pool; it also investigates techniques such as base station resource virtualization, on-demand allocation of processing resources, and dynamic spectrum sharing. The paper [18] proposes a new business model to serve the convergence of heterogeneous networks and explains that a resource management model that meets customer needs and makes reasonable use of the underlying resources is the key to effective resource allocation. The literature [19] addresses the poor scalability and autonomy of existing network virtualization resource management architectures and designs a hierarchical virtual network resource management architecture using a hierarchical, domain-based management mechanism.

In summary, wireless network virtualization has attracted widespread attention in recent years and has made great progress in just a few years. Nevertheless, it still faces many challenges and requires substantial further research. Although much has been achieved, the architecture for implementing wireless network virtualization still needs improvement, especially research on slicing at different levels or hybrid slicing approaches, a new division of labor for the different roles in the business model, and the establishment of a complete virtualization system. At the same time, on the basis of the new model, the wireless virtual resource management system should be designed or improved. Many problems in wireless network virtualization remain open, leaving a large research space; in particular, the management of wireless virtual resources within existing wireless network virtualization architectures needs to be improved.

3. Wireless Network Virtualization

We assume the architecture shown in Figure 1. The system is logically a three-layer model consisting, from the bottom up, of the InP layer, the virtual layer, and the SP layer. The virtual layer is responsible for segmenting and reorganizing the physical resources from the InP to form virtual network resources, i.e., virtual spectrum resources and virtual wireless self-backhaul links. The abstracted virtual resources are then leased to SPs belonging to different virtual networks, and the SPs provide end-to-end customized services to their registered users through the leased virtual resources. In order to maximize the total revenue of the SPs and to meet the rate requirements of each SP, the virtual network manager in the virtual layer needs to execute the virtual resource allocation algorithm reasonably and efficiently [20].

4. System Modelling and Problem Modelling

4.1. Physical Network Model

The instantaneous spectral efficiency of user $m$ in base station $n$ at scheduling period $t$ can be expressed as
$$e_{m,n}(t) = \log_2\left(1 + \frac{p_m\, g_{m,n}(t)}{\sum_{m' \in \mathcal{U}_n \setminus \{m\}} p_{m'}\, g_{m',n}(t) + \sigma^2}\right),$$
where $p_m$ and $p_{m'}$ are the transmit powers of users $m$ and $m'$, respectively; $g_{m,n}(t)$ and $g_{m',n}(t)$ are the instantaneous channel gains from users $m$ and $m'$ to base station $n$, respectively; $\mathcal{U}_n$ represents the set of users served by base station $n$; and $\sigma^2$ is the Gaussian white noise power [21].

At each scheduling cycle, the virtual network manager dynamically adjusts the allocation of spectrum resources based on the channel quality and queue state of each SP user. Thus, the instantaneous data rate of user $m$ in time slot $t$ can be expressed as
$$R_m(t) = \sum_{n} a_{m,n}(t)\, B\, e_{m,n}(t),$$
where $a_{m,n}(t) \in [0,1]$ represents both the connection indicator of user $m$ and the proportion of the spectrum resources of base station $n$ allocated to that user (with $a_{m,n}(t) = 0$ when user $m$ is not connected to base station $n$), and $B$ is the system bandwidth. In addition, under a particular virtual resource allocation strategy, the long-term average rate of SP $k$ can be defined as
$$\bar{R}_k = \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\left[\sum_{m \in \mathcal{M}_k} R_m(t)\right],$$
where $\mathcal{M}_k$ denotes the set of users registered with SP $k$.

In the uplink, a static spectrum allocation is used to avoid interference problems for small base stations on the backhaul link. Assume that small base station $n$ is allocated a bandwidth fraction $\beta_n$ for data backhaul, so that the instantaneous backhaul rate of small base station $n$ is
$$C_n(t) = \beta_n\, B \log_2\left(1 + \frac{P_n\, h_n(t)}{\sigma^2}\right),$$
where $P_n$ represents the transmit power of small base station $n$ on the backhaul link and $h_n(t)$ represents the channel gain from small base station $n$ to the macro base station.
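To make the rate model above concrete, the following minimal Python sketch evaluates the access-side user rates and the self-backhaul rate of a small base station for one scheduling slot. The symbol names mirror the reconstructed notation above, and all numerical values (bandwidth, channel gains, spectrum shares) are illustrative assumptions rather than parameters taken from this paper.

```python
import numpy as np

def spectral_efficiency(p, g, noise_power):
    """Per-user spectral efficiency e_m(t) at one base station.

    p: transmit powers of the users served by this base station (W)
    g: instantaneous channel gains of those users to this base station
    The interference seen by each user is the received power of all other users in the cell.
    """
    rx = p * g                                   # received powers
    interference = rx.sum() - rx                 # leave-one-out interference
    sinr = rx / (interference + noise_power)
    return np.log2(1.0 + sinr)                   # bit/s/Hz per user

def access_rates(p, g, a, bandwidth, noise_power):
    """Instantaneous data rates R_m(t) = a_m * B * e_m(t)."""
    return a * bandwidth * spectral_efficiency(p, g, noise_power)

def backhaul_rate(beta, bandwidth, p_bs, h_bs, noise_power):
    """Self-backhaul rate C_n(t) of a small base station toward the macro base station."""
    return beta * bandwidth * np.log2(1.0 + p_bs * h_bs / noise_power)

# Illustrative numbers (assumed): three users in one small cell, 20 MHz bandwidth.
p = np.full(3, 0.1)                              # 20 dBm = 0.1 W per user
g = np.array([1e-9, 5e-10, 2e-10])               # example channel gains
a = np.array([0.3, 0.2, 0.1])                    # spectrum shares (their sum stays below 1 - beta)
B, noise = 20e6, 1e-13
R = access_rates(p, g, a, B, noise)
C = backhaul_rate(0.4, B, 2.0, 1e-9, noise)      # 33 dBm is roughly 2 W at the small base station
print(R, C, C >= R.sum())                        # the backhaul must cover the access traffic
```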

4.2. Problem Modelling

In the business model of wireless network virtualization, SPs lease an appropriate amount of virtual resources to provide end-to-end services to registered users. In a virtualized network based on bandwidth slicing, each SP can schedule users for service and allocate the necessary bandwidth resources to them based on quality-of-service requirements [22]. Therefore, the net revenue of each SP can be defined as the difference between the total revenue earned from the services provided to its subscribers and the total cost of the leased access link and backhaul link resources, which can be expressed as
$$\Phi_k(t) = \rho_k \sum_{m \in \mathcal{M}_k} U\big(R_m(t)\big) - \alpha\, W_k(t) - \Psi_k(t).$$

The first part of the right-hand side of the above equation represents the revenue received by the SP for providing services to users in time slot $t$, where $U(\cdot)$ is a utility function, usually defined as an increasing concave function, and $\rho_k$ represents the unit price charged by SP $k$ to the users of the service. The second part represents the cost to the SP of leasing spectrum resources on the access link, with $\alpha$ representing the price agreed between the InP and the SP for the use of the radio access link. In time slot $t$, the access-side spectrum resources leased by SP $k$ are
$$W_k(t) = \sum_{n} \sum_{m \in \mathcal{M}_k \cap\, \mathcal{U}_n} a_{m,n}(t)\, B.$$

The third part of the right-hand side of the net revenue equation represents the cost incurred by the SP for leasing the wireless backhaul link for data backhaul. Since multiple SPs share the same backhaul link rather than having exclusive access to it, the backhaul cost of each SP depends on the amount of data backhauled and the amount of backhaul spectrum resources available. Similar to the results in the literature [20], the SP's overhead on the backhaul link in time slot $t$ can be expressed as
$$\Psi_k(t) = \mu \sum_{n} \beta_n\, B\, \frac{\sum_{m \in \mathcal{M}_k \cap\, \mathcal{U}_n} R_m(t)}{\sum_{m' \in \mathcal{U}_n} R_{m'}(t)},$$
where $\mu$ represents the unit price of the leased backhaul link resource and $\mathcal{U}_n$ represents the set of users served by small base station $n$. Since the in-band self-backhaul mechanism utilizes in-band spectrum resources and avoids the need to build additional infrastructure or develop new band resources for small base stations, this technique is clearly more economically efficient than the traditional backhaul method. The resource scheduling problem can then be modelled mathematically as
$$\max \ \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\left[\sum_{k} \Phi_k(t)\right]$$
subject to
$$\text{C1: } \bar{R}_k \ge R_k^{\min},\ \forall k; \quad \text{C2: } C_n(t) \ge \sum_{m \in \mathcal{U}_n} R_m(t),\ \forall n, t; \quad \text{C3: } \sum_{m \in \mathcal{U}_n} a_{m,n}(t) \le 1,\ \forall n, t; \quad \text{C4: all user queues are stable.}$$

In the above problem, constraint C1 represents the minimum average rate guarantee that must be provided to each SP during resource scheduling, with $R_k^{\min}$ being the minimum rate requirement of SP $k$. Constraint C2 requires that, for any small base station, the backhaul rate must not be lower than the aggregate rate at the access side, to avoid an unbounded backlog of user data in the small base station that could cause data loss or unnecessary processing delays. Constraint C3 states that the sum of the band resources allocated to the users connected to any base station must not exceed the bandwidth limit of that base station, and constraint C4 ensures queue stability [23, 24].
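As a sanity check on the reconstructed revenue model, the following sketch evaluates one slot of the SP net revenue and tests constraints C2 and C3 for a single small cell. The logarithmic utility function, the prices, and the data are illustrative assumptions; only the structure (user revenue minus access-link cost minus proportionally shared backhaul cost) follows the formulation above.

```python
import numpy as np

def sp_net_revenue(rates_k, rates_all, a_k, B, beta, rho, alpha, mu):
    """Net revenue of one SP in one slot for a single small cell.

    rates_k   : rates of this SP's users in the cell (bit/s)
    rates_all : rates of every user in the cell, used for the proportional backhaul cost
    a_k       : spectrum shares of this SP's users
    """
    utility = rho * np.log(1.0 + rates_k).sum()           # increasing concave utility (assumed log form)
    access_cost = alpha * (a_k * B).sum()                  # leased access-side spectrum
    backhaul_cost = mu * beta * B * rates_k.sum() / rates_all.sum()
    return utility - access_cost - backhaul_cost

# Illustrative slot (assumed numbers): two users of SP k plus one user of another SP.
B, beta = 20e6, 0.4
rates_k = np.array([4e6, 2e6])
rates_all = np.array([4e6, 2e6, 3e6])
a_all = np.array([0.25, 0.15, 0.20])
backhaul_capacity = 12e6

print("net revenue:", sp_net_revenue(rates_k, rates_all, a_all[:2], B, beta,
                                     rho=1.0, alpha=1e-8, mu=1e-8))
print("C2 (backhaul covers access):", backhaul_capacity >= rates_all.sum())
print("C3 (bandwidth budget):", a_all.sum() + beta <= 1.0)
```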

5. Reinforcement Learning Scheduling Module

An agent for reinforcement learning generally consists of three parts: a state perceiver, a learner, and an action selector. The state perceiver maps the environmental state $s$ to an internal perception; the learner updates the agent's knowledge (policy) according to the reward value returned by the environment and this internal perception; and the action selector selects an action $a$ to apply to the environment according to the current policy, which causes the environmental state $s$ to change under action $a$. The basic principle is that if an action of the agent receives a positive reward from the environment (i.e., a reinforcement signal), the agent's tendency to produce this action in the future is strengthened; conversely, the tendency to produce this action is weakened.
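The three components described above can be organized as in the following minimal skeleton. The class layout, method names, and default parameters are illustrative choices; the paper does not prescribe a particular software structure.

```python
import random
from collections import defaultdict

class Agent:
    """Minimal RL agent split into the three parts named in the text."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)          # knowledge table Q(s, a) maintained by the learner
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def perceive(self, raw_state):
        """State perceiver: map the raw environment state to an internal representation."""
        return tuple(raw_state)

    def select_action(self, state):
        """Action selector: epsilon-greedy choice under the current policy."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, s, a, reward, s_next, a_next):
        """Learner: strengthen or weaken the tendency to take a in s based on the reward."""
        target = reward + self.gamma * self.q[(s_next, a_next)]
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
```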

The wireless network virtualization resource scheduling model needs to make efficient and reliable choices among the wireless virtual networks and the links between them in order to optimize resource utilization. Therefore, the reinforcement learning scheduling module can be used to maximize the resource utilization and the resource scheduling rationality of the wireless network virtualization links [25, 26].

The wireless network virtualization resource scheduling problem is a dynamic decision-making problem in which the decision maker (the scheduling system) must decide on the fly, based on the current resource status, whether to process the currently arriving user service or make it wait. The decision maker has to consider not only the immediate impact of the decision in the current state but also its future impact. Specifically, if the decision maker processes the current user service, it consumes wireless network virtualization resources and may prevent future high-benefit tasks from being executed, or force them to wait longer; if the decision maker sends the user service into the waiting queue, the current subscribers become less satisfied. Therefore, the decision maker needs to consider both the attributes of the current user service and the state of the virtualized system resources when making a decision.

As shown in Figure 2, each wireless network virtualization resource scheduling decision process can be divided into three steps: obtaining the current system resource status information, making the service task decision, and updating the system resource status information.

The agent obtains the current resource state information SR of the wireless virtual network system through the state perceiver and then, after matching and analysing the user service request state information WSj against SR, determines the candidate source virtual network set, destination virtual network set, and link set that can accept the task [27]; these sets contain the numbers of the candidate wireless virtual networks and must satisfy the corresponding resource availability conditions.

At this point, the action selector selects the corresponding source virtual network, destination virtual network, and link, and executes action a, based on the policy derived from the state perceiver's mapping of the environmental resource state into an internal perception.
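The candidate-filtering and selection step can be sketched as follows. The resource fields (cpu, bw) and the matching conditions are assumptions made for illustration, since the paper does not specify the exact feasibility tests.

```python
import random

def candidate_sets(resource_state, request):
    """Filter the virtual networks and links that can still accept the requested task.

    resource_state: {"networks": {id: {"cpu": float}}, "links": {(src, dst): {"bw": float}}}
    request:        {"cpu": float, "bw": float}
    """
    nets = {n for n, r in resource_state["networks"].items() if r["cpu"] >= request["cpu"]}
    links = {(s, d) for (s, d), r in resource_state["links"].items()
             if r["bw"] >= request["bw"] and s in nets and d in nets}
    sources = {s for s, _ in links}
    destinations = {d for _, d in links}
    return sources, destinations, links

def select_action(links, q_table, state_key, epsilon=0.1):
    """Epsilon-greedy choice of a feasible (source, destination) link as the action."""
    actions = sorted(links)
    if not actions:
        return None                                   # no feasible allocation: the task must wait
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table.get((state_key, a), 0.0))
```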

The agent's learner updates its knowledge based on the current decision and the current resource state. The update formula is
$$Q(s, a) \leftarrow Q(s, a) + \alpha \big[ r + \gamma\, Q(s', a') - Q(s, a) \big],$$
where $s$ is the current state, $a$ is the action currently taken, $s'$ is the next state, $a'$ is the action taken in the next state, $r$ is the reward obtained after executing action $a$, $\alpha$ is the learning rate, and $\gamma$ is the decay factor.
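A direct transcription of this update rule, assuming a tabular value function stored in a dictionary (the data structure and the example state/action names are assumptions, not specified in the paper):

```python
def td_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)]."""
    q_sa = q.get((s, a), 0.0)
    target = reward + gamma * q.get((s_next, a_next), 0.0)
    q[(s, a)] = q_sa + alpha * (target - q_sa)
    return q[(s, a)]

# Example: one update after serving a request on link (1, 3) in state "busy".
q = {}
td_update(q, s="busy", a=(1, 3), reward=0.8, s_next="idle", a_next=(2, 3))
print(q)   # approximately {('busy', (1, 3)): 0.08}
```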

The flow chart of the reinforcement learning scheduling module is shown in Figure 3.

Therefore, the reinforcement learning scheduling module can update the policy and choose the next action in real time according to the system's current resources and the current allocation action, so as to maximize the system's resource scheduling benefit.

According to the principles of reinforcement learning, after the agent applies a scheduling rule, its learner needs to check the validity of the resulting action against what actually happens in the environment. This effectiveness can be tested on the basis of resource utilization and QoS: if resource utilization and QoS remain high after the action runs on the wireless virtual network, the action is valid; otherwise it is not. In turn, this effectiveness can be used as the reward function of the reinforcement learning process.

Let $u_1$, $u_2$, and $u_3$ be the actual utilization of processor, memory, and link bandwidth, respectively, measured after the action runs on the wireless virtual network; let $q_1$, $q_2$, and $q_3$ be the actual quality of service in terms of completion time, transmission rate, and jitter of the service after execution, respectively; and let $\omega_1, \ldots, \omega_6$ be weighting factors with $\sum_{i=1}^{6} \omega_i = 1$. Then the reward function is
$$r = r_U + r_Q,$$
where $r_U$ and $r_Q$ are the reward components for actual resource utilization and service QoS, respectively, calculated as
$$r_U = \omega_1 u_1 + \omega_2 u_2 + \omega_3 u_3, \qquad r_Q = \omega_4 q_1 + \omega_5 q_2 + \omega_6 q_3.$$
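Under the weighted-sum reconstruction above (the exact aggregation form is an assumption), the reward can be computed as follows; normalizing the raw QoS measurements to the range [0, 1] is also an illustrative choice.

```python
def reward(utilization, qos, weights):
    """Weighted-sum reward r = r_U + r_Q.

    utilization: [cpu, memory, link bandwidth] utilizations, each in [0, 1]
    qos:         [completion time, transmission rate, jitter] scores, each
                 normalized so that 1.0 is best and 0.0 is worst
    weights:     six weighting factors that sum to 1
    """
    r_u = sum(w * u for w, u in zip(weights[:3], utilization))
    r_q = sum(w * q for w, q in zip(weights[3:], qos))
    return r_u + r_q

# Illustrative call with equal weights.
print(reward([0.7, 0.6, 0.8], [0.9, 0.8, 0.95], [1 / 6] * 6))
```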

Let $\theta$ ($0 < \theta \le 1$) be the threshold of the reward function and let $\Delta$ be the correction amount applied to the reward degree $D$ of a scheduling rule. Then:
(1) If $r \ge \theta$, the rule reward degree is updated as $D \leftarrow D + \Delta$; rule making under this rule shows an increasing trend.
(2) If $r < \theta$, the rule reward degree is updated as $D \leftarrow D - \Delta$; rule making under this rule shows a weakening trend.
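A minimal sketch of this thresholded update of the rule reward degree, using the notation reconstructed above; the concrete threshold and correction values are assumptions about otherwise unspecified constants.

```python
def update_rule_reward_degree(degree, r, theta=0.6, delta=0.05):
    """Strengthen or weaken a scheduling rule based on the reward r.

    degree: current reward degree D of the rule
    r:      reward obtained after applying the rule
    """
    if r >= theta:
        return degree + delta     # rule is reinforced: chosen more often in the future
    return degree - delta         # rule is weakened: chosen less often in the future

# Example: a reward of 0.79 exceeds the threshold, so the rule is reinforced.
print(update_rule_reward_degree(0.5, 0.79))   # approximately 0.55
```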

The agent will determine the next rule selection action based on the magnitude of the rule reward degree value.

Therefore, the autonomous learning and update strategy of reinforcement learning realizes a closed-loop resource scheduling mechanism that integrates the influence of multiple factors, continually updating the scheduling knowledge and achieving reliable and satisfactory resource allocation.

6. Simulation Results and Analysis

In this subsection, the proposed algorithm is simulated and validated. We consider a physical network coverage area in which three SPs provide resource-sharing services. The physical network consists of one macro base station and three small base stations with self-backhaul capability; the macro base station is fixed at the center of the area, and the small base stations and the users are deployed at random locations. The transmit powers of each user and each small base station are 20 dBm and 33 dBm, respectively, and the system has a fixed available bandwidth. We evaluate the performance of the algorithm over a number of scheduling cycles with a fixed time-slot length.

First, the performance of our proposed Lyapunov optimization-based algorithm is evaluated. Figure 4 plots the average total SP gain and the time-averaged queue backlog against the control parameter. As shown in the figure, the average total SP gain gradually increases and then plateaus as the control parameter increases. In Lyapunov optimization, the larger the control parameter, the more the system tends to optimize the penalty function, which in this paper is the SP average total return. However, as the control parameter increases, the time-averaged queue backlog grows almost linearly because of the interplay between system revenue and delay. Therefore, for the system to work in an ideal state, the control parameter must be chosen appropriately.

In addition, to visualize the average queue backlog of users over consecutive slots and to demonstrate the convergence of queue stability, we arbitrarily selected three users belonging to different SPs. As shown in Figure 5, the horizontal coordinate is the time index and the vertical coordinate is the time-averaged queue backlog. The average cache queue of each SP user first increases and then gradually levels off, which implies that the Lyapunov optimization-based resource allocation algorithm proposed in this paper can effectively guarantee the stability of the system queues; in other words, it confirms the effectiveness of our proposed algorithm.

Secondly, we validate the performance of the proposed algorithm on wireless network virtualization and compare it with two algorithms, namely, the static resource allocation (SA-SBL) strategy and the user CSI-based dynamic resource allocation (CSI-DA-SBL) strategy. For SA-SBL, the virtual layer allocates a fixed number of virtual resources to each SP and does not vary over time; for CSI-DA-SBL, the virtual layer dynamically schedules resources based on the channel quality of the SP users with the goal of maximizing system capacity.

Figure 6 shows the variation of the average SP revenue with the number of access users. The x-axis represents the number of users accessing the system, and the y-axis represents the time-averaged total revenue. As the number of subscribers increases, the average revenue of the SP also tends to increase, but at a slower rate. This is because, even though accepting more users generates more revenue for the SP, the SP has to pay relatively high resource rental costs to the InP. In addition, the virtual network manager allocates the optimal amount of resources to each SP rather than the full amount available, so the SP's net revenue is somewhat limited. The figure also shows that our proposed algorithm significantly outperforms SA-SBL and CSI-DA-SBL, because it jointly considers the channel quality of the users and the utility of the SPs and aims to maximize the total system revenue by allocating the right amount of virtual band resources to each SP. In contrast, SA-SBL allocates a fixed amount of resources to each SP regardless of whether the SP needs them, resulting in a waste of resources, so the SP's revenue shows no significant improvement under that scheme.

Figure 7 shows the relationship between the average user utility and the number of access users in the different scenarios, where the user utility is the amount paid by the user to the SP. It can be seen from the figure that the average user utility of the proposed algorithm tends to decrease as the number of access users increases. This is because the denser the system, the larger the proportion of users with poor channel conditions; in order to ensure the delay performance of these users, the proposed algorithm allocates relatively more resources to them to avoid an unbounded queue backlog, so the average rate obtained by all users in the system is slightly reduced.

Figure 8 depicts the average queue backlog versus the packet arrival rate in the different scenarios. For any average packet arrival rate, our proposed virtual resource allocation scheme has a lower queue backlog. This is because our proposed algorithm dynamically allocates resources in each time slot in conjunction with the users' cache states, ensuring that all queues in the system remain stable; thus, our proposed algorithm has better latency performance.

Finally, we tested the performance of the backhaul mechanism and compared two baseline backhaul schemes, namely, the wireless self-backhaul mechanism with a fixed band split ratio (FBB) and the wired backhaul (WB) mechanism. In the FBB scheme, the band ratio between the access and backhaul links is fixed in every time slot, while the traditional WB mechanism lets small base stations transmit the data of their associated users over fiber or digital subscriber lines to the macro base station or the core network. The performance of the different backhaul schemes is evaluated in terms of both the SP average total utility and the InP average utility.

As can be seen from Figure 9, our proposed dynamic in-band self-backhaul mechanism brings more benefits to both the SPs and the InP. Since the FBB mechanism uses a statically configured band split between the radio backhaul and access links, the self-backhauling small base stations are unable to match backhaul and access capacity, wasting a certain amount of band resources and reducing the benefits for the InP and the SPs. Although WB can provide larger backhaul capacity, its backhaul cost is also higher, and part of the InP's backhaul revenue must be spent on building backhaul equipment, so the SP and InP revenues under the WB mechanism are significantly lower than under the other two schemes.

7. Conclusions

In the face of increasingly complex network environments and the scarcity of wireless resources, there is an urgent need for new, scalable wireless network technologies, and network virtualization technology has great potential here. We propose a reinforcement learning-based resource scheduling mechanism for wireless virtual networks and design an intelligent optimization module for user service preferences in response to the personalized needs of user service customization. The results show that the proposed scheme can effectively improve resource scheduling rationality, link resource utilization, and user satisfaction.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.