Abstract

The popularization of electric vehicles faces problems such as difficulty in charging, difficulty in selecting fast charging locations, and the need to jointly consider multiple factors and the interactions among vehicles. As navigation technology for vehicle-road coordination matures and adaptive-learning-based methods for the optimal dynamic charging of electric fleets emerge, edge computing now makes it feasible for electric fleets to execute optimal route charging plans effectively. We propose an electric vehicle charging service scheduling method based on reinforcement learning. First, an intelligent transportation system is described, and on this basis a framework for the interaction between fast charging stations and electric vehicles is established. Subsequently, a dynamic travel time model for traffic sections is established. Based on the habits of electric vehicle owners, an electric vehicle charging navigation model and a reinforcement learning reward model are proposed. Finally, an electric vehicle charging navigation scheduling method is proposed to optimize the service resources of the fast charging stations in the area. The simulation results show that the method balances the charging load between stations, effectively improves the charging efficiency of electric vehicles, and increases user satisfaction.

1. Introduction

With the extensive development of electric vehicles in countries around the world, the number of electric vehicles is increasing, and problems such as difficulty in charging, serious line losses, voltage drops, charging safety risks, and severe load peaks are expected [1–3]. Electric vehicle charging and charging path planning should therefore receive more attention. For electric vehicles whose driving time exceeds their nondriving time, fast charging is an important means of replenishing power [4, 5]. Disorderly charging of electric vehicles would not only cause congestion at fast charging stations, which increases the burden on the regional grid, but also concentrate charging times, causing problems such as transformer overload and an increased peak-to-valley difference, which is not conducive to the safe operation of the distribution network [6, 7]. Therefore, reasonable guidance and charging scheduling for vehicles with fast charging needs help alleviate the burden on the regional grid while meeting charging demands [8, 9].

In response to the above problems, scholars at home and abroad have conducted some research. The work in [10] builds on the uniform charging nodes of [11] and extends them to the nonuniform charging nodes of [12] by solving a mixed-integer nonlinear programming (MINLP) problem for a single vehicle. The remaining energy of the vehicle at each node is formulated as a dynamic programming (DP) problem for the single electric vehicle path problem, and a DP-based algorithm is provided to determine the optimal path and charging strategy at the electric vehicle subflow level. In [13], a distributed electric vehicle path selection system based on a distributed ant colony algorithm (ACA) was proposed; the distributed architecture minimizes the total travel of electric vehicles to their destinations by recommending a set of nearest fast charging stations. In [14], an improved Dijkstra method was proposed to solve a multiobjective optimization problem whose objective function covers travel time, the number of vehicles at each fast charging station, and the charging load, thereby optimizing electric vehicle charging path planning, alleviating traffic congestion around fast charging stations, reducing waiting time, and improving the availability of charging facilities.

The above studies each have their own strengths in charging route navigation and charging scheduling, but when studying electric vehicle charging route navigation, they focus only on the economic benefits and waiting time of the individual vehicle and ignore the impact on fast charging station loads when large numbers of electric vehicles charge. Most charging scheduling also uses a fixed strategy, ignoring the influence on charging scheduling in different time periods of factors such as the growing number of electric vehicles and user habits.

In this context, we propose an electric vehicle charging service scheduling method based on reinforcement learning to meet the needs of electric vehicle owners. The structure of the paper is as follows. In Section 2, we propose a fast charging station and electric vehicle system framework and use this framework to study electric vehicle charging navigation. In Section 3, we establish a dynamic travel time model for traffic sections and propose an electric vehicle charging navigation model. In Section 4, incorporating reinforcement learning, we further propose an electric vehicle charging navigation scheduling method to rationally optimize the service resources of each fast charging station in the area. In Section 5, we use a certain city as a model and compare the simulation results of the proposed method with those of the traditional electric vehicle charging navigation method to demonstrate the superiority of this method. Conclusions and further research directions are outlined in Section 6.

2. Fast Charging Station and Electric Vehicle System Framework

With the gradual development and application of 4G and 5G communications, the applications of various technologies for navigation and vehicle-road collaboration have become increasingly mature [15, 16]. At the same time, edge computing technology also provides technical guarantees for fast response and low error rate operating environments. The computational burden of the central scheduling node is transferred to the user edge side, which greatly increases the processing efficiency and enables electric vehicles and fast charging stations to share information and synchronize processing [17].

Currently, electric vehicles can share information with fast charging stations and other systems through the Internet, upload their status and location in real time, and be navigated in real time based on their location [18, 19]. Moreover, a variety of optimal dynamic charging methods for electric fleets based on adaptive learning have been proposed, and the results show that these methods can essentially reach the optimal solution. On this basis, optimal route charging scheduling can be carried out effectively for electric fleets in efficient, dynamic transportation systems. Inspired by the above research, this paper proposes a guidance system structure for electric vehicles and fast charging stations, shown in Figure 1. With the Internet platform as the center, the system dynamically updates intersection information and provides dynamic charging and navigation strategies for electric vehicles by referring to road condition information and fast charging station information. Each electric vehicle combines the road condition information with the waiting time at each fast charging station and chooses the fast charging station with the highest overall efficiency for itself. The fast charging station in turn schedules the charging of the electric vehicle according to various factors, such as weather, energy supply and demand, and user habits. At regular intervals, the traffic information and fast charging station information are refreshed according to the above selections, and the charging navigation strategy is provided again.

3. Preliminary Model Establishment

This section first proposes a dynamic travel time model for traffic sections and, on this basis, establishes a charging navigation model that considers distance, time, and economic benefits for a single electric vehicle.

3.1. Dynamic Travel Time Model of Traffic Section

The dynamic path selection model for electric vehicles in this paper is based on the dynamic travel time model of the road segment. First, the movement of vehicles along a road segment is described by the cumulative vehicle count, that is, the number of vehicles that have passed a given observation point before a given time. According to the definitions of flow and density, the traffic flow is obtained from the change of the cumulative count over a time interval at a fixed position, and the traffic density is obtained from the change of the cumulative count between two positions at a fixed time.

According to the traffic flow and traffic density, the traffic velocity is then obtained as the ratio of flow to density.

Assuming that the vehicles on a road section are evenly distributed along it, the traffic density of the section is determined by the number of vehicles between its entrance and exit positions, the number of vehicles that can be accommodated per unit length of the section, and the length of the section.

According to the above formula, the vehicle speed on a road section can be expressed as a function of its traffic density [20]. The quantities involved are the free-flow speed of the section, the maximum and minimum densities on the section, the minimum vehicle speed, and two system model parameters.
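
The display form of this speed-density relation is not reproduced here; purely as an illustration, one widely used relation that involves exactly the quantities listed above (free-flow speed, maximum and minimum density, minimum speed, and two shape parameters) can be sketched as follows, with all symbols introduced here rather than taken from the original:

```latex
% Illustrative speed-density relation (symbols introduced here, not the paper's):
% speed falls from the free-flow value toward v_min as density rises from rho_min to rho_max.
v_{ij} = v_{\min} + \left(v_{\mathrm{free}} - v_{\min}\right)
         \left[\,1 - \left(\frac{\rho_{ij} - \rho_{\min}}{\rho_{\max} - \rho_{\min}}\right)^{\alpha}\right]^{\beta}
```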

It then follows that the travel time of a road section is its length divided by the vehicle speed on that section.
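
To make the pipeline concrete, the following minimal Python sketch computes an approximate section travel time from a vehicle count, assuming a uniform distribution of vehicles and the illustrative speed-density relation above; the function name, parameter values, and normalization are assumptions for illustration, not the paper's implementation.

```python
def section_travel_time(n_vehicles, length_km, capacity_veh_per_km,
                        v_free=60.0, v_min=5.0, alpha=1.0, beta=1.0):
    """Approximate travel time (hours) of one road section.

    Illustrative pipeline: vehicle count -> normalized density ->
    speed (bounded between v_min and v_free) -> travel time = length / speed.
    """
    rho_max, rho_min = 1.0, 0.0
    # Uniform-distribution assumption: normalized density from the vehicle count,
    # the per-km capacity, and the section length.
    rho = min(n_vehicles / (capacity_veh_per_km * length_km), rho_max)
    frac = (rho - rho_min) / (rho_max - rho_min)
    v = v_min + (v_free - v_min) * (1.0 - frac ** alpha) ** beta
    return length_km / v


# Example: 40 vehicles on a 2 km section that can hold 50 vehicles per km.
print(f"{section_travel_time(40, 2.0, 50.0):.3f} h")
```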

If a road congestion signal is received en route, the system changes the route to reduce the delay. The subjective probability that the owner switches from the current road section to an alternative section is expressed in terms of the travel time of the current section, the travel time of the alternative section, the maximum travel time, and a subjective coefficient.

Therefore, the effective length of the driving section can be approximated by weighting the section length with this subjective probability.

3.2. Electric Vehicle Charging Navigation Model

Electric vehicles need to be charged frequently during use, so there will be demand for fast charging. According to the charging needs of different vehicles, implementing different navigation schemes can effectively improve the response speed of the vehicle. This section comprehensively considers the driving distance required to reach a fast charging station, the total time of driving and charging, and the charging economy to establish a charging navigation model.

For electric vehicle owners for whom the total driving distance matters most, this article assumes that all vehicles are connected to the Internet and applies the principle that the fast charging station should lie in the same direction as the destination. The objective is to minimize the sum of the shortest distance from the vehicle's starting point to the fast charging station and the shortest distance from the fast charging station to the destination; this sum is expressed over all path nodes in terms of the lengths of the road sections between pairs of end nodes and binary variables that equal 1 if the road section between two end nodes belongs to the route and 0 otherwise.
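
A minimal sketch of this distance-oriented objective: for each candidate station, sum the shortest path from the start to the station and from the station to the destination, then pick the station with the smallest sum. The graph format, node labels, and helper names below are illustrative assumptions, not the paper's road network or notation.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; graph is {node: {neighbor: length_km}} (undirected)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def best_station_by_distance(graph, start, destination, stations):
    """Pick the station minimizing dist(start, station) + dist(station, destination)."""
    from_start = dijkstra(graph, start)
    to_dest = dijkstra(graph, destination)   # undirected network, so reverse distances coincide
    return min(stations,
               key=lambda s: from_start.get(s, float("inf")) + to_dest.get(s, float("inf")))
```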

For electric vehicle owners for whom the total time matters most, this article takes the shortest total time, that is, the sum of the travel time, waiting time, and charging time, as the goal for optimizing the charging path.

The travel time and the charging time are determined as follows: the travel time is the time required to drive to the fast charging station; the waiting time at the fast charging station is determined by the number of vehicles already there; and the charging time depends on the expected state of charge at the end of charging, which is set to 95% of a full charge, the remaining energy on arrival at the fast charging station, the charger power, the charging efficiency, the electric vehicle battery capacity, the initial state of charge of the electric vehicle, and the electric energy consumed by the electric vehicle per kilometer.
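
A minimal sketch of the charging-time computation implied by these quantities, using the 90 kW·h battery, 350 kW charger, and 95% target mentioned in the paper; the charging efficiency, the linear per-kilometer consumption, and all variable names are illustrative assumptions.

```python
def charging_time_h(soc_init, dist_to_station_km, battery_kwh=90.0,
                    charger_kw=350.0, efficiency=0.9, kwh_per_km=0.225,
                    soc_target=0.95):
    """Charging time (hours) = energy needed to reach soc_target / effective charger power.

    The state of charge on arrival is the initial state of charge minus the
    energy spent driving to the station (linear consumption assumed here);
    0.225 kWh/km corresponds to the 90 kW·h / 400 km figures quoted in Section 5.
    """
    soc_arrival = soc_init - (kwh_per_km * dist_to_station_km) / battery_kwh
    energy_needed_kwh = max(soc_target - soc_arrival, 0.0) * battery_kwh
    return energy_needed_kwh / (charger_kw * efficiency)


def total_time_h(travel_h, waiting_h, charging_h):
    """Total-time objective: driving + waiting + charging."""
    return travel_h + waiting_h + charging_h
```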

For electric vehicle owners for whom cost matters most, this article takes the minimum total cost as the goal for optimizing the charging path, where the total cost is the sum of the cost of the electricity consumed on the charging path and the cost incurred at the fast charging station.

4. Electric Vehicle Charging Navigation Scheduling Strategy Based on Reinforcement Learning

The goal of the reinforcement learning algorithm is to find an optimal strategy based on the Markov decision process to maximize the expected cumulative return. In this section, the driving distance of the electric vehicle, the total driving and charging time, and the charging economy are optimized in parallel to provide the electric vehicle owner with the best electric vehicle charging navigation scheduling strategy [21, 22].

4.1. Strategy Gradient Algorithm

The basic principle of reinforcement learning is to learn from exploratory experiments and obtain action strategies to achieve established goals. The learning subject is the agent; the object interacting with the agent is the environment. Reinforcement learning is an abstraction of goal-oriented interactive learning problems. In a certain environment state, the agent takes action, and the environment responds to the agent’s actions, presents the new environment state to the agent, and feeds a certain reward back to the agent. The agent and the environment continue to interact to achieve the ultimate goal of maximizing returns.

The interaction process between the agent and the environment can be described as a time series: in a given period, the agent takes an action according to the current environment state; in the next period, as a result of that action, the environment state changes to a new state, and the agent receives a reward. In each time period, the probability distribution over all the actions that the agent can take in the current environment state is called the agent's strategy. The agent continuously adjusts its strategy through interaction and finally achieves the goal of maximizing the cumulative reward.

The reinforcement learning problem satisfies the Markov property; that is, the state of the next period is related only to the state of the current period and is independent of earlier states. A policy-based method is used to represent the policy. Assuming that the strategy for electric vehicle charging navigation control consists of a multistep decision sequence, the agent obtains training trajectories by interacting with the environment; each trajectory records, for every time step of a training episode, the action taken, the resulting state, and the reward obtained after the action. The expected return over all stored trajectories is the sum, over the trajectories, of each trajectory's reward weighted by its probability, where the probability of a trajectory is determined by the probability of its initial state and the probability of selecting each action in each state according to the input-output strategy.
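
In standard policy-gradient notation (introduced here for clarity, since the display equations are not reproduced above), a trajectory, its probability under the parameterized policy, and the expected return described in this paragraph can be written as the following sketch:

```latex
% Standard notation, introduced here as a sketch (not reproduced from the original):
\tau = (s_1, a_1, r_1, \dots, s_T, a_T, r_T), \qquad R(\tau) = \sum_{t=1}^{T} r_t,
\qquad
P(\tau \mid \theta) = p(s_1) \prod_{t=1}^{T} \pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t),
\qquad
\bar{R}(\theta) = \sum_{\tau} R(\tau)\, P(\tau \mid \theta)
```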

Therefore, reinforcement learning can be expressed as maximizing the expected return. To optimize the strategy, the partial derivative of the expected return with respect to the parameter set is taken, which yields the optimized strategy function used for the parameter update.
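
Written in the standard likelihood-ratio (REINFORCE) form and estimated from N stored trajectories, the gradient described here is, as a textbook sketch rather than a reproduction of the original equation:

```latex
% Likelihood-ratio (REINFORCE) form of the policy gradient, estimated from N sampled trajectories:
\nabla_\theta \bar{R}(\theta)
  = \mathbb{E}_{\tau \sim P(\cdot \mid \theta)}\!\left[ R(\tau)\, \nabla_\theta \log P(\tau \mid \theta) \right]
  \approx \frac{1}{N} \sum_{n=1}^{N} R\!\left(\tau^{(n)}\right)
          \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta\!\left(a_t^{(n)} \mid s_t^{(n)}\right)
```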

The reinforcement learning policy gradient algorithm is therefore equivalent to solving a partial derivative problem. If the parameter set is updated in the positive gradient direction, that is, in the direction of increasing reward, the probability of the corresponding trajectory increases, and vice versa. The pseudocode of the policy gradient algorithm (Algorithm 1) is given below.

(1)Randomly initialize the parameter set of the neural network and initialize the trajectory counter.
(2)Initialize the step counter, randomly initialize the action and the output state, calculate the local reward, and add the transition generated by the action to the stored trajectory of the current training episode.
(3)Input the state to the neural network and select an action at random according to the network's output.
(4)After the simulation environment executes the action, obtains the output state, and calculates the local reward, add the transition generated by the action to the stored trajectory of the current training episode.
(5)Judge whether the accumulated variable has reached the expected value of the total reward for a single trajectory; if so, go to step 6; otherwise, update the accumulated variable and go to step 3.
(6)Calculate the strategy optimization function.
(7)Update the parameter set of the strategy, increment the trajectory counter, and judge whether the trajectory counter is still below the maximum number of trajectories; if so, go to step 2; otherwise, the reinforcement learning training process ends, and the updated parameter set is saved as the optimal parameter set together with the optimal strategy.
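
Read literally, Algorithm 1 is a REINFORCE-style loop: roll out a trajectory with the current policy, accumulate the rewards, and update the parameters along the resulting policy gradient. The Python sketch below implements such a loop against a generic environment interface (reset/step); the environment, the linear softmax policy used here in place of the wavelet network, and all hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce(env, n_episodes=1900, max_steps=50, lr=0.01, gamma=0.95, seed=0):
    """Minimal REINFORCE loop mirroring the steps of Algorithm 1.

    env.reset() must return a length-3 state vector and
    env.step(action) must return (next_state, reward, done).
    A linear softmax policy replaces the wavelet network for brevity.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.1, size=(3, 3))            # step (1): random parameters

    for _ in range(n_episodes):                            # step (7): loop over trajectories
        state = env.reset()                                # step (2): start a new episode
        states, actions, rewards = [], [], []
        for _ in range(max_steps):                         # steps (3)-(5): roll out one trajectory
            probs = softmax(theta @ np.asarray(state))
            action = rng.choice(3, p=probs)                # step (3): sample an action
            next_state, reward, done = env.step(action)    # step (4): environment transition
            states.append(np.asarray(state))
            actions.append(action)
            rewards.append(reward)
            state = next_state
            if done:
                break

        ret = sum(gamma ** t * r for t, r in enumerate(rewards))  # discounted trajectory reward
        grad = np.zeros_like(theta)
        for s, a in zip(states, actions):                  # step (6): policy-gradient estimate
            probs = softmax(theta @ s)
            dlog = -np.outer(probs, s)                     # d log pi(a|s) / d theta
            dlog[a] += s
            grad += ret * dlog
        theta += lr * grad / max(len(states), 1)           # step (7): gradient ascent update
    return theta
```
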
4.2. Action Selection

Taking the vehicle travel path as an example, the control parameter is the action taken at each intersection, where the vehicle has 3 possible actions. The action value ranges over {0, 1, 2}: 0 means going straight ahead, 1 means turning left, and 2 means turning right.

4.3. Environmental Status

When an electric vehicle performs an action at an intersection and that action acts on the environment, the environment feeds back a state value of dimension 3 whose three components correspond to the distance, time, and cost after the current action is executed. After the environmental state value is obtained, the corresponding reward value is calculated, and at the same time the environment moves to the next state.

4.4. Reward Function Design

The reward function is designed as a weighted combination of the distance, time, and cost obtained after each action; the resulting reward value measures the quality of the action performed by the electric vehicle at each time node, that is, the quality of the current step of the trajectory. The three weighting coefficients are set according to the owner's preference: when the owner cares only about the distance, the distance weight equals 1 and the other weights equal 0; when the owner cares only about the total time, the time weight equals 1 and the others equal 0; when the owner cares only about the cost, the cost weight equals 1 and the others equal 0; and if the owner focuses on all three factors, the weights are set to sum to 1 and assigned in proportion to the owner's preferences.
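
A minimal sketch of such a weighted reward, assuming the three state components are normalized and that smaller distance, time, and cost are better (hence the negative sign); the weight names and the unit-sum check are assumptions introduced here.

```python
def reward(state, w_dist=0.0, w_time=0.0, w_cost=1.0):
    """Weighted reward for one action; state = (distance, time, cost) after the action.

    The weights encode the owner's preference and are assumed to sum to 1;
    the negative sign is a convention assumed here (smaller values are better).
    """
    assert abs(w_dist + w_time + w_cost - 1.0) < 1e-9
    d, t, c = state
    return -(w_dist * d + w_time * t + w_cost * c)


# Owner who cares only about total time:
r = reward((0.4, 0.7, 0.3), w_dist=0.0, w_time=1.0, w_cost=0.0)
```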

4.5. Controller

For electric vehicle charging navigation, a scheduling algorithm based on the policy gradient algorithm is proposed according to the personal habits of different electric vehicle owners. The controller observes the state information and directly selects an action; the reward is then used through back propagation to strengthen or weaken the probability of that selection, so that good actions become more likely and bad actions less likely the next time.

A three-layer wavelet neural network is used. The wavelet neural network is a multilayer feedforward neural network trained by error back propagation [23]. The three layers are one input layer, one hidden layer, and one output layer, as shown in Figure 2. The state is used as the input of the neural network, so the input dimension is 3; the hidden layer has 20 neurons; and the output layer contains 3 neurons, corresponding to the 3 output actions.

The connection weights and bias terms between the input layer and the hidden layer and between the hidden layer and the output layer together form the parameter set of the network. The input-output mapping of the trained wavelet neural network defines the agent's strategy.

The connections between the input layer and the hidden layer use a conventional neuron activation function.

The activation function connecting the hidden layer and the output layer is a wavelet basis function.
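
A forward-pass sketch of a 3-20-3 network of this type is given below; the tanh hidden activation and the Morlet-style wavelet basis are common choices in wavelet neural networks but are assumptions here, since the paper's activation formulas are not reproduced above. The softmax over the three outputs, used to obtain action probabilities, is likewise an assumption.

```python
import numpy as np

class WaveletPolicyNet:
    """3-20-3 feedforward network with a wavelet basis between hidden and output layers."""

    def __init__(self, n_in=3, n_hidden=20, n_out=3, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))   # input -> hidden weights
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))  # hidden -> output weights
        self.b2 = np.zeros(n_out)

    @staticmethod
    def wavelet(x):
        # Morlet-style wavelet basis (an assumed choice): cos(1.75 x) * exp(-x^2 / 2)
        return np.cos(1.75 * x) * np.exp(-0.5 * x ** 2)

    def action_probabilities(self, state):
        h = np.tanh(self.W1 @ np.asarray(state) + self.b1)   # input -> hidden (tanh assumed)
        z = self.wavelet(self.W2 @ h + self.b2)              # hidden -> output via wavelet basis
        e = np.exp(z - z.max())                               # softmax over the 3 actions
        return e / e.sum()
```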

According to the pseudocode of the algorithm, the specific training process can be obtained as shown in Figure 3.

5. Simulation Results and Discussion

Taking the city in Figure 4 as a model, the city includes 21 nodes, 32 road sections, and 4 fast charging stations. The number marked on each road section represents the length of the road section in km. Fast charging stations are located at nodes 9, 12, 14, and 19. For the electric vehicles, the battery capacity is 90 kW·h, the cruising range is 400 km, and the fast charging station power is 350 kW. When an electric vehicle leaves the fast charging station, its state of charge is 90%. The training parameters are as follows: the number of training rounds is 1900, the learning coefficient is 0.95, and the discount rate is 0.95.

Each vehicle is assigned a random initial position and target position (among the 21 nodes) and a random remaining battery level (not higher than 30%). Using the user-selected combination of distance, total time, and cost as the reward, the vehicle is trained to drive from its initial position to a fast charging station to charge and then from the fast charging station to the target location. After training is completed, the evolution of the final reward is shown in Figure 5.

Figure 5 shows that as the number of training episodes increases, the training reward gradually increases. After 600 episodes, the curve shows an oscillating trend, with the reward oscillating around 190; in the subsequent training, the reward is basically stable. The neural network model obtained with the final training parameters is saved.

The 08:00 traffic flow distribution obtained through urban traffic simulation is shown in Figure 6. The green lines represent smooth traffic, orange represents traffic congestion, and red represents heavy traffic congestion. For the traffic flow shown in Figure 6, the saved reinforcement learning model is used to obtain the station selection probability of an electric vehicle starting from each network node, as shown in Figure 7. It can be concluded that, with congestion taken into account, the trained reinforcement learning model can effectively select the fast charging stations corresponding to shorter distances according to the target node.

Now, take an electric vehicle starting at node 13 and ending at node 2 as an example to analyze its dynamic station selection strategy. The distance, total time, and cost of the charging navigation obtained for the owner during driving are shown in Table 1.

Plan 1 takes the minimum distance as the goal and chooses fast charging station No. 9; the travel route is shown as the solid line in Figure 8. Plan 2 takes the minimum time as the goal and chooses fast charging station No. 14; the travel route is shown by the dashed line in Figure 8. Plan 3 takes the minimum cost as the goal and chooses fast charging station No. 12; the travel route is shown as the crossed line in Figure 8.

Multiple routes were selected for testing, and the methods from [10, 13] were compared with the charging navigation method proposed in this paper. The performance comparisons under the vehicle owners' comprehensive requirements are shown in Figure 9.

The first graph in Figure 9 shows how the average distance changes as the number of test routes increases, under the comprehensive performance requirements of the user, comparing the method in this paper with the methods from the other two references. As the number of routes increases, the average travel distance of the three methods fluctuates and finally stabilizes in the vicinity of 17 km; throughout this process, the total distances predicted by the three methods are basically the same. The second graph in Figure 9 shows the trend of the total time as the number of test routes increases. As the number of routes increases, the total time of the method in this paper steadily decreases and finally falls to 0.7 h, whereas for the other two methods, the total-time curves oscillate and the time consumed is unstable and greater than that of the method in this paper; the method in this paper therefore has the lowest total time consumption. The third graph in Figure 9 shows the trend of the total cost as the number of test routes increases. As the number of routes increases, the total cost of the method in this paper first increases, then gradually decreases, and finally stabilizes at approximately 30 yuan. For the method from [10], the total cost was initially lower than that of the method in this paper but began to increase as the number of test routes grew and eventually became significantly higher. The cost of the method in [13] remained higher than that of the method in this paper after initially oscillating lower. It can be concluded that, under comprehensive performance requirements, the total distances of the three methods are basically the same, and as the number of route tests increases, the method in this paper achieves the lowest total time and cost, which indicates its superiority.

For the same time, initial point, and destination, we compare user satisfaction under the electric vehicle charging navigation strategies of [10, 13, 24–26]. The user satisfaction measured when testing electric vehicles with these methods is shown in Table 2. The table shows that, as the number of test routes increases, user satisfaction under the proposed method far exceeds that of the other methods, demonstrating that the method in this article can effectively meet the charging and navigation needs of users.

6. Conclusions

We propose an electric vehicle charging service scheduling method based on reinforcement learning to meet the needs of electric vehicle owners. First, based on an intelligent transportation system, a framework for the interaction between fast charging stations and electric vehicles is proposed. Subsequently, a dynamic travel time model for traffic sections is established, and an electric vehicle charging navigation model is proposed. Finally, combined with reinforcement learning, an electric vehicle charging navigation scheduling method is further proposed to rationally optimize the service resources of each fast charging station in the area. The results show that, compared with existing methods, the algorithm and model proposed in this paper can effectively optimize electric vehicle charging navigation scheduling based on the needs of the vehicle owner and can meet the owner's various requirements.

Data Availability

The MATLAB simulation data used to support the findings of this study are currently under embargo while the research findings are commercialized. Requests for data, 12 months after publication of this article, will be considered by the corresponding author.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this study.