Abstract
The traffic congestion problem on urban expressways, especially in the weaving areas, has become severe. Some cooperative methods have been proven to be more effective than a separate approach in optimizing the traffic state in weaving areas on urban expressways. However, a cooperative method that combines channelization with ramp metering has not been presented, and its effectiveness has not yet been examined. To fill this research gap, this study proposes a reinforcement learning-based cooperative method of channelization and ramp metering to achieve automated traffic state optimization in the weaving area. This study uses an unmanned aerial vehicle to collect real traffic flow data, and four control strategies (i.e., two kinds of channelization methods, a ramp metering method, and a cooperative method of channelization and ramp metering) and a baseline (without controls) are designed in the simulation platform (Simulation of Urban Mobility). The speed distributions under the different control strategies on each lane were obtained and analyzed. The results show that the cooperative method of channelization and ramp metering is superior to the other methods, with significantly higher increases in vehicle speeds. This cooperative method can increase the average vehicle speeds in lane-1, lane-2, and lane-3 by 14.81%, 14.51%, and 37.03%, respectively. Findings in this study can contribute to the improvement of traffic efficiency and safety in the weaving area of urban expressways.
1. Introduction
Green, low-carbon, and fast access are the main needs of urban transportation [1–3]. As an important part of urban transportation, the main function of urban expressways is to ensure the fast and smooth running of vehicles in cities, but with the rapid increase in traffic volume, the traffic congestion problem on urban expressways, especially in the weaving areas, has become severe. A weaving area is a road section where several different traffic streams in the same general direction cross without using any traffic control devices [4]. Due to the constraints of urban land use, the distance between entrances and exits on urban expressways is usually short, leading to obvious weaving phenomena and a significant reduction in efficiency [5, 6]. Thus, to improve the traffic state in the weaving area on urban expressways, many methods have been proposed, such as channelization, ramp metering, and variable speed limits.
Channelization on urban expressways commonly uses traffic line markings on the road surface to separate different lanes according to the volume or direction of traffic, so that vehicles with different routes and speeds can travel in the prescribed directions without interfering with each other, just like the water flowing in a channel. Channelization can effectively organize the traffic flow through the weaving area in an orderly manner, reduce traffic conflicts, and maximize the utilization of road resources [7]. Reasonable channelization with continuity and closure can better relieve traffic congestion and reduce the risk of collisions [8].
Ramp metering refers to using signal control methods to limit the number of vehicles that enter the main lanes from the ramp of an urban expressway. ALINEA (i.e., Asservissement linéaire d’entrée autoroutière) is a well-known simple feedback controller for ramp metering, and it has proven to be effective in reducing travel time and increasing vehicle speed under a suitable traffic state [9]. An investigation of the effects of traffic demand levels, queue spillback handling strategies, and downstream bottleneck conditions on the performance of ALINEA found that ALINEA could increase traffic capacity but lead to longer travel and waiting times [10]. With the rapid development of machine learning, many intelligent algorithms, especially reinforcement learning, have been applied to ramp metering. When using a reinforcement learning algorithm, the steps of setup, tuning, calculation, and measurement are more complex and require greater computing power, but better optimization results are often achieved in ramp metering. For example, compared with ALINEA, the deep reinforcement learning-based ramp metering method can respond proactively to different traffic states and take more correct actions for traffic breakdown prevention [11]. Using an online reinforcement learning method to model ramp metering could avoid the establishment of an accurate traffic model and the reliance on prior knowledge, and this method could increase the average vehicle speed by 6.80% and decrease the total travel time by 5.22% when compared to ALINEA [12]. However, the ramp lengths provided by most of those ramp metering methods are always too long, which may lead to safety problems and reduce the efficiency of ramps [13].
Variable speed limits are one of the dynamic traffic control methods that are generally set before entering weaving areas on urban expressways. Variable speed limits use variable message signs to communicate dynamic speed limit information to drivers, which can slow down fast-moving vehicles as they approach the weaving areas or bottleneck. This kind of method could comprehensively consider various information (e.g., vehicle speed, traffic volume, weather, road conditions) to determine an appropriate speed limit and display it to drivers. By reducing the speed of vehicles in advance (i.e., before entering the weaving area), the variable speed limits method could decrease the speed variation of vehicles in the weaving area, which can improve traffic state and safety [14]. It was reported that the variable speed limits method could reduce the total traveling distance by 12.6% and the total traffic density by 18.1% on urban expressways [15].
Cooperative methods that simultaneously use two or more traffic control methods can leverage the advantages of multiple methods to optimize traffic states. The cooperation of ramp metering and variable speed limits was proposed for expressways management strategies, which improved the speed of vehicles and reduced gas emissions [16]. Another cooperative method that considered ramp metering (ALINEA) and lane change strategies (i.e., encouraging vehicles at inner lanes to change their lanes earlier before entering the weaving area) was introduced, leading to a 4% reduction in the total travel time when compared with only using a separate method [17].
As to data collection, many devices have been used to monitor the traffic state of weaving areas on urban expressways, including inductive detector loops, microwave radar detectors, infrared sensors, and embedded magnetometers [18]. Although these traditional devices can acquire high-precision data, these data are often collected at a specific point and cannot reflect the traffic flow over space [19]. The unmanned aerial vehicle (UAV) can solve this problem by providing traffic flow data relevant in both time and space [19]. In addition, due to the advantages of mobility, low cost, and broad view range, the UAV has become an attractive method for traffic flow information collection [20].
Given the above, current studies have presented many different methods to improve traffic states in weaving areas on urban expressways, and some cooperative methods (e.g., ramp metering with variable speed limits method) have been proven to be more effective than a separate approach. However, a cooperative method that combines channelization with ramp metering has not been presented, and its effectiveness has not been examined yet. Thus, to fill this research gap, this study proposes a reinforcement learning-based cooperative method of channelization and ramp metering to achieve automated traffic state optimization in the weaving area. This study uses a UAV to collect the data of weaving areas on urban expressways, and four control strategies (i.e., two kinds of channelization methods, a ramp metering method, and a cooperative method of channelization and ramp metering) and a baseline (without controls) are designed in the simulation platform (Simulation of Urban Mobility, SUMO). The average vehicle speeds under these strategies are compared to verify the superiority of the cooperative method. The new cooperative method presented in this study could contribute to the improvement of traffic efficiency in weaving areas on urban expressways. Figure 1 shows the streamlined flowchart of this paper.

2. Methodology
2.1. Experiments
As shown in Figure 2, data used in this study are obtained from the west line of North-South Elevated Road (from Luochuan Road to Gonghexin Road Interchange), which is one of the busiest urban expressways in Shanghai, China. At the entrances and exits of this urban elevated line, there are clear weaving areas formed by large traffic volume. In the road section (Luochuan Road-Gonghexin Road Interchange), the traffic flow is relatively large throughout the year, especially in the morning and evening rush hours. Therefore, it is reasonable to choose this site for analyzing and evaluating the traffic flow state in the weaving area of urban expressways.

The length of the weaving area in the road section Luochuan Road-Gonghexin Interchange is about 100 m, which is very short among many weaving areas of the west line of North-South Elevated Road. Therefore, the weaving phenomenon of this road section is more obvious. The speed limit on this road is 100 km/h for the main lanes and 40 km/h for the ramps. This study obtained data during the peak hours (8 am–9 am and 5 pm–6 pm) on April 10th, April 20th, May 12th, and December 4th, 5th, and 6th, 2021 at the weaving section of Luochuan Road-Gonghexin Interchange. The total time of collected data is more than 10 hours.
This study uses an unmanned aerial vehicle (UAV), the DJI Mavic 2 Enterprise, for data collection. While remaining sufficiently portable, this UAV has imaging and positioning systems with high accuracy and stability. It can obtain 4096 × 2160 pixel video images of roads and vehicles while flying and provide centimeter-level positioning data. Additionally, information collected by the UAV can be transmitted to and saved in the ground equipment in real time.
2.2. YOLO Algorithm for Data Extraction
You Only Look Once (YOLO) is a powerful deep learning method that is widely used in object detection and recognition [21]. Compared with other object detection and recognition algorithms (such as traditional convolutional neural networks), YOLO is faster, more accurate, and more lightweight. Therefore, this study chooses the YOLO algorithm to recognize the vehicles in the videos and calculate the speed of these vehicles.
This study uses a YOLO deep learning model pretrained on the official data to identify and extract vehicle information [22]. Around 200 GB of video data are collected for processing with YOLO. By calibrating against the length of the ground marking line (6 m), the lengths of lanes and vehicles can be accurately computed. Then, microscopic vehicle information, such as trajectory, speed, and acceleration, can be obtained. Figure 3 illustrates an example of using the YOLO deep learning algorithm to identify vehicles and mark their operating states.
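The calibration step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the pixel length of the marking line, and the video frame rate are hypothetical inputs, and the vehicle track is assumed to be a list of per-frame centre positions in pixels.

```python
from math import hypot

MARKING_LENGTH_M = 6.0  # length of the ground marking line stated in the text


def metres_per_pixel(marking_length_px: float) -> float:
    """Scale factor from the measured pixel length of the 6 m marking line."""
    return MARKING_LENGTH_M / marking_length_px


def speeds_kmh(track_px, fps: float, scale: float):
    """Frame-to-frame speeds (km/h) from a track of (x, y) pixel positions.

    track_px: hypothetical per-frame vehicle centres from the detector.
    fps:      video frame rate (an assumed input, not given in the text).
    scale:    metres per pixel from metres_per_pixel().
    """
    result = []
    for (x0, y0), (x1, y1) in zip(track_px, track_px[1:]):
        dist_m = hypot(x1 - x0, y1 - y0) * scale  # metres moved in one frame
        result.append(dist_m * fps * 3.6)         # m/frame -> m/s -> km/h
    return result
```

Acceleration could be obtained in the same way by differencing consecutive speeds; in practice some smoothing of the track would likely be needed, since detector jitter is amplified by differencing.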

Figure 4 shows the distributions of the average vehicle speeds in all lanes in the weaving area at Luochuan Road-Gonghexin Interchange. Most of the vehicles in lane-1, lane-2, and lane-3 drive at a low speed (<40 km/h) compared with the speed limit (100 km/h), showing that the weaving phenomena in these three lanes at peak hours are very serious. However, vehicle speeds in lane-4 and lane-5 stay at a normal level, with mean values close to 90 km/h.

2.3. SUMO Simulation Platform
This study uses simulation of urban mobility (SUMO) to reproduce realistic traffic flow conditions and simulate traffic conditions under different control strategies. SUMO is an open-source, highly portable, microscopic, and continuous traffic simulation package designed to handle large networks [23]. It offers various grades of road models and comes with a large number of tools for simulation scenario creation [24].
The Traffic Control Interface (TraCI), a SUMO tool, serves as a bridge between the SUMO simulation and the control strategies in this study. TraCI can retrieve the information of simulated objects and manipulate their behavior online [25]. The control strategies proposed in this study are coded as Python scripts, one of the languages supported by TraCI, and are then implemented in the SUMO simulation platform through TraCI.
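As a rough illustration of how a Python script can drive a simulation through TraCI, the sketch below steps the simulation, reads the mean speed on the main-lane edges, and sets a ramp signal. It is not the authors' script. It assumes a SUMO run has already been started (for example with `traci.start(...)`) and that `conn` is the `traci` module or a TraCI connection object; the signal ID, edge IDs, and `policy` callback are hypothetical, and the one-character state strings assume a signal that controls a single link.

```python
def run_ramp_control(conn, tls_id, main_edges, policy, n_steps):
    """Step the simulation and apply a ramp-signal policy at every step.

    conn:       the `traci` module or a TraCI connection (already started).
    tls_id:     hypothetical ID of the ramp traffic light.
    main_edges: hypothetical IDs of the main-lane edges.
    policy:     callable mapping mean speed (m/s) to a signal state string,
                e.g. "r" (red) or "G" (green) for a single-link signal.
    """
    history = []
    for _ in range(n_steps):
        conn.simulationStep()
        # getLastStepMeanSpeed reports the edge mean speed in m/s.
        speeds = [conn.edge.getLastStepMeanSpeed(e) for e in main_edges]
        mean_speed = sum(speeds) / len(speeds)
        state = policy(mean_speed)
        conn.trafficlight.setRedYellowGreenState(tls_id, state)
        history.append((mean_speed, state))
    return history
```

Passing the connection in as an argument keeps the control logic testable without a running SUMO instance.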
The traffic environment in the 100-meter weaving area Luochuan Road-Gonghexin Road Interchange is simulated in this study. As shown in Figure 5, the inserted background map shows the real environment around the road, and the roads and vehicles on top of the base map show the simulation condition.

To mimic the characteristics of the real traffic flow, a baseline condition (i.e., without controls) is designed in the SUMO simulation platform. The actual trajectories and speeds extracted by YOLO from the UAV videos are used to calibrate the simulation parameters, including traffic volume, car-following model parameters, lane-changing model parameters, and origin-destination matrix. In this study, two models embedded in SUMO are calibrated, including the Wiedemann-99 car-following model [26] and the SL2015 lane-changing model [27]. After adjustments, the parameters of these two models used in the simulation are shown in Table 1.
2.4. Control Strategies
As shown in Figure 6, in contrast to the baseline (without controls), this study designed four control strategies (i.e., two kinds of channelization methods, a ramp metering method, and a cooperative method of channelization and ramp metering). To clearly explain each strategy, the entire weaving area is divided into six parts, which are labeled as A area, B area, …, and F area from left to right, respectively. As for B, C, D, and E areas, the length of each area is 25 meters. The arrow at the bottom of each figure indicates the general direction of the traffic flow. The arrows in B and C areas are marked to indicate the directions in which vehicles can change the lane. The detailed description of these control strategies is as follows.

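Figure 6: (a) baseline; (b) control-1 strategy; (c) control-2 strategy; (d) control-3 strategy; (e) control-4 strategy.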
2.4.1. Baseline (No Controls)
It does not use any controls, which means all vehicles could change lanes without any limits in all areas (see Figure 6(a)). The crossed arrows in B and C areas denote that vehicles are free to change lanes in B and C areas under the baseline.
2.4.2. Control-1 Strategy (Channelization Left-Right)
It restricts the lane changes of vehicles in the B and C areas, as shown in Figure 6(b). In the B area, vehicles can only change lanes from lane-2 to lane-3 (to the left) and not from lane-3 to lane-2. In the C area, on the contrary, vehicles can only change lanes from lane-3 to lane-2 (to the right) and not from lane-2 to lane-3.
2.4.3. Control-2 Strategy (Channelization Right-Left)
It also restricts the lane change of vehicles in B and C areas, where the lane changing rule is just the opposite of the control-1 strategy (see Figure 6(c)). In the B area, vehicles can only change lanes from lane-3 to lane-2 (to the right), while in the C area, vehicles can only change lanes from lane-2 to lane-3 (to the left).
2.4.4. Control-3 Strategy (Ramp Metering)
It uses reinforcement learning-based ramp metering in the A area to control the number of vehicles that are on the ramp and about to enter the B area, which can balance the traffic flow in the remaining parts of the weaving area (see Figure 6(d)).
2.4.5. Control-4 Strategy (Channelization and Ramp Metering)
It adopts a cooperative method of channelization and ramp metering as shown in Figure 6(e). This strategy is a combination of the control-2 (channelization right-left) and control-3 (ramp metering) strategies.
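The lane-change rules of the five cases can be summarized as a small lookup table. The sketch below is only an illustration of the rules as described in the text; the names are hypothetical, and two points are assumptions: lane changes other than those between lane-2 and lane-3 in the B and C areas are treated as unrestricted, and the control-3 strategy is assumed to leave lane changes unrestricted because the text describes only ramp metering for it.

```python
# Permitted (from_lane, to_lane) moves between lane-2 and lane-3.
BOTH = {(2, 3), (3, 2)}

RULES = {
    "baseline":  {"B": BOTH, "C": BOTH},
    "control-1": {"B": {(2, 3)}, "C": {(3, 2)}},  # channelization left-right
    "control-2": {"B": {(3, 2)}, "C": {(2, 3)}},  # channelization right-left
    "control-3": {"B": BOTH, "C": BOTH},          # ramp metering only (assumed free)
    "control-4": {"B": {(3, 2)}, "C": {(2, 3)}},  # control-2 rules + ramp metering
}

# Strategies that also meter the ramp signal in the A area.
USES_RAMP_METERING = {"control-3", "control-4"}


def lane_change_allowed(strategy, area, from_lane, to_lane):
    """Return True if the move is permitted under the given strategy."""
    rules = RULES[strategy]
    if area not in rules or {from_lane, to_lane} != {2, 3}:
        # Assumption: only lane-2/lane-3 moves in areas B and C are restricted.
        return True
    return (from_lane, to_lane) in rules[area]
```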
2.5. Reinforcement Learning Algorithm
The reinforcement learning algorithm is applied to implement ramp metering in this study. Reinforcement learning is a kind of machine learning method that is used to describe and solve problems in which an intelligent agent learns strategies to maximize returns or achieve specific goals during its interactions with the environment [28]. Reinforcement learning performs well in terms of perception and feedback to the environment. With its self-learning capability, reinforcement learning can continuously improve itself and adapt to new traffic conditions [29]. Due to the above advantages, reinforcement learning can be an adaptive and efficient algorithm for ramp metering [30].
In this study, Q-learning, a model-free reinforcement learning method, is used to optimize the ramp metering and reach the best traffic state (i.e., the largest average vehicle speed in main lanes). The optimization objective of Q-learning is to maximize the average vehicle speed in main lanes through adjusting the phase of the ramp signal when considering channelization methods. Q-learning does not require a prespecified environment model as the basis for action selection [31]. The procedure of the Q-learning algorithm is demonstrated in Table 2.
In Q-learning, there are four important components: state ($s$), action ($a$), reward ($r$), and policy ($\pi$). The state $s$ consists of the phase of the ramp signal (red or green) and the average vehicle speed of all main lanes. The action $a$ denotes setting the ramp signal to red or green. With state $s$ and action $a$, the state-action value function $Q(s, a)$ can be computed, which evaluates the value of the selected action in a given state. The value of this function is then used in the policy $\pi$ to determine the choice of the next action $a$. In this study, ε-greedy exploration is selected as the policy $\pi$. Compared with more complex policy algorithms, ε-greedy exploration is easier to understand and implement and works as well as other algorithms. ε-greedy exploration chooses the action with the highest $Q$ value with probability $1 - \varepsilon$ and a random action with probability ε. In this study, the value of ε is 0.1.
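The components above can be sketched with a tabular Q-learning agent. This is a minimal sketch under stated assumptions rather than the authors' implementation: the text gives only ε = 0.1, so the learning rate `ALPHA`, the discount factor `GAMMA`, and the 5 km/h speed bins used to discretize the state are hypothetical choices, and the update rule is the standard textbook Q-learning update.

```python
import random
from collections import defaultdict

ACTIONS = ("red", "green")  # set the ramp signal to red or green
EPSILON = 0.1               # exploration rate stated in the text
ALPHA = 0.1                 # learning rate (hypothetical, not in the text)
GAMMA = 0.9                 # discount factor (hypothetical, not in the text)


def make_state(phase, mean_speed_kmh, bin_kmh=5.0):
    """State = (ramp signal phase, discretized mean speed of the main lanes)."""
    return (phase, int(mean_speed_kmh // bin_kmh))


def choose_action(q, state, rng=random):
    """Epsilon-greedy: random action with prob. EPSILON, else the best-valued one."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state):
    """Standard Q-learning update of the state-action value table."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])


q_table = defaultdict(float)  # unseen state-action pairs start at 0
```

A table is enough here because the action set has two elements and the state is low-dimensional once the speed is binned; a function approximator would be an alternative if the speed were kept continuous.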
To optimize the average speed of all main lanes, the reward is defined as follows. When the current average vehicle speed of all main lanes is greater than or equal to that in the previous simulation step, the reward is 1; otherwise, it is 0. Therefore, when the maximum value of the cumulative reward is obtained, the average vehicle speed of all main lanes reaches the maximum value. The maximum cumulative reward is calculated by equations (1) and (2). The Q-value is the value of the evaluated action. It represents the expectation of the sum of the rewards ($r$) of the agent from choosing this action ($a$) up to the final state. The formula is described in equation (3), where $\theta$ is the network parameter; $T$ is the total simulation time (s), equal to 3600 s in this study; $t$ is the current simulation time (s); $r_t$ is the reward at time $t$; $\bar{r}$ is the average of $r_t$ over time $T$; $\bar{R}_\theta$ is the reward under the policy $\pi_\theta$; $P$ is the conditional probability operator; $N$ is the number of simulation episodes, equal to 100 in this study; $n$ is the current simulation episode; $R(\tau^n)$ is the reward obtained by the policy in the $n$th simulation episode; and $E$ is the mathematical expectation.
According to the above equations, to obtain the maximum cumulative reward $\bar{R}_\theta$, the gradient of $\bar{R}_\theta$ over the number of simulations (denoted by $\nabla \bar{R}_\theta$) needs to be calculated first, as shown in the following equation, where $\nabla$ is the gradient operator over the number of simulations; $p(a_t^n \mid s_t^n, \theta)$ is the probability of the policy for the $n$th simulation episode under the parameter $\theta$; $s_t^n$ is the state in the $n$th simulation episode at time $t$; and $a_t^n$ is the action in the $n$th simulation episode at time $t$.
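The binary reward rule described above is simple enough to state directly in code. The sketch below uses hypothetical function names and treats the mean main-lane speed at each simulation step as a plain list of numbers.

```python
def step_reward(current_mean_speed, previous_mean_speed):
    """1 if the mean main-lane speed did not drop since the last step, else 0."""
    return 1 if current_mean_speed >= previous_mean_speed else 0


def cumulative_reward(mean_speeds):
    """Sum of step rewards over one episode of per-step mean speeds."""
    return sum(
        step_reward(cur, prev)
        for prev, cur in zip(mean_speeds, mean_speeds[1:])
    )
```

For example, the speed sequence 20, 21, 21, 19, 25 earns a reward at three of its four transitions (the drop from 21 to 19 earns none). Note that this reward depends only on the direction of the change in speed, not on its size.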
3. Results
3.1. Simulation Results in Lane-1
As shown in Figure 7, compared with the baseline, all four control strategies improve the average vehicle speed in lane-1. The control-4 strategy (channelization & ramp metering) provides the largest increase in the average vehicle speed in lane-1, 14.81%, from 29.16 km/h to 33.48 km/h. Next is ramp metering (the control-3 strategy), which leads to a 12.35% increase, greater than the results of the two kinds of channelization. In addition, all four control strategies decrease the standard deviation of vehicle speeds in lane-1 when compared with the baseline. The control-4 strategy, which combines channelization and ramp metering, has the smallest standard deviation of vehicle speeds.

This study performs t-tests to examine whether there are significant differences between the mean vehicle speeds under different types of control strategies in lane-1. If the p value of a t-test is less than 0.05, the difference is significant. As shown in Figure 8(a), all the p values are less than 0.05, indicating that the average vehicle speed of each control strategy is significantly different from the others. Thus, compared with other control strategies, the cooperative method (i.e., control-4 strategy) brings a significantly greater improvement in the average vehicle speed in lane-1 (all $p < 0.05$).

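Figure 8: (a) t-test results for lane-1; (b)–(f) temporal-spatial distributions of vehicle speeds in lane-1 under the baseline and the control-1, control-2, control-3, and control-4 strategies, respectively.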
Figures 8(b)–8(f) show the temporal-spatial distributions of the vehicle speeds in lane-1 under different control strategies. The color indicates the value of the vehicle speed. From blue to red, the vehicle speed increases. Dark blue means that the vehicle speed is 15 km/h, and dark red means that the vehicle speed is 35 km/h. From Figure 8(b) to Figure 8(f), the color gradually changes from light yellow to dark red, which means that the vehicle speed is increasing and the traffic is becoming more efficient. As shown in Figure 8(f), the temporal-spatial distribution is almost entirely dark red, indicating that the minimum vehicle speed under the control-4 strategy (channelization & ramp metering) is close to 35 km/h, which is higher than that of any other strategy. The color in Figure 8(e) is mostly bright red, indicating that the performance of ramp metering in lane-1 is worse than the cooperative method but better than channelization.
3.2. Simulation Results in Lane-2
Figure 9 shows the distributions of the vehicle speeds under the control strategies in lane-2. In addition, t-tests are conducted to examine the differences between the average vehicle speeds under different types of control strategies, and the results are illustrated in Figure 10(a). Compared with the baseline, all four control strategies improve the traffic state (in terms of vehicle speeds) in lane-2 significantly (all $p < 0.05$). Among these strategies, the cooperative method of channelization and ramp metering (i.e., the control-4 strategy) performs better than the separate methods. This cooperative method increases the average vehicle speed by 14.51%, from 22.32 km/h to 25.56 km/h, which is significantly higher than the increases under the other control strategies (all $p < 0.05$). The second-best performance belongs to the control-2 strategy (channelization right-left), which leads to a 9.68% increase in the average vehicle speed in lane-2. However, ramp metering (the control-3 strategy) has the smallest increase, only 3.22%.


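Figure 10: (a) t-test results for lane-2; (b)–(f) temporal-spatial distributions of vehicle speeds in lane-2 under the baseline and the control-1, control-2, control-3, and control-4 strategies, respectively.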
The temporal-spatial distributions of the vehicle speeds under different traffic control strategies are shown in Figures 10(b)–10(f). As the color changes from blue to red, the vehicle speed increases. The colors in Figure 10(f) vary around yellow and are brighter than the colors of the other figures, indicating that the vehicle speeds under the cooperative method (the control-4 strategy) are around 25 km/h and are generally higher than those under other control strategies. Compared with other strategies, the cooperative method of channelization and ramp metering clearly improves the traffic state across both time and space. The colors in Figure 10(d) are close to bright green, while the colors in Figures 10(b) and 10(e) stay in the range of blue to green. This indicates that the control-2 strategy (channelization right-left) performs the second best in improving the vehicle speeds in lane-2.
3.3. Simulation Results in Lane-3
According to the vehicle speed distributions under different traffic control strategies in lane-3 shown in Figure 11, the improvement of vehicle speeds in lane-3 is much higher than those in lane-1 and lane-2 for every strategy. With the use of the cooperative method of channelization and ramp metering (i.e., the control-4 strategy), the increase in the average vehicle speed in lane-3 is as high as 37.03%, from 19.44 km/h to 26.64 km/h. The results of t-tests on the differences between the average vehicle speeds under different control strategies are illustrated in Figure 12(a). It can be seen that the improvement of traffic states (in terms of the average vehicle speed) under the control-4 strategy is significantly higher than the others (all $p < 0.05$) in lane-3. Next is the control-2 strategy (channelization right-left), with an increase of 31.48%. Compared with the baseline, the increases in the average vehicle speed in lane-3 under the control-3 (ramp metering) and control-1 strategies (channelization left-right) are 20.37% and 16.67%, respectively. In addition, the standard deviation of speeds under the control-3 and control-4 strategies is smaller than that under the other strategies.
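The reported percentage increases under the control-4 strategy can be checked from the baseline and controlled mean speeds given in Sections 3.1 to 3.3. The short script below is only a worked arithmetic check; it assumes the lane-3 controlled speed of 26.64 is in km/h, like the other speeds.

```python
# (baseline km/h, control-4 km/h) per lane, as reported in the text.
SPEEDS_KMH = {
    "lane-1": (29.16, 33.48),
    "lane-2": (22.32, 25.56),
    "lane-3": (19.44, 26.64),
}


def percent_increase(before, after):
    """Relative increase of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


for lane, (before, after) in SPEEDS_KMH.items():
    print(f"{lane}: {percent_increase(before, after):.2f}%")
```

The computed values agree with the reported 14.81%, 14.51%, and 37.03% for lane-1, lane-2, and lane-3 to within 0.01 percentage points; the small differences for lane-2 and lane-3 are consistent with truncation instead of rounding.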


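Figure 12: (a) t-test results for lane-3; (b)–(f) temporal-spatial distributions of vehicle speeds in lane-3 under the baseline and the control-1, control-2, control-3, and control-4 strategies, respectively.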
The temporal-spatial distributions of vehicle speeds in lane-3 under different control strategies are shown in Figures 12(b)–12(f). The color changes from blue to red as the vehicle speed increases. The colors of the baseline in Figure 12(b) are around blue, indicating that the vehicle speeds vary around 20 km/h. With the implementation of traffic control strategies, the colors of the temporal-spatial distributions of vehicle speeds become closer to aqua, which shows that these strategies do improve the traffic state in lane-3. The largest improvement is found under the control-4 strategy (channelization & ramp metering), with colors around bright yellow (see Figure 12(f)). In general, the colors in Figure 12(d) are warmer than those in Figures 12(c) and 12(e), indicating that vehicles drive faster under the control-2 strategy (channelization right-left) than under the control-3 (ramp metering) and control-1 strategies (channelization left-right).
3.4. Simulation Results in Lane-4 and Lane-5
As shown in Figures 13(a) and 13(b), in lane-4 and lane-5, the average vehicle speeds under different control strategies are all essentially the same as that of the baseline (i.e., close to the real traffic situation, around 90 km/h). The differences between the control strategies are not significant based on the results of t-tests (all $p > 0.05$). This might be because the inner lanes are barely affected by the weaving phenomenon and are able to maintain a high vehicle speed.

Figure 13: vehicle speeds under different control strategies in (a) lane-4 and (b) lane-5.
4. Conclusions
This study proposes a reinforcement learning-based cooperative method of channelization and ramp metering to achieve automated traffic state optimization in the weaving area of urban expressways. After obtaining real traffic flow data from the UAV, four different control strategies (i.e., two kinds of channelization methods, a ramp metering method, and a cooperative method of channelization and ramp metering) and a baseline (without controls) are designed and simulated in the SUMO simulation platform by using TraCI. The results show that all these methods can significantly improve the average vehicle speed in lane-1, lane-2, and lane-3, and the improvements in lane-3 are the largest compared with those in lane-1 and lane-2. The cooperative method of channelization and ramp metering is superior to the other methods, with significantly higher increases in vehicle speeds. This cooperative method can increase the average vehicle speeds in lane-1, lane-2, and lane-3 by 14.81%, 14.51%, and 37.03%, respectively. In lane-4 and lane-5, no significant differences are found between the baseline and the four control strategies, which might be because the weaving phenomenon has little effect on inner lanes in this study.
Findings in this study can contribute to the improvement of traffic efficiency and safety in the weaving area of urban expressways by a cooperative method. This reinforcement learning-based cooperative method significantly increases the vehicle speeds in the weaving area and reduces the effects of the weaving phenomenon. In addition, through channelization, the safety in the weaving area can be enhanced since the traffic conflicts caused by lane changes are reduced. Ramp metering can control the vehicles on the ramp and reduce their conflicts with the traffic in the main lanes. In practice, the traffic state on urban expressways can be captured in real time by surveillance cameras and extracted by the YOLO algorithm. As for the difficulty of real-time transmission of video data, fifth-generation wireless (5G) delivers higher speeds, lower latency, and higher reliability to ensure real-time transmission of video data. In terms of recognition accuracy, existing methods can meet the requirements of reinforcement learning for the recognition rate of the traffic state (e.g., vehicle speed here).
The limitations of this study are that the cooperative method only consists of two types of methods, and the traffic state is only estimated from the vehicle speed. In the future, more different kinds of traffic control methods (e.g., congestion pricing, variable speed limits, reducing the number of vehicles on the road by ridesplitting [32, 33]) and more traffic state indicators (e.g., average vehicle delay, waiting time) will be included in further analyses.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the submission of this manuscript.
Acknowledgments
This project was jointly supported by the National Natural Science Foundation of China (52102416), the Natural Science Foundation of Shanghai (22ZR1466000), and the project “Safety Control Design Index and Key Technology of Multi-Lane and Ultra-Wide Section Expressway.”