Abstract
The highway merging bottleneck suffers from serious traffic conflicts between on-ramp and mainline vehicles, which cause significant capacity drop and drastic speed changes. This paper proposes an adaptive coordinated variable speed limit model that manages speed on the continuous sections of both the on-ramp and the mainline, without giving priority to the mainline. The model helps to remove the speed difference between on-ramp and mainline vehicles entering the merging zone, and to sustain the actual traffic density close to the critical density so as to counteract capacity drop, as indicated by the macroscopic fundamental diagram. Deep reinforcement learning based on the deep deterministic policy gradient is employed to solve the proposed model, which involves a set of continuous control variables. A simulation platform is established with VISSIM 5.3. The proposed method enhances traffic flow through the merging zone by around 10% and 19% under static and dynamic demand, respectively, and reduces the variation of density and speed by around 30%. This research provides insights into the management of highway capacity so as to secure traffic efficiency and reliability in the merging zone.
1. Introduction
Highway congestion has received increasing attention because of its detrimental effect on traffic mobility, safety, and the environment [1]. It is mostly attributed to merging and diverging around on-ramps and off-ramps, where traffic speed can be much lower than that of the mainline. Drastic speed changes can thus arise, in addition to aggressive lane changing and weaving. These hinder traffic flow on the highway mainline and cause recurrent highway bottlenecks. Because off-ramps are closely related to the connected surface roads, off-ramp bottlenecks can be more complex [2]. Therefore, this research targets the on-ramp bottleneck, where vehicles frequently decelerate or stop to wait before squeezing onto the mainline.
Mainline and on-ramp traffic can be coordinated with the combined strategies of variable speed limit (VSL) and ramp-metering so as to adjust highway flow into the merging zone. VSL issues dynamic speed limits on variable signs, depending on traffic conditions [3, 4]. Ramp-metering adjusts the vehicle flow rate such that mainline density remains below the critical value to prevent traffic breakdown [5]. For example, VSL and ramp-metering are combined into model predictive control by incorporating dynamic speed limits on the mainline into the original on-ramp model of METANET [6], so as to address the disadvantage that ramp-metering fails to prevent congestion upstream of the merging zone [7]. Papamichai et al. [8] combine VSL with ramp-metering in a discrete optimal control program, where an excessive on-ramp queue is penalized in the objective. Assuming a predetermined ramp-metering rate for the on-ramp, model predictive control is developed for VSL over a finite time horizon to address the recurrent highway on-ramp bottleneck, and is found to enhance traffic flow by 12.8% and reduce total travel time by 31.8% [9]. Driver acceptance is then taken into account in the coordination between VSL and ramp-metering upstream of and at the merging zone, which enhances bottleneck throughput significantly [10]. Recent research also attempts to introduce connected and automated vehicles into the traffic control and management of the highway merging zone, where microscopic vehicular trajectories are coordinated to enhance traffic efficiency [11, 12].
Despite rich research on the coordination of mainline VSL and ramp-metering, existing studies generally assign priority to the mainline at the cost of delaying or stopping on-ramp vehicles. Such a management strategy has been found to account for nearly half of highway accidents at the congested merging zone, where rear-end crashes are most frequent due to drastic deceleration [13]. In addition, sideswipe and angle collisions on highways are found to be significantly related to the speed difference between the highway mainline and the on-ramp [14]. Statistical analysis has shown that increasing ramp vehicle volume helps to reduce crashes, while increasing mainline vehicle volume tends to increase crashes. Thus, it is important to enhance ramp traffic efficiency [15]. This is consistent with early empirical research that suggests removing the criterion of setting ramp design speed to the 50th percentile of freeway design speed, so as to allow more freedom to ramp vehicles [16].
Therefore, it is worthwhile to apply VSL to both the on-ramp and the mainline so as to homogenize their traffic, instead of suppressing the on-ramp and prioritizing mainline traffic. This highway management strategy is expected to alleviate the speed difference between mainline and on-ramp traffic, reducing on-ramp vehicle stops and shockwaves at the merging zone. It thereby helps to maintain highway capacity and to reduce traffic fluctuations. To tune the variable speed limit finely, a set of continuous decision variables must be solved along the time horizon. An efficient solution method is thus called for that is capable of finding the optimal control scheme under traffic dynamics.
Recently, artificial intelligence has attracted increasing attention, where intelligent agents are defined to perceive environment, predict system evolution, and take proactive actions for maximal reward [17]. Advanced application of artificial intelligence includes the improvement of object tracking against occlusion [18], image inpainting against structure disconnecting with improved total variation minimization [19] or structural disorder with multilevel attention progression mechanism [20], and multiscale superresolution image with feature map attention mechanism [21]. When it comes to highway management, reinforcement learning is widely adopted for complex modelling solution. For example, a reinforcement learning based VSL control is proposed to reduce total travel time near a freeway recurrent bottleneck [22] and to smooth vehicle conflicts for reduced crashes [23]. Single agent Q learning was implemented on ramp-metering, which is found to reduce total network travel time by 17% compared to ALINEA algorithm [24]. Q-learning was also applied to VSL optimization to reduce total travel time of both mainline and on-ramp by 49.3% and 21.8% under stable and dynamic traffic demand, respectively [22]. Further, deep Q-learning is applied to VSL of highway mainline with on-ramp bottlenecks, reducing total travel time by 26% to 67% with stable demand, and 21% to 70% with dynamic demand [25]. To select continuous actions instead of discrete ones, deep deterministic policy gradient (DDPG) is proposed to learn competitive policies [26]. For example, updated research has applied DDPG learning model to VSL control against spatially dynamic speed limit zones based on vehicle position and speed [27].
A research gap is therefore identified in the management of the highway merging zone, where previous research assigns priority to the mainline over on-ramps, causing traffic shocks and conflicts upstream of the bottleneck area. To this end, this research proposes an adaptive coordinated VSL (ACVSL) to explore the potential of applying VSL to both the highway mainline and the on-ramp, guiding vehicles on both to adjust to the same speed before the merging zone [28]. The proposed speed limit is deduced from the fundamental diagram, with traffic density kept close to the critical density that corresponds to road capacity. Capacity drop at the merging zone can thus be reduced under the objective of maximal vehicle throughput, for which the emerging solution method of deep reinforcement learning is employed to obtain an efficient control scheme.
The main contributions of this paper are as follows. (1) A novel ramp management strategy is proposed that removes mainline priority over the on-ramp, with both managed by VSL in an adaptive coordinated way. Equal speed can thus be achieved between the mainline and the on-ramp immediately upstream of the merging zone, which helps to alleviate traffic shockwaves at the merging bottleneck and to pre-empt capacity drop. (2) An error state is structured to reflect the real-time difference between actual and critical traffic density, based on which speed control parameters are developed to update the VSL of both the mainline and the on-ramp. Traffic density at the merging zone can thus be stabilized around the critical density to counteract road capacity drop. (3) DDPG-based deep reinforcement learning is established to solve the proposed model with continuous control parameters, and is calibrated and validated on a simulation platform for enhanced vehicle throughput and alleviated speed fluctuations.
The remainder of the paper is organized as follows. Section 2 establishes the ACVSL model for the continuous highway sections of the mainline and on-ramp up to the merging bottleneck. It develops nonlinear feedback to narrow the difference between actual and critical traffic density so as to sustain bottleneck capacity. Section 3 develops the deep reinforcement learning method with DDPG to solve the proposed ACVSL under the critic-actor framework. Simulation and tests follow in Section 4, validating that the proposed model and solution method can significantly enhance the efficiency of highway merging. Section 5 briefly concludes the paper and points out future research directions.
2. Mathematical Modelling
Figure 1 shows a typical highway section with on-ramp. The mainline and on-ramp are divided into and sections, respectively. Considering the limited ramp length, we set . VSL on the upstream sections controls the vehicles getting into the next section, and finally into the merging zone.

Objective of the control model is given by: so as to maximize vehicle throughput from the merging zone, and to retain highway capacity. Parameter represents the total flow through the merging bottleneck at time , and means flow on the last section of at time , where and refer to the mainline and on-ramp, respectively.
As indicated by the studies of [2, 29], lower speed limit can increase the critical density to a higher value. Figure 2 shows that the speed lower than the free-flow speed (i.e. ) assists in shifting the critical density to a higher value (i.e. ) in the fundamental diagram without decreasing bottleneck capacity. Moreover, the lower speed causes the slope of the under-critical part to be decreased, making the fundamental diagram get closer to a straight line, i.e. the red dash line vs. green dash line on the left-side, to alleviate the speed deviation around the critical density and the resultant shockwaves. Therefore, we may design an adaptive coordinated VSL control on the continuous sections of highway mainline and on-ramp, letting the traffic density at the merging section converge to the critical density derived from fundamental diagram. Thus the region of fluent traffic for the merging zone can be extended in comparison to the no-control scenarios, helping to attenuate shockwaves, and to sustain bottleneck capacity.
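The shift in critical density can be illustrated with a minimal sketch. It assumes a triangular fundamental diagram, in which flow on the uncongested branch equals speed times density, so that the critical density is the capacity divided by the prevailing speed. This functional form and the two speed values are illustrative assumptions, not parameters taken from the model; the capacity of 2000 veh/h/ln is the mainline value reported in Section 4.1.

```python
def critical_density(capacity_veh_h, speed_km_h):
    """Critical density (veh/km/lane) under an ASSUMED triangular fundamental
    diagram, where flow = speed * density on the uncongested branch."""
    if speed_km_h <= 0:
        raise ValueError("speed must be positive")
    return capacity_veh_h / speed_km_h


CAPACITY = 2000.0        # veh/h/ln, mainline capacity reported in Section 4.1
FREE_FLOW_SPEED = 100.0  # km/h, hypothetical value for illustration only
REDUCED_SPEED = 80.0     # km/h, hypothetical speed limit below free flow

rho_free = critical_density(CAPACITY, FREE_FLOW_SPEED)
rho_vsl = critical_density(CAPACITY, REDUCED_SPEED)
print(rho_free, rho_vsl)
```

Under these assumptions, the lower speed limit moves the critical density to a higher value while the capacity stays unchanged, which is the effect described for Figure 2.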

To this end, we define an error state to describe the deviation of the actual traffic density from its critical value, given by:
For the sections upstream of and at the merging zone, respectively, at time step on highway . Parameter means the actual traffic density on section of road at time step . Parameter represents the actual density of the merging zone at time step , while describes the critical density between fluent and congested traffic flow, and is assumed to be constant.
The above defines an open-loop control system that joins the mainline and the on-ramp. In the following, the nonlinear feedback mechanism for the adaptive coordinated variable speed limit (ACVSL) is established.
Control parameters aim to adjust the guide speed on the VSL signs upstream of the merging zone, given by: so as to maintain the density of the merging zone around the critical value, and to counteract capacity drop. Notation represents the control parameter on section of highway at time step (), describes the traffic flow of section , means the intended guide speed that will be demonstrated on the VSL sign at time step (), and represents the length of section on . Therefore, on the right-side of Eq. (4), the numerator is the flow difference between section and section , and the denominator is the vehicle count in section required to fill its density gap from the critical density at time of highway [30]. Eq. (4) can be reformulated to represent the intended guide speed , given by:
Thus the gap between actual density and critical density can be alleviated to promote efficient and stable traffic flow. Therefore, once control parameter is determined, the intended guide speed can be calculated with Eq. (5).
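The relationship between the control parameter and the guide speed can be sketched as follows. The sketch only follows the verbal description of Eq. (4): the control parameter is the flow difference between two successive sections divided by the number of vehicles needed to close the density gap of the section. It additionally assumes that the inflow of the section equals its density times the guide speed, which is what allows the relation to be inverted; this assumption, the function names, and the numbers in the example are hypothetical and may differ from the exact form of Eqs. (4) and (5).

```python
def control_parameter(q_in, q_out, length_km, density, critical_density):
    """Flow difference (veh/h) divided by the vehicles (veh) needed to bring
    the section from its current density to the critical density."""
    gap_vehicles = length_km * (critical_density - density)
    if gap_vehicles == 0:
        raise ValueError("density already equals the critical density")
    return (q_in - q_out) / gap_vehicles


def guide_speed(lam, q_out, length_km, density, critical_density):
    """Invert control_parameter for the guide speed, ASSUMING that the
    section inflow equals density * guide speed."""
    if density <= 0:
        raise ValueError("density must be positive")
    q_in = q_out + lam * length_km * (critical_density - density)
    return q_in / density


# Hypothetical per-lane values: 1 km section, 20 veh/km actual density,
# 25 veh/km critical density, 1800 veh/h flowing out of the section.
speed = guide_speed(40.0, 1800.0, 1.0, 20.0, 25.0)
print(speed)
```

With these illustrative numbers, a control parameter of 40 per hour corresponds to an inflow of 2000 veh/h and a guide speed of 100 km/h, and feeding that inflow back into `control_parameter` recovers the same parameter.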
When it comes to the last sections either on the mainline or on-ramp, VSL there is set the same as that of the merging zone to secure smooth convergence, given by: where means the theoretical speed of the merging zone from macroscopic fundamental diagram at critical density [30]. Thus speed conflicts can be alleviated between highway mainline and on-ramp immediately upstream of the merging zone.
Additionally, to avoid dynamic oscillation in speed limit, the speed design by Zhang and Ioannou [31] is borrowed, given by: where is the maximum speed limit that can be demonstrated on VSL sign for section of highway at step . Parameter rounds to its closest multiple of 5, represents the maximum speed difference in VSL between successive control steps and highway sections; is the upper bound of VSL; and returns the final speed delivered to VSL sign for section of highway after considering the lower bound of speed limit, with being the lower bound of VSL.
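The post-processing described above can be sketched in three steps: round the intended speed to the closest multiple of 5, limit its change relative to the previous control step, and keep the result within the lower and upper bounds. This is a simplified illustration rather than the exact design of [31]: it omits the constraint between neighbouring highway sections, and the function names and numerical bounds are hypothetical.

```python
import math


def round_to_multiple_of_5(speed):
    """Round to the closest multiple of 5, with ties rounded upwards."""
    return 5 * math.floor(speed / 5 + 0.5)


def displayed_speed_limit(intended, previous, max_change, lower, upper):
    """Speed finally shown on the VSL sign (simplified sketch)."""
    rounded = round_to_multiple_of_5(intended)
    # Limit the change relative to the previous control step.
    limited = min(max(rounded, previous - max_change), previous + max_change)
    # Enforce the lower and upper bounds of the speed limit.
    return min(max(limited, lower), upper)


# Hypothetical example: the sign showed 90 km/h and may change by at most
# 10 km/h per step, within bounds of 40 and 100 km/h.
print(displayed_speed_limit(67.3, previous=90, max_change=10, lower=40, upper=100))
```

In the example, the intended 67.3 km/h is rounded to 65 km/h, but the sign is only lowered to 80 km/h because the change per step is limited to 10 km/h.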
Thus the optimization model ACVSL is summarized with the object of Equation (1) and constraints of Equations (2)–(8), by optimizing a set of control parameters . Table 1 summarizes the notation of the proposed model.
3. Solution Method
In this section, we target at optimizing the vector of control parameters from Eq. (4), a set of multi-dimensional continuous variables. The control parameters indicate the ratio of the flow difference to the gap between current density and critical density for section of highway at time , determining the VSL values in turn as in Eq. (5), so as to adjust the count of vehicles moving downward. Thus we may coordinate VSL of both mainline and on-ramp so as to adapt to real-time traffic state with optimized control parameters , sustaining critical traffic density that is beneficial to bottleneck flow.
DDPG-based deep reinforcement learning method first collects traffic state , given by: where and refer to the vectors of traffic speed and density over highway sections at time , which are of different physical units and normalized for effective learning, given by: with and being free flow speed and jam density on highway section , respectively.
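The normalization of the traffic state can be sketched as follows: each section speed is divided by the free-flow speed and each section density by the jam density, and the two normalized vectors are concatenated into one state vector. The function name and the numerical values are hypothetical and only illustrate the scaling.

```python
def normalize_state(speeds, densities, free_flow_speeds, jam_densities):
    """Build a dimensionless state vector from section speeds and densities."""
    n = len(speeds)
    if not (len(densities) == len(free_flow_speeds) == len(jam_densities) == n):
        raise ValueError("all inputs must have one entry per highway section")
    norm_speeds = [v / vf for v, vf in zip(speeds, free_flow_speeds)]
    norm_densities = [rho / rho_j for rho, rho_j in zip(densities, jam_densities)]
    return norm_speeds + norm_densities


# Hypothetical two-section example (km/h and veh/km).
state = normalize_state(
    speeds=[80.0, 60.0],
    densities=[30.0, 45.0],
    free_flow_speeds=[100.0, 100.0],
    jam_densities=[150.0, 150.0],
)
print(state)
```

Scaling both quantities to comparable magnitudes prevents either speed or density from dominating the network input merely because of its physical unit.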
The proposed method entails the training of a software-controlled agent to take action in response to the current system state for reward . Action sets the value of the control parameter at the beginning of each control time step, given by: where means the upper bound of control variables for normalization to secure the action variables in the interval , a necessity for the input of neural network. Reward function aims to maximize highway bottleneck outflow, given by: which is divided by the multiplication of and for normalization, with and being bottleneck capacity and step interval length, respectively. Reward at each step is achieved via actor network, which actually refers to control policy , a function that maps system state to probability distribution with control actions. Critic network then develops a corresponding Q-function to represent the accumulated discounted reward if action is taken for state and strategy is followed at step onwards, given by: where represents the value obtained under strategy , and is a discount factor of interval . Thus the accumulated reward is kept a finite quantity where the temporally-proximate rewards are more heavily weighted than are the distant ones. To obtain the optimum policy for maximum , the proposed network is customized for parameterized actors and critics, i.e. and .
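The action scaling, the reward, and the discounted accumulation can be illustrated with a minimal sketch. It assumes that the bottleneck outflow is measured as the number of vehicles discharged during one control step, so that dividing by capacity times step length gives a dimensionless reward. That interpretation, the function names, and the bottleneck capacity of 6000 veh/h are assumptions for illustration; the upper bound of 100 and the discount factor of 0.85 are the values reported in Section 4.1.

```python
def normalize_action(control_parameter, upper_bound):
    """Scale a control parameter into [0, 1] for the neural network."""
    return min(max(control_parameter / upper_bound, 0.0), 1.0)


def step_reward(vehicles_out, capacity_veh_h, step_hours):
    """Bottleneck outflow of one step relative to what capacity would allow."""
    return vehicles_out / (capacity_veh_h * step_hours)


def discounted_return(rewards, gamma):
    """Accumulated reward in which later steps are weighted by powers of gamma."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


action = normalize_action(40.0, upper_bound=100.0)
# Hypothetical: 180 vehicles discharged in a 2-min step at 6000 veh/h capacity.
reward = step_reward(180.0, capacity_veh_h=6000.0, step_hours=2.0 / 60.0)
ret = discounted_return([1.0, 1.0, 1.0], gamma=0.85)
print(action, reward, ret)
```

With a discount factor below one, the three equal rewards in the example contribute 1, 0.85, and 0.7225 to the return, which shows how temporally proximate rewards carry more weight than distant ones.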
Specifically, the training epochs tune the actors and critics alternately, first by updating the function approximators to satisfy Equation (14), and then by updating with a policy gradient defined by:
Transition pair is stored in the replay memory at the end of each control step and is sampled uniformly at random for state update. Figure 3 illustrates the actor-critic framework employed to establish the relationships between traffic states, action agents, and reward function, so as to address the problem with continuous action space of parameter . Referring to the limited dimension of input , i.e. 2 , one hidden layer is constructed for both critic and actor network.
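A replay memory with uniform random sampling can be sketched with the Python standard library as follows. The class and method names are hypothetical; the memory size of 4000 and the mini-batch size of 400 are the values reported in Section 4.1, and the transitions stored in the example are placeholders rather than simulation data.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self._buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw transitions uniformly at random without replacement."""
        size = min(batch_size, len(self._buffer))
        return random.sample(list(self._buffer), size)

    def __len__(self):
        return len(self._buffer)


memory = ReplayMemory(capacity=4000)
for step in range(5000):
    # Placeholder transition: (state, action, reward, next_state).
    memory.store([step], [0.5], 0.9, [step + 1])
batch = memory.sample(400)
print(len(memory), len(batch))
```

Sampling uniformly from past transitions breaks the temporal correlation between successive control steps, which is the usual motivation for a replay memory in DDPG.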

Note that the DDPG-based deep reinforcement learning method runs exploration epochs before training. This helps to avoid being trapped in local optima, a major challenge of learning in continuous action spaces. The exploration procedure is thus treated independently of the learning algorithm, so as to explore globally for the optimal strategy. Algorithm 1 summarizes the proposed solution method.
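The exploration stage can be sketched as a random search that keeps the best control scheme found. The sketch assumes uniform sampling over the action range, and `simulate_epoch` is a hypothetical stand-in for one simulation run that returns the resulting bottleneck flow; neither detail is specified in Algorithm 1. The 30 epochs, 60 steps per epoch, and the range [0, 100] follow Section 4, while the number of sections and the toy objective are placeholders.

```python
import random


def explore(simulate_epoch, n_epochs, n_steps, n_sections, upper_bound, seed=None):
    """Randomly draw control schemes and keep the one with the highest flow."""
    rng = random.Random(seed)
    best_scheme, best_flow = None, float("-inf")
    for _ in range(n_epochs):
        # One control parameter per section and per control step.
        scheme = [
            [rng.uniform(0.0, upper_bound) for _ in range(n_sections)]
            for _ in range(n_steps)
        ]
        flow = simulate_epoch(scheme)
        if flow > best_flow:
            best_scheme, best_flow = scheme, flow
    return best_scheme, best_flow


def toy_epoch(scheme):
    """Placeholder objective, NOT a traffic model: prefers values near 50."""
    flat = [x for step in scheme for x in step]
    return -sum(abs(x - 50.0) for x in flat) / len(flat)


best_scheme, best_flow = explore(
    toy_epoch, n_epochs=30, n_steps=60, n_sections=3, upper_bound=100.0, seed=0
)
print(len(best_scheme), best_flow)
```

The best scheme returned by such a search plays the role of the exploration benchmark against which the trained controller is later compared.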
4. Simulation and Tests
In this section, we design and evaluate the performance of the proposed ACVSL model and solution method for a highway bottleneck with on-ramp. After validation, the proposed method is applied to extensive scenarios with varying traffic demand.
4.1. Set-up
The proposed ACVSL model and solution algorithm are developed and tested with MATLAB 2020, which is connected to the simulation platform VISSIM 5.3 for the collection of traffic states. Specifically, the files Layout.ini and Net.inp supply the road network information and other traffic parameters to VISSIM 5.3, respectively. Figure 4 shows the workflow, where the controller observes the traffic state and reward from the VISSIM simulation environment at each control step and returns the control parameters to the simulator through the COM interface. Note that the first 20 min of simulation is discarded to allow vehicles to fill the convergence zone.

Highway mainline has 3 lanes, while on-ramp has a single lane. Assuming default traffic parameters of driving behaviour and traffic composition in VISSIM platform, capacity of mainline and on-ramp is tested and set 2000 and 1500 veh/h/ln, respectively. Other parameters of the proposed ACVSL model are summarized in Table 2. Control step is set 2 min (i.e. min) for timely VSL update. That is, VSL signs are updated every 2 min so as to delicately control traffic state dynamics. Control parameter is specified in the range of [0,100], i.e. the action space.
Meta-parameters of the DDPG-based deep reinforcement learning are selected through repeated experiments, and the set with the best performance is adopted. The reward discount factor is set to 0.85. The update probability during the training period is 20%. The sizes of the replay memory and the mini-batch are set to 4000 and 400, respectively.
4.2. Training
We consider the following two scenarios to train the proposed DDPG-based deep reinforcement learning method for the developed ACVSL model.
(1) The scenario with static demand, which is set to 5400 and 1400 veh/h for the mainline and on-ramp, respectively.
(2) The scenario with dynamic demand, which follows the curves in Figure 5 for the mainline and on-ramp, respectively. That is, demand increases for the first 60 min and decreases for the last 60 min.
Sample data employed in the training process are summarized in Table 3.

Each exploration and training epoch spans a 2-hour simulation window. With a 2-minute control interval, each epoch is thus divided into 60 steps. A total of 30 epochs are adopted for exploration, while 30 and 70 epochs are adopted for training under static and dynamic demand, with the learning rate set to 0.01 and 0.02, respectively. The larger values for the dynamic scenario reflect the difficulty of capturing demand changes.
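The step counts implied by these settings can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those stated above.

```python
SIMULATION_MIN = 120   # 2-hour simulation window per epoch
CONTROL_STEP_MIN = 2   # control interval

steps_per_epoch = SIMULATION_MIN // CONTROL_STEP_MIN

EPOCHS = {"exploration": 30, "training_static": 30, "training_dynamic": 70}
total_steps = {name: n * steps_per_epoch for name, n in EPOCHS.items()}

print(steps_per_epoch)  # 60
print(total_steps)      # {'exploration': 1800, 'training_static': 1800, 'training_dynamic': 4200}
```

Each control step yields one transition, so the exploration stage alone produces 1800 transitions, which can be compared with the replay memory size of 4000 given in Section 4.1.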
Figure 6 shows the outcomes of the exploration and training epochs, measured as the resulting bottleneck flow against epoch number under static and dynamic demand, respectively. In the exploration epochs, the series of control parameters is selected stochastically without optimization, so the bottleneck volume varies greatly. In the training epochs, by contrast, control parameters are iteratively updated to tune the actors and critics alternately, and the bottleneck volume exceeds that of the best control scheme from the exploration epochs (i.e. EX-ACVSL) at the 7th and 8th epoch under the static and dynamic scenario, respectively. The optimal control schemes from the proposed deep reinforcement learning (i.e. DRL-ACVSL) are obtained at the 24th and 65th epoch, exceeding the best exploration scheme EX-ACVSL by 13% and 19%, respectively.

[Figure 6, panels (a) and (b)]
4.3. Test and Calibration
Test and calibration of the trained controller entail a 2 h simulation under the static and dynamic scenarios, respectively. The static case is characterized by a demand of 5500 veh/h on the mainline and 1500 veh/h on the ramp. The dynamic case uses the demand given in Figure 7, whose trend as input to the proposed ACVSL method is similar to that of Table 3.

The scheme from the developed DRL-ACVSL method is compared with the strategy that adopts the best control parameters from the exploration epochs (i.e. EX-ACVSL) and with the strategy without control (i.e. Do-nothing). Table 4 summarizes the performance of the three strategies. The DRL-ACVSL scheme outperforms EX-ACVSL and Do-nothing with higher and more stable traffic flow at the merging zone, where the variation of traffic density and speed is also reduced. Specifically, DRL-ACVSL enhances the average bottleneck flow by 7% and 10% under static demand, and by 14% and 19% under dynamic demand, compared to the EX-ACVSL and Do-nothing strategies, respectively. This enhancement compares favourably with existing research that reports a flow increase of 4.7% [32]. The advantage of DRL-ACVSL is also reflected in the standard deviation of traffic density, which is reduced by 17% and 27% under static demand and by 23% and 32% under dynamic demand, and in the standard deviation of traffic speed, which is reduced by 25% and 33% under static demand and by 9% and 9% under dynamic demand, against the EX-ACVSL and Do-nothing strategies, respectively.
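The percentage figures above are relative changes of the mean and of the standard deviation. A minimal sketch of how such indicators can be computed is given below. The flow series are made-up numbers for illustration only and are not taken from Table 4, and the use of the population standard deviation is an assumption, since the exact estimator is not stated.

```python
from statistics import mean, pstdev


def relative_change(value, baseline):
    """Percentage change of value with respect to baseline."""
    return (value - baseline) / baseline * 100.0


def compare(controlled, baseline):
    """Gain in mean flow and reduction in standard deviation, both in percent."""
    return {
        "mean_gain_pct": relative_change(mean(controlled), mean(baseline)),
        "std_reduction_pct": -relative_change(pstdev(controlled), pstdev(baseline)),
    }


# Hypothetical bottleneck flows (veh/h) over four control steps.
do_nothing = [5000.0, 5400.0, 4600.0, 5000.0]
controlled = [5500.0, 5600.0, 5400.0, 5500.0]

result = compare(controlled, do_nothing)
print(result)
```

For these illustrative series, the mean flow rises by 10% and the standard deviation falls by 75%, showing how a higher and steadier outflow is expressed by the two indicators.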
Figure 8 shows the dynamics of traffic flow at the highway merging bottleneck under the three strategies. The DRL-ACVSL scheme smooths vehicle outflow and keeps it at a higher level than the EX-ACVSL scheme, though the latter also promotes stable vehicle throughput compared to the Do-nothing scheme. Moreover, DRL-ACVSL performs better under static demand than under dynamic demand, bringing higher vehicle throughput and smoothing flow fluctuations more effectively. Another interesting finding is that, under dynamic demand, DRL-ACVSL enhances traffic flow compared to the static scenario, because gradually increasing or decreasing demand allows better exploitation of highway capacity than demand that crowds into the bottleneck constantly.

[Figure 8, panels (a) and (b)]
The traffic density in Figure 9 shows that DRL-ACVSL is efficient in reducing and stabilizing traffic density at the merging zone under the static scenario. In this case, DRL-ACVSL outperforms the EX-ACVSL and Do-nothing strategies, under which significant ups and downs in traffic density are frequently observed. In the dynamic case, DRL-ACVSL retains its advantage in reducing density fluctuations, though density reaches a higher level than with the EX-ACVSL strategy. DRL-ACVSL is therefore validated to keep traffic density stable and to reduce traffic shocks, promoting traffic efficiency and safety. Note that traffic density is also related to the error state (see Figure 10) between actual and critical traffic density, where the latter is set constant based on the fundamental diagram with speed limit. Figure 10 thus shows a trend similar to that of Figure 9.

[Figure 9, panels (a) and (b)]

[Figure 10, panels (a) and (b)]
Figure 11 presents the traffic speed of the merging zone. DRL-ACVSL enhances traffic speed and reduces speed oscillation to a higher degree than EX-ACVSL, while the Do-nothing strategy performs least satisfactorily. Moreover, the gap between DRL-ACVSL and EX-ACVSL widens in the subsequent steps, showing that the proposed solution can trace traffic dynamics well under static demand. In the dynamic scenario, EX-ACVSL outperforms DRL-ACVSL with higher speed, though DRL-ACVSL further reduces speed deviation. The Do-nothing strategy remains the worst with respect to both average speed and speed variation.

[Figure 11, panels (a) and (b)]
4.4. Numerical Analyses
Extensive numerical analyses are conducted for the proposed method under variable traffic demands. Figure 12 summarizes the results, where dynamic demand increases from half of the static demand to one and a half times the static demand in equal increments over the first hour, and decreases in the same manner over the second hour. The proposed DRL-ACVSL is efficient in enhancing bottleneck throughput against varying traffic demand. Specifically, with static demand, DRL-ACVSL consistently outperforms the EX-ACVSL and Do-nothing strategies, and its advantages are strengthened when on-ramp demand increases, though the latter to a lesser degree. In contrast, the Do-nothing strategy brings decreased throughput when on-ramp demand increases, implying intensified conflicts between mainline and on-ramp vehicles. A similar trend is observed under dynamic demand, except that vehicle throughput is generally at a higher level than in the static scenarios. The proposed model and solution method are thus validated to be capable of adapting to various traffic conditions for enhanced bottleneck efficiency.

The proposed method thus has the potential to better enhance the capacity of the merging zone with increasing on-ramp demand. This can be explained by the on-ramp vehicle density getting closer to the intended critical density, which sustains the capacity of the merging zone for maximum throughput. In comparison, increased mainline demand decreases the vehicle flow of the merging zone. The findings are consistent with previous research showing that increasing on-ramp flow and reducing mainline flow contribute to enhancing bottleneck capacity [15], here obtained with a new attempt to coordinate the variable speed limits of both the mainline and the on-ramp.
5. Conclusions
This paper proposes an adaptive coordinated variable speed limit (ACVSL) control strategy for the highway bottleneck with on-ramp, where the priority of the mainline over the on-ramp is removed to alleviate the speed difference upstream of the merging zone and to improve highway merging efficiency. To this end, an error state is developed to indicate the gap between the actual and critical traffic density, with the latter derived from the fundamental diagram and corresponding to road capacity. The smaller the error state, the higher the traffic flow, and the less frequent the shifting between fluent and congested traffic states. A nonlinear feedback mechanism is then established to adjust speed limits along the control horizon.
To solve the proposed model, a deep reinforcement learning algorithm is developed, which is trained with DDPG using a critic-actor network to finely tune the continuous control variables. With the traffic state collected and the control reward evaluated, exploration and training epochs search the feasible domain and establish the control scheme. An experiment platform with VISSIM is then employed to test the proposed method, followed by numerical analyses. After calibration and training, extensive scenarios are established for method validation. The proposed scheme is found to enhance traffic flow through the bottleneck by 10% and 19% under static and dynamic demand, respectively, compared to the Do-nothing strategy. This compares favourably with the existing literature with respect to the improvement in vehicle throughput, validating the efficiency of the proposed model. Moreover, the proposed scheme reduces the variation of traffic density and speed by around 30%, helping to stabilize traffic states and reduce traffic shockwaves. Extensive numerical analyses further confirm the advantage of the proposed method in improving bottleneck capacity, especially when on-ramp demand increases.
This research adds to the existing literature with coordinated speed guidance between the mainline and the on-ramp, offering the responsible agencies the insight that prioritizing the mainline over ramps in on-ramp bottleneck management can be unnecessary. The highway merging bottleneck can thus be better addressed to relieve congestion and to reduce accidents. The limitations of the research are threefold. First, the performance with respect to pollutant emissions and fuel consumption should be further explored. Second, no detailed analysis has been conducted on how mainline and on-ramp vehicles cooperate for acceptable gaps and smooth merging. Third, the effect of the acceleration lane at the merging zone on traffic states has not been considered. Ongoing research extends the proposed method to include more ramps, so as to explore the potential of the adaptive coordinated variable speed limit for consecutive highway ramps.
Data Availability
The data that supports the findings of this study is available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
This work has been supported by the National Natural Science Foundation of China [Grant No. 52002261] and the Natural Science Research of Jiangsu Higher Education Institutions of China (Grant No. 20KJB580011). The authors also thank PhD student Yanpei Zhang for his help in the manuscript format in the Joint Laboratory of Future Transport.