Automated Traffic State Optimization in the Weaving Area of Urban Expressways by a Reinforcement Learning-Based Cooperative Method of Channelization and Ramp Metering
Table 2
The procedure of the Q-learning algorithm.
Input: state ($s$), action ($a$), reward ($r$), and the algorithm parameters
Output: The best policy
Randomly initialize the state-action value function $Q(s, a)$, the target state-action value function $\hat{Q}(s, a)$, and the state ($s$)
For episode = 1, …, $M$ do
Initialize the state ($s$) (obtaining a ramp signal phase and an average vehicle speed of all main lanes)
While the episode has not ended:
Choose a policy (using ε-greedy exploration)
Choose action ($a$) (setting the ramp signal to red or green)
Get a reward ($r$)
Get new state ($s'$) (obtaining a new ramp signal phase and a new average vehicle speed of all main lanes)
Update $Q$ and state ($s$): $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$, $s \leftarrow s'$
End while
End for
Where $\alpha$ denotes the learning rate and is equal to 0.1 in this study; $\gamma$ denotes the discount factor and is equal to 0.9.
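The procedure in Table 2 can be illustrated with a minimal tabular Q-learning sketch. Only the learning rate (0.1), the discount factor (0.9), the red/green ramp actions, and the ε-greedy choice come from the table. The `ToyRampEnv` class, its speed bins, its dynamics, its reward, the value of ε, and the zero initialization of the Q-table are assumptions made purely for illustration; they are not the simulator or reward used in the study.

```python
import random
from collections import defaultdict

ALPHA = 0.1                  # learning rate, as stated in Table 2
GAMMA = 0.9                  # discount factor, as stated in Table 2
ACTIONS = ("red", "green")   # ramp signal settings


def epsilon_greedy(q, state, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state):
    """One tabular Q-learning update of Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + GAMMA * best_next - q[(state, action)]
    q[(state, action)] += ALPHA * td_error


class ToyRampEnv:
    """Hypothetical stand-in for a traffic simulator (not from the paper).

    State: (ramp signal phase, mainline average-speed bin in 0..4).
    """

    def __init__(self, rng, horizon=50):
        self.rng = rng
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.phase = "green"
        self.speed_bin = 2
        return (self.phase, self.speed_bin)

    def step(self, action):
        self.phase = action
        # Assumed dynamics: red tends to raise mainline speed, green to lower it.
        drift = 1 if action == "red" else -1
        if self.rng.random() < 0.8:
            self.speed_bin = min(4, max(0, self.speed_bin + drift))
        self.t += 1
        reward = float(self.speed_bin)  # assumed reward: faster mainline is better
        done = self.t >= self.horizon
        return (self.phase, self.speed_bin), reward, done


def train(episodes=200, epsilon=0.1, seed=0):
    """Run the episode loop of Table 2 on the toy environment."""
    rng = random.Random(seed)
    env = ToyRampEnv(rng)
    q = defaultdict(float)  # zero-initialized here; Table 2 uses random initialization
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = env.step(action)
            q_update(q, state, action, reward, next_state)
            state = next_state
    return q
```

As a worked single step under these settings: if $Q(s, a) = 0$, $r = 1$, and $\max_{a'} Q(s', a') = 2$, the update gives $Q(s, a) \leftarrow 0 + 0.1 \times (1 + 0.9 \times 2 - 0) = 0.28$.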