Research Article

Automated Traffic State Optimization in the Weaving Area of Urban Expressways by a Reinforcement Learning-Based Cooperative Method of Channelization and Ramp Metering

Table 2

The procedure of the Q-learning algorithm.

Input: state ($s$), action ($a$), reward ($r$), and algorithms

Output: The best policy
Randomly initialize the state-action value function $Q$, the target state-action value function $\hat{Q}$, and the state ($s$)
For episode = 1, …, $M$ do
  Initialize the state ($s$) (obtaining a ramp signal phase and the average vehicle speed over all main lanes)
  For each control step of the episode do
   Choose a policy (using ε-greedy exploration)
   Choose an action ($a$) (setting the ramp signal to red or green)
   Get a reward ($r$)
   Get the new state ($s'$) (obtaining a new ramp signal phase and a new average vehicle speed over all main lanes)
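   Update $Q$ and the state ($s$):
   $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$
   $s \leftarrow s'$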
  End for
End for
Where $\alpha$ denotes the learning rate and is equal to 0.1 in this study; $\gamma$ denotes the discount factor and is equal to 0.9.
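The procedure in Table 2 can be illustrated with a minimal tabular Q-learning sketch. It is not the authors' implementation. The learning rate (0.1), the discount factor (0.9), the ε-greedy exploration, and the red/green action set come from the table. Everything else is an assumption made for illustration: the function names, the discretization of average speed into five bins, the zero initialization of the value table (the table specifies random initialization), and in particular `toy_step`, a hypothetical stand-in for the traffic simulator whose reward ignores ramp queues.

```python
import random
from collections import defaultdict

ALPHA = 0.1        # learning rate, as stated in Table 2
GAMMA = 0.9        # discount factor, as stated in Table 2
ACTIONS = (0, 1)   # assumed encoding: 0 = ramp signal red, 1 = green


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state):
    """Apply one tabular Q-learning update in place; return the new value."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (td_target - q[(state, action)])
    return q[(state, action)]


def toy_step(state, action, rng):
    """Hypothetical environment, NOT the simulator used in the paper.

    state = (signal phase, speed bin 0..4). Green tends to lower the
    mainline speed bin, red tends to raise it, with 20% random reversals.
    """
    _, speed = state
    drift = -1 if action == 1 else 1
    if rng.random() < 0.2:
        drift = -drift
    new_speed = min(4, max(0, speed + drift))
    reward = float(new_speed)  # assumed reward: higher speed is better
    return (action, new_speed), reward


def train(episodes=200, steps=50, epsilon=0.1, seed=0):
    """Run the episode loop of Table 2 on the toy environment."""
    rng = random.Random(seed)
    q = defaultdict(float)  # zero-initialized here for simplicity
    for _ in range(episodes):
        state = (0, rng.randint(0, 4))  # initial phase and speed bin
        for _ in range(steps):
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward = toy_step(state, action, rng)
            q_update(q, state, action, reward, next_state)
            state = next_state
    return q
```

As a worked instance of the update: if the current value is 0, the reward is 1, and the best value in the next state is 2, the new value is 0 + 0.1 × (1 + 0.9 × 2 − 0) = 0.28. In a real study, `toy_step` would be replaced by a call into the traffic simulation that returns the observed signal phase, the average main-lane speed, and the reward.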