Journal of Advanced Transportation

Research Article

Multilane Spatiotemporal Trajectory Optimization Method (MSTTOM) for Connected Vehicles

Q-learning algorithm.

(1)	Initialize q(s, a);
(2)	While ()
(3)	{Select the initial state s₀ and action a₀ according to the ε-greedy strategy;
(4)	While ()
(5)	{Select the action a_t in the state s_t according to the ε-greedy strategy, and obtain the reward r_t and the next state s_{t+1};
(6)	q(s_t, a_t) ← q(s_t, a_t) + α[r_t + γ max_a q(s_{t+1}, a) − q(s_t, a_t)];
(7)	s_t ← s_{t+1}; } }
(8)	Get the optimal strategy
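
The loop structure above can be illustrated with a minimal tabular sketch in Python. It is not the authors' implementation: it assumes the standard Q-learning update for steps (6) and (7), and it uses a hypothetical one-dimensional chain environment (`chain_step`) in place of the multilane traffic model. The learning rate `alpha`, discount factor `gamma`, exploration rate `epsilon`, episode count, and step limit are illustrative values chosen for this example only.

```python
import random


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    best = max(q[state])
    # Break ties at random so all-zero rows do not always favor action 0.
    return rng.choice([a for a in range(n_actions) if q[state][a] == best])


def chain_step(state, action, n_states):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.

    Reaching the last state ends the episode with reward 1; all other
    transitions give reward 0.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(n_states=6, n_actions=2, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    # Step (1): initialize q(s, a) to zero for every state-action pair.
    q = [[0.0] * n_actions for _ in range(n_states)]
    # Step (2): outer loop over episodes (assumed stopping rule: fixed count).
    for _ in range(episodes):
        # Step (3): start state; the first action is chosen inside the loop.
        state = 0
        # Step (4): inner loop until the terminal state or a step limit.
        for _ in range(max_steps):
            # Step (5): epsilon-greedy action, then observe reward and next state.
            action = epsilon_greedy(q, state, n_actions, epsilon, rng)
            next_state, reward, done = chain_step(state, action, n_states)
            # Step (6), assumed standard update; no bootstrap from a terminal state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            # Step (7), assumed: advance to the next state.
            state = next_state
            if done:
                break
    # Step (8): the greedy policy with respect to the learned q-table.
    policy = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
    return q, policy


if __name__ == "__main__":
    q_table, greedy_policy = q_learning()
    print(greedy_policy)
```

In this toy setting the greedy policy extracted in step (8) should move right in every non-terminal state. In a trajectory optimization setting, the state, action set, and reward would instead come from the vehicle and lane model, which this sketch does not attempt to reproduce.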