Journal of Advanced Transportation

Research Article

Adaptive Coordinated Variable Speed Limit between Highway Mainline and On-Ramp with Deep Reinforcement Learning

DDPG Algorithm.

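Initialize critic network $Q(s, a \mid \theta^{Q})$ and actor network $\mu(s \mid \theta^{\mu})$
Initialize target critic network $Q'$ and target actor network $\mu'$ with weights $\theta^{Q'} \leftarrow \theta^{Q}$, $\theta^{\mu'} \leftarrow \theta^{\mu}$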
Initialize prioritized experience replay memory R
While not converged do
Observe system state $s_t$
Select action $a_t$ for the current state following the $\epsilon$-greedy exploration policy
Observe transition $(s_t, a_t, r_t, s_{t+1})$ and store it in R
Sample a mini-batch of size N from R
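Set $y_i = r_i + \gamma\, Q'\big(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big)$
Update critic network by minimizing:
$L = \frac{1}{N} \sum_{i} \big(y_i - Q(s_i, a_i \mid \theta^{Q})\big)^2$
Update actor with the sampled policy gradient:
$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}$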
Update target critic and target actor networks
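
As a rough illustration of the numerical steps in the listing above, the following Python sketch shows one way to select an exploratory action, form the critic's regression targets, evaluate the critic loss, and update target parameters. It is not the authors' implementation: the function names, the use of NumPy, the plain callables standing in for the networks, the uniform random exploratory action, and the soft update rate `tau` are all assumptions made for illustration. The actor's policy-gradient step is omitted because it requires automatic differentiation, and priority-based sampling from R is not shown.

```python
import numpy as np


def epsilon_greedy_action(actor, state, epsilon, low, high, rng):
    """Pick a uniform random action in [low, high] with probability epsilon;
    otherwise use the actor's deterministic output, clipped to the bounds.
    (Uniform sampling for the exploratory action is an assumption.)"""
    if rng.random() < epsilon:
        return rng.uniform(low, high)
    return np.clip(actor(state), low, high)


def td_targets(rewards, next_states, target_actor, target_critic, gamma):
    """Compute y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})) for a mini-batch.
    Terminal-state handling is left out, as in the listing."""
    next_actions = target_actor(next_states)
    return rewards + gamma * target_critic(next_states, next_actions)


def critic_loss(targets, q_values):
    """Mean squared error between targets y_i and Q(s_i, a_i) over the batch."""
    return np.mean((targets - q_values) ** 2)


def soft_update(target_params, online_params, tau):
    """Return theta' <- tau * theta + (1 - tau) * theta' for each array.
    (A soft update with rate tau is an assumption; the listing only says
    that the target network is updated.)"""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# Toy usage with hypothetical stand-ins for the target networks.
toy_actor = lambda s: 0.5 * s
toy_critic = lambda s, a: s + a
y = td_targets(np.array([1.0, 0.0]), np.array([2.0, 4.0]),
               toy_actor, toy_critic, gamma=0.9)
loss = critic_loss(y, toy_critic(np.array([1.0, 2.0]), np.array([0.5, 1.0])))
```

Passing the networks as callables keeps the sketch independent of any deep learning framework; in practice the same quantities would be computed on tensors so that the critic loss and the actor gradient can be backpropagated.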