Research Article

Adaptive Coordinated Variable Speed Limit between Highway Mainline and On-Ramp with Deep Reinforcement Learning

Algorithm 1

DDPG Algorithm.
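Initialize critic network $Q(s, a \mid \theta^{Q})$ and actor network $\mu(s \mid \theta^{\mu})$
Initialize target critic network $Q'$ and target actor network $\mu'$ with weights $\theta^{Q'} \leftarrow \theta^{Q}$, $\theta^{\mu'} \leftarrow \theta^{\mu}$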
Initialize prioritized experience replay memory R
While not converged do
  Observe system state $s_t$
  Select action $a_t$ for the current state following the $\epsilon$-greedy exploration policy
  Observe transition pair $(s_t, a_t, r_t, s_{t+1})$ and store it in R
  Sample a mini-batch of size N from R
  Set $y_i = r_i + \gamma Q'\left(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right)$
  Update critic network by minimizing the loss:
    $L = \frac{1}{N} \sum_i \left( y_i - Q(s_i, a_i \mid \theta^{Q}) \right)^2$
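  Update actor network with the sampled policy gradient:
    $\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_i \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s = s_i,\, a = \mu(s_i)} \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s = s_i}$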
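  Update target networks:
    $\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau)\theta^{Q'}$
    $\theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau)\theta^{\mu'}$
End while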
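The original DDPG formulation explores by adding noise to the actor output, so Algorithm 1 leaves open exactly how an epsilon-greedy policy applies to a continuous speed-limit action. The sketch below assumes one plausible reading: with probability epsilon, a speed limit is drawn uniformly from an assumed range [a_low, a_high]; otherwise, the deterministic actor output mu(s) is used and clipped to that range. The function `select_action`, the hypothetical `toy_actor`, the scalar state, and the 60 to 120 km/h bounds are illustrative assumptions, not the authors' settings.

```python
import random

def select_action(actor, state, epsilon, a_low, a_high, rng=random):
    """Epsilon-greedy selection for a continuous action (assumed: a speed limit).

    With probability epsilon, explore by sampling uniformly from [a_low, a_high];
    otherwise exploit the deterministic actor output mu(s), clipped to that range.
    """
    if rng.random() < epsilon:
        return rng.uniform(a_low, a_high)
    return min(max(actor(state), a_low), a_high)

# Hypothetical actor: maps a scalar state (e.g. normalized density) to a speed limit.
def toy_actor(state):
    return 120.0 - 60.0 * state

print(select_action(toy_actor, 0.5, epsilon=0.0, a_low=60.0, a_high=120.0))  # 90.0
```

In practice epsilon would usually be decayed over training so the policy shifts from exploration to exploitation; the decay schedule is not given in the excerpt.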
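Algorithm 1 stores transitions in a prioritized experience replay memory R, but the excerpt does not state the prioritization scheme. A minimal sketch, assuming the proportional variant of prioritized replay: priority p_i = |delta_i| + eps from the TD error, sampling probability P(i) = p_i^alpha / sum_k p_k^alpha, and importance-sampling weights w_i = (M * P(i))^(-beta) normalized by the largest weight in the batch. The class name `PrioritizedReplay` and the values of alpha, beta, and eps are hypothetical; a list is used instead of a sum-tree for readability, so sampling is O(M) rather than O(log M).

```python
import random

class PrioritizedReplay:
    """Proportional prioritized replay (assumed scheme), list-based for clarity."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.data, self.priorities = [], []
        self.pos = 0  # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current max priority so they are sampled at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, n, rng=random):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = rng.choices(range(len(self.data)), weights=probs, k=n)
        # Importance weights (M * P(i))^(-beta), normalized by the batch maximum
        m = len(self.data)
        weights = [(m * probs[i]) ** (-self.beta) for i in idx]
        max_w = max(weights)
        return idx, [self.data[i] for i in idx], [w / max_w for w in weights]

    def update_priorities(self, idx, td_errors):
        # After the critic update, refresh priorities with the new |TD error|.
        for i, d in zip(idx, td_errors):
            self.priorities[i] = abs(d) + self.eps
```

Each stored item would be a transition tuple (s_t, a_t, r_t, s_{t+1}) as in Algorithm 1; the returned weights are what a prioritized implementation would typically use to scale each term of the critic loss.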
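The target value, critic loss, sampled policy gradient, and soft target update in Algorithm 1 can be traced on a toy model. This sketch replaces both neural networks with hypothetical hand-written approximators, a critic that is linear in the features [s, a, a^2, s*a, 1] and a linear actor mu(s) = k*s + b, so that the gradients of Q with respect to a and of mu with respect to its parameters are analytic. States are scalars, and gamma, tau, and the learning rates are assumed values. The critic loss is unweighted, matching the loss in Algorithm 1; a prioritized-replay implementation would typically multiply each squared error by its importance weight.

```python
def features(s, a):
    # Hypothetical hand-picked critic features: Q(s, a) = w . [s, a, a^2, s*a, 1]
    return [s, a, a * a, s * a, 1.0]

def q_value(w, s, a):
    return sum(wj * fj for wj, fj in zip(w, features(s, a)))

def dq_da(w, s, a):
    # Analytic dQ/da of the critic above: w1 + 2*w2*a + w3*s
    return w[1] + 2.0 * w[2] * a + w[3] * s

def mu(theta, s):
    # Hypothetical linear deterministic actor mu(s) = k*s + b, theta = [k, b]
    return theta[0] * s + theta[1]

def soft_update(src, tgt, tau):
    # theta' <- tau * theta + (1 - tau) * theta'
    return [tau * x + (1.0 - tau) * y for x, y in zip(src, tgt)]

def ddpg_update(batch, w, theta, w_tgt, theta_tgt,
                gamma=0.99, lr_c=0.1, lr_a=0.01, tau=0.01):
    """One DDPG step on a mini-batch of (s, a, r, s_next) tuples.

    Returns the new (w, theta, w_tgt, theta_tgt) and the critic loss
    measured before the step.
    """
    n = len(batch)
    # Target values y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))
    ys = [r + gamma * q_value(w_tgt, s2, mu(theta_tgt, s2)) for _, _, r, s2 in batch]

    # Critic: one gradient-descent step on L = (1/N) * sum_i (y_i - Q(s_i, a_i))^2
    loss, grad_w = 0.0, [0.0] * len(w)
    for (s, a, _, _), y in zip(batch, ys):
        err = y - q_value(w, s, a)
        loss += err * err / n
        for j, f in enumerate(features(s, a)):
            grad_w[j] += -2.0 * err * f / n
    w = [wj - lr_c * g for wj, g in zip(w, grad_w)]

    # Actor: ascend (1/N) * sum_i dQ/da|_{a=mu(s_i)} * dmu/dtheta, with dmu/dtheta = [s, 1]
    grad_theta = [0.0, 0.0]
    for s, _, _, _ in batch:
        g = dq_da(w, s, mu(theta, s))
        grad_theta[0] += g * s / n
        grad_theta[1] += g / n
    theta = [t + lr_a * g for t, g in zip(theta, grad_theta)]

    # Soft update of both target networks
    return w, theta, soft_update(w, w_tgt, tau), soft_update(theta, theta_tgt, tau), loss
```

The sketch updates the critic first and then evaluates the actor gradient with the updated critic, following the order of the steps in Algorithm 1; a small tau keeps the target networks slowly varying, which stabilizes the targets y_i.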