Research Article

Learning to Drive in the NGSIM Simulator Using Proximal Policy Optimization

Algorithm 1: PPO.
Input: randomly initialized Actor-Critic parameters and an initial learning rate.
For each iteration, from 0 up to the maximum number of iterations, repeat the following steps:
 Run the current policy in the NGSIM environment for a fixed number of steps, record the agent's trajectories, and compute the reward for every state in the trajectories according to equation (10).
 Compute the advantage estimates using generalized advantage estimation (GAE).
 Compute the gradient according to equation (7) over several epochs of minibatches of fixed size, and update the Actor-Critic parameters with the Adam optimizer.
 Linearly decay the learning rate.
End
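
Two steps of Algorithm 1, the GAE advantage computation and the linear learning-rate decay, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names `compute_gae` and `linear_lr`, the default values of `gamma` and `lam`, and the choice to decay the learning rate all the way to zero are hypothetical and do not come from the algorithm box.

```python
from typing import List, Sequence


def compute_gae(rewards: Sequence[float], values: Sequence[float],
                last_value: float, dones: Sequence[bool],
                gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation over one rollout of T steps.

    values[t] is the critic's estimate for the state at step t, and
    last_value is the estimate for the state after the final step, used
    to bootstrap when the rollout stops before the episode ends.
    dones[t] is True if the episode terminated at step t.
    gamma and lam are assumed defaults, not values from the paper.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    # Walk backwards so each step reuses the accumulated future advantage.
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of residuals, reset at episode ends.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    return advantages


def linear_lr(initial_lr: float, iteration: int, total_iterations: int) -> float:
    """Learning rate decayed linearly from initial_lr toward zero (assumed endpoint)."""
    frac = 1.0 - iteration / total_iterations
    return initial_lr * max(frac, 0.0)
```

In a training loop, `compute_gae` would be called once per iteration on the recorded trajectories, and `linear_lr` would supply the step size passed to the Adam optimizer for that iteration. The value targets for the critic can be obtained by adding each advantage to the corresponding value estimate.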