Journal of Advanced Transportation

Research Article

Learning to Drive in the NGSIM Simulator Using Proximal Policy Optimization

PPO.

	Input: Randomly initialize the Actor-Critic parameters and set the initial learning rate.
	For each training iteration, repeat the following steps:
	Use the current policy to interact with the NGSIM environment for a fixed number of steps, record the agent's trajectories, and calculate the reward according to equation (10) for every state in the trajectories.
	Compute the advantages using generalized advantage estimation (GAE).
	Compute the gradient according to equation (7) over the configured number of epochs and minibatch size, and update the parameters using the Adam optimizer.
	Linearly decay the learning rate.
	End
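
The numerical pieces of the loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (compute_gae, clipped_surrogate, linear_lr) are hypothetical, the default values for the discount factor, the GAE parameter, and the clipping range are common illustrative choices rather than values taken from this paper, and the surrogate shown is the standard PPO clipped objective, which may differ from the exact form of equation (7). The NGSIM environment, the reward of equation (10), and the Adam update are left out.

```python
import math


def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one rollout without episode resets.

    rewards[t] and values[t] are the reward and critic estimate at step t;
    last_value is the critic estimate for the state after the final step.
    gamma and lam are assumed illustrative defaults.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    # Walk backwards so each step reuses the discounted sum of later residuals.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    # Value targets for the critic: advantage plus baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    new_logp and old_logp are log-probabilities of the taken actions under
    the updated and the data-collecting policy; clip_eps is an assumed default.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


def linear_lr(initial_lr, iteration, total_iterations):
    """Learning rate decayed linearly from initial_lr toward zero."""
    return initial_lr * (1.0 - iteration / total_iterations)
```

In a full training loop, the advantages and returns from compute_gae would be computed once per rollout, the surrogate would be evaluated on each minibatch for several epochs, and linear_lr would be called once per iteration to set the optimizer's step size.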