Mathematical Problems in Engineering

Research Article

Reinforcement Learning-Based Multiple Constraint Electric Vehicle Charging Service Scheduling

Policy gradient algorithm

(1)	In the neural network, initialize the parameter set randomly and initialize .
(2)	Initialize , randomly initialize action and output state , calculate local reward , and then add the trajectory generated by the action to the stored trajectory of the training.
(3)	Input state to the neural network and select a random action .
(4)	After the simulation environment executes action , obtains the output state , and calculates the local reward , the trajectory generated by the action is added to the stored trajectory of the training.
(5)	Judge whether is true; if it is true, go to step 6; otherwise, assign to and go to step 3, where is the variable to be accumulated and is the expected value of the total reward for a single trajectory.
(6)	Calculate the strategy optimization function .
(7)	Assign to , update the parameter set in strategy to , and judge whether is true; if so, go to step 2; otherwise, the reinforcement learning training process is over: save the updated parameter set as the optimal parameter set and the optimal strategy ; is the maximum number of trajectories.
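
The loop structure of steps (1) to (7) can be illustrated with a minimal Monte Carlo policy gradient (REINFORCE-style) sketch. It is not the authors' implementation: the tabular softmax policy stands in for the neural network, and `toy_env_step`, the parity reward, the fixed horizon, the learning rate, and all parameter names are hypothetical choices made only so the example runs on its own.

```python
import math
import random


def softmax(prefs):
    """Turn a list of action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def toy_env_step(state, action, n_states):
    """Hypothetical environment: reward 1 if the action equals the state's parity."""
    reward = 1.0 if action == state % 2 else 0.0
    next_state = (state + 1) % n_states
    return next_state, reward


def train(n_states=4, n_actions=2, max_trajectories=3000, horizon=4,
          learning_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Step 1: randomly initialise the parameter set
    # (assumed here: one preference per state-action pair).
    theta = [[rng.uniform(-0.01, 0.01) for _ in range(n_actions)]
             for _ in range(n_states)]
    for _ in range(max_trajectories):      # steps 2 and 7: one pass per trajectory
        state = rng.randrange(n_states)    # random initial state
        trajectory = []                    # stored (state, action, reward) triples
        for _ in range(horizon):           # steps 3 to 5: roll out one trajectory
            probs = softmax(theta[state])
            action = rng.choices(range(n_actions), weights=probs)[0]
            next_state, reward = toy_env_step(state, action, n_states)
            trajectory.append((state, action, reward))
            state = next_state
        # Step 6: estimate the policy gradient from the stored trajectory,
        # weighting each log-probability gradient by the reward-to-go.
        grad = [[0.0] * n_actions for _ in range(n_states)]
        reward_to_go = 0.0
        for s, a, r in reversed(trajectory):
            reward_to_go += r              # undiscounted return from this step on
            probs = softmax(theta[s])
            for b in range(n_actions):
                # Derivative of log pi(a | s) w.r.t. theta[s][b] for a softmax policy.
                grad[s][b] += ((1.0 if b == a else 0.0) - probs[b]) * reward_to_go
        # Step 7: gradient-ascent update of the parameter set.
        for s in range(n_states):
            for b in range(n_actions):
                theta[s][b] += learning_rate * grad[s][b]
    return theta
```

Calling `theta = train()` and then `softmax(theta[s])` for each state `s` gives the learned action probabilities. A practical implementation would typically subtract a baseline from the return to reduce variance and replace the table with a neural network, as the steps above describe.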