Research Article

Reinforcement Learning-Based Multiple Constraint Electric Vehicle Charging Service Scheduling

Algorithm 1

Policy gradient algorithm
(1) Randomly initialize the parameter set of the neural network and initialize the trajectory counter.
(2) Initialize the variable to be accumulated, randomly initialize the action, obtain the output state, and calculate the local reward; then add the trajectory segment generated by the action to the stored training trajectory.
(3) Input the current state to the neural network and randomly select an action.
(4) The simulation environment executes the selected action, returns the output state, and calculates the local reward; the trajectory segment generated by the action is then added to the stored training trajectory.
(5) Check whether the termination condition of the current trajectory is true; if it is, go to step 6; otherwise, update the variable to be accumulated and go to step 3. This step uses two quantities: the variable to be accumulated and the expected value of the total reward for a single trajectory.
(6) Calculate the strategy optimization function.
(7) Increment the trajectory counter and update the parameter set of the strategy. Then check whether the counter is still below the maximum number of trajectories; if so, go to step 2; otherwise, the reinforcement learning training process is over: save the updated parameter set as the optimal parameter set, together with the corresponding optimal strategy.
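
The loop structure of the steps above can be illustrated with a minimal sketch of a REINFORCE-style policy gradient in plain Python. It is not the authors' implementation. The two-state environment `ToyEnv`, the tabular softmax policy standing in for the neural network, the fixed `horizon` used as the termination test in step 5, and the values of `lr`, `gamma`, and `max_trajectories` are all assumptions chosen for illustration. For simplicity, the first action of each trajectory is also sampled from the policy rather than drawn uniformly at random.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyEnv:
    """Hypothetical two-state environment: reward 1 if the action equals the state."""

    def __init__(self, rng):
        self.rng = rng
        self.state = 0

    def reset(self):
        self.state = self.rng.randrange(2)
        return self.state

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        self.state = self.rng.randrange(2)  # next state is drawn at random
        return self.state, reward


def train(max_trajectories=1000, horizon=5, lr=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    env = ToyEnv(rng)
    # Step 1: randomly initialize the parameter set (one logit per state-action pair).
    theta = [[rng.uniform(-0.01, 0.01) for _ in range(2)] for _ in range(2)]

    # Step 7: repeat until the maximum number of trajectories is reached.
    for _ in range(max_trajectories):
        # Step 2: start a new trajectory.
        state = env.reset()
        trajectory = []

        # Steps 3-5: roll out the policy until the (assumed) fixed horizon.
        for _ in range(horizon):
            probs = softmax(theta[state])
            action = rng.choices([0, 1], weights=probs)[0]
            next_state, reward = env.step(action)
            trajectory.append((state, action, reward))
            state = next_state

        # Step 6: accumulate the policy gradient, weighting each
        # grad log pi(a|s) by the discounted reward-to-go.
        grad = [[0.0, 0.0], [0.0, 0.0]]
        ret = 0.0
        for s, a, r in reversed(trajectory):
            ret = r + gamma * ret
            probs = softmax(theta[s])
            for b in range(2):
                indicator = 1.0 if b == a else 0.0
                grad[s][b] += ret * (indicator - probs[b])

        # Step 7: update the parameter set by gradient ascent.
        for s in range(2):
            for b in range(2):
                theta[s][b] += lr * grad[s][b]

    return theta
```

Calling `train()` returns the learned logits; applying `softmax(theta[s])` to each state `s` gives the action probabilities of the resulting strategy. In this toy setting the strategy should come to prefer the action that matches the state.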