Research Article
Multilayer Deep Deterministic Policy Gradient for Static Safety and Stability Analysis of Novel Power Systems
(1) Randomly initialize the parameters θ of the policy networks and ω of the value network.
(2) Randomly initialize the parameters θ′ of the target policy networks and ω′ of the target value network.
(3) Initialize the experience replay buffer.
(4) Execute one step of the environment (the simulation platform) with the initial action.
(5) Obtain the initial state s_0 from the environment.
(6) For i = 1 to the maximum number of iterations N:
(7) Obtain the actions from all policy networks for the received state s_t.
(8) Obtain the new action a_t by summing the actions output by all policy networks; the agent performs a_t in state s_t.
(9) Obtain the reward r_t and the next state s_{t+1} from the environment.
(10) Store the agent's trajectory quadruple (s_t, a_t, r_t, s_{t+1}) in the experience replay buffer.
(11) Update the sampling priorities.
(12) Randomly sample M transitions from the experience replay buffer and calculate the current target value y_i.
(13) Calculate the TD target and the TD error.
(14) Update the parameters θ and ω of the policy and value networks by gradient ascent and gradient descent, respectively.
(15) With a hyperparameter τ, update the parameters θ′ and ω′ of the target policy and target value networks as a weighted (soft) average.
(16) End for
(17) Save the trained networks.
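As a concrete illustration of steps (1)-(17), the following is a minimal PyTorch sketch of the training loop, assuming a Gymnasium-style continuous-control environment. The network sizes, the number of parallel policy networks K, the hyperparameters, and the file name are illustrative assumptions, and uniform replay sampling is used in place of the prioritized sampling implied by steps (11)-(12); it is a sketch of the general scheme, not the paper's exact implementation.

import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, K = 8, 2, 3             # K parallel policy networks (illustrative)
GAMMA, TAU, BATCH, LR = 0.99, 0.005, 64, 1e-3  # discount, soft-update weight, batch size, learning rate


def mlp(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, 128), nn.ReLU(), nn.Linear(128, n_out))


# Steps (1)-(2): policy/value networks and their target copies.
policies = [mlp(STATE_DIM, ACTION_DIM) for _ in range(K)]
target_policies = [mlp(STATE_DIM, ACTION_DIM) for _ in range(K)]
value = mlp(STATE_DIM + ACTION_DIM, 1)
target_value = mlp(STATE_DIM + ACTION_DIM, 1)
for net, tgt in zip(policies + [value], target_policies + [target_value]):
    tgt.load_state_dict(net.state_dict())

policy_opt = torch.optim.Adam([p for net in policies for p in net.parameters()], lr=LR)
value_opt = torch.optim.Adam(value.parameters(), lr=LR)
replay = deque(maxlen=100_000)                 # Step (3): experience replay buffer.


def combined_action(nets, state):
    # Steps (7)-(8): the final action is the sum of the outputs of all policy networks.
    return sum(net(state) for net in nets)


def update():
    # Step (12): uniform sampling is used here; the algorithm above also maintains
    # sampling priorities (step (11)).
    s, a, r, s2, d = zip(*random.sample(replay, BATCH))
    s = torch.tensor(np.array(s), dtype=torch.float32)
    a = torch.tensor(np.array(a), dtype=torch.float32)
    r = torch.tensor(r, dtype=torch.float32).unsqueeze(1)
    s2 = torch.tensor(np.array(s2), dtype=torch.float32)
    d = torch.tensor(d, dtype=torch.float32).unsqueeze(1)

    # Steps (12)-(13): TD target y = r + gamma * Q'(s', mu'(s')) and TD error.
    with torch.no_grad():
        y = r + GAMMA * (1.0 - d) * target_value(
            torch.cat([s2, combined_action(target_policies, s2)], dim=1))

    # Step (14): value network updated by gradient descent on the squared TD error.
    value_loss = nn.functional.mse_loss(value(torch.cat([s, a], dim=1)), y)
    value_opt.zero_grad()
    value_loss.backward()
    value_opt.step()

    # Step (14): policy networks updated by gradient ascent on Q (descent on its negative).
    policy_loss = -value(torch.cat([s, combined_action(policies, s)], dim=1)).mean()
    policy_opt.zero_grad()
    policy_loss.backward()
    policy_opt.step()

    # Step (15): soft (weighted-average) update of the target networks with weight tau.
    with torch.no_grad():
        for net, tgt in zip(policies + [value], target_policies + [target_value]):
            for p, tp in zip(net.parameters(), tgt.parameters()):
                tp.data.mul_(1.0 - TAU).add_(TAU * p.data)


def train(env, episodes=200):
    for _ in range(episodes):
        state, _ = env.reset()                 # Step (5): initial state from the environment.
        done = False
        while not done:
            with torch.no_grad():
                action = combined_action(policies, torch.as_tensor(state, dtype=torch.float32))
            next_state, reward, terminated, truncated, _ = env.step(action.numpy())  # Step (9)
            done = terminated or truncated
            replay.append((state, action.numpy(), reward, next_state, float(done)))  # Step (10)
            state = next_state
            if len(replay) >= BATCH:
                update()

    # Step (17): save the trained networks (file name is an assumption).
    torch.save([net.state_dict() for net in policies + [value]], "multilayer_ddpg.pt")

In this sketch the K policy networks share one optimizer and are trained jointly through the summed action; whether the paper trains each layer's policy separately or jointly is not specified in the listing above, so the joint update is a design assumption.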