Research Article
Multilayer Deep Deterministic Policy Gradient for Static Safety and Stability Analysis of Novel Power Systems
(1) Randomly initialize the parameters θ of the policy networks and ω of the value network.
(2) Randomly initialize the parameters θ′ of the target policy networks and ω′ of the target value network.
(3) Initialize the experience replay buffer.
(4) Execute one step of the environment (the simulation platform) with the initial action.
(5) Obtain the initial state s_0 from the environment.
(6) For i = 1 to the maximum number of iterations N:
(7) Obtain the actions from all policy networks for the received state s_t.
(8) Obtain the new action a_t by summing the actions output by all policy networks; the agent performs a_t in state s_t.
(9) Obtain the reward r_t and the next state s_{t+1} from the environment.
(10) Store the agent's trajectory quadruple (s_t, a_t, r_t, s_{t+1}) in the experience replay buffer.
(11) Update the sampling priorities.
(12) Randomly sample M transitions from the experience replay buffer and calculate the current target value y_i.
(13) Calculate the TD target and the TD error.
(14) Update the parameters θ and ω of the policy and value networks by gradient ascent and gradient descent, respectively.
(15) With a hyperparameter τ, update the parameters θ′ and ω′ of the target policy and target value networks as a weighted (soft) average.
(16) End for
(17) Save the trained networks.
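As a concrete illustration of steps (1)-(17), the following is a minimal PyTorch sketch of the training loop, assuming a Gymnasium-style continuous-control environment. The network sizes, the number of parallel policy networks K, the hyperparameters, and the file name are illustrative assumptions, and uniform replay sampling is used in place of the prioritized sampling implied by steps (11)-(12); it is a sketch of the general scheme, not the paper's exact implementation.

import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, K = 8, 2, 3             # K parallel policy networks (illustrative)
GAMMA, TAU, BATCH, LR = 0.99, 0.005, 64, 1e-3  # discount, soft-update weight, batch size, learning rate


def mlp(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, 128), nn.ReLU(), nn.Linear(128, n_out))


# Steps (1)-(2): policy/value networks and their target copies.
policies = [mlp(STATE_DIM, ACTION_DIM) for _ in range(K)]
target_policies = [mlp(STATE_DIM, ACTION_DIM) for _ in range(K)]
value = mlp(STATE_DIM + ACTION_DIM, 1)
target_value = mlp(STATE_DIM + ACTION_DIM, 1)
for net, tgt in zip(policies + [value], target_policies + [target_value]):
    tgt.load_state_dict(net.state_dict())

policy_opt = torch.optim.Adam([p for net in policies for p in net.parameters()], lr=LR)
value_opt = torch.optim.Adam(value.parameters(), lr=LR)
replay = deque(maxlen=100_000)                 # Step (3): experience replay buffer.


def combined_action(nets, state):
    # Steps (7)-(8): the final action is the sum of the outputs of all policy networks.
    return sum(net(state) for net in nets)


def update():
    # Step (12): uniform sampling is used here; the algorithm above also maintains
    # sampling priorities (step (11)).
    s, a, r, s2, d = zip(*random.sample(replay, BATCH))
    s = torch.tensor(np.array(s), dtype=torch.float32)
    a = torch.tensor(np.array(a), dtype=torch.float32)
    r = torch.tensor(r, dtype=torch.float32).unsqueeze(1)
    s2 = torch.tensor(np.array(s2), dtype=torch.float32)
    d = torch.tensor(d, dtype=torch.float32).unsqueeze(1)

    # Steps (12)-(13): TD target y = r + gamma * Q'(s', mu'(s')) and TD error.
    with torch.no_grad():
        y = r + GAMMA * (1.0 - d) * target_value(
            torch.cat([s2, combined_action(target_policies, s2)], dim=1))

    # Step (14): value network updated by gradient descent on the squared TD error.
    value_loss = nn.functional.mse_loss(value(torch.cat([s, a], dim=1)), y)
    value_opt.zero_grad()
    value_loss.backward()
    value_opt.step()

    # Step (14): policy networks updated by gradient ascent on Q (descent on its negative).
    policy_loss = -value(torch.cat([s, combined_action(policies, s)], dim=1)).mean()
    policy_opt.zero_grad()
    policy_loss.backward()
    policy_opt.step()

    # Step (15): soft (weighted-average) update of the target networks with weight tau.
    with torch.no_grad():
        for net, tgt in zip(policies + [value], target_policies + [target_value]):
            for p, tp in zip(net.parameters(), tgt.parameters()):
                tp.data.mul_(1.0 - TAU).add_(TAU * p.data)


def train(env, episodes=200):
    for _ in range(episodes):
        state, _ = env.reset()                 # Step (5): initial state from the environment.
        done = False
        while not done:
            with torch.no_grad():
                action = combined_action(policies, torch.as_tensor(state, dtype=torch.float32))
            next_state, reward, terminated, truncated, _ = env.step(action.numpy())  # Step (9)
            done = terminated or truncated
            replay.append((state, action.numpy(), reward, next_state, float(done)))  # Step (10)
            state = next_state
            if len(replay) >= BATCH:
                update()

    # Step (17): save the trained networks (file name is an assumption).
    torch.save([net.state_dict() for net in policies + [value]], "multilayer_ddpg.pt")

In this sketch the K policy networks share one optimizer and are trained jointly through the summed action; whether the paper trains each layer's policy separately or jointly is not specified in the listing above, so the joint update is a design assumption.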