Research Article

Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms

Algorithm 1

The DN-DDPG process.
Input:
Output:
(1) Randomly initialize the actor network μ(s|θ^μ) and the critic network Q(s, a|θ^Q) with parameters θ^μ and θ^Q
(2) Initialize the target networks μ′ and Q′, and copy the online network parameters to the target networks: θ^{μ′} ← θ^μ, θ^{Q′} ← θ^Q
(3) Initialize the experience playback buffer D, the noise coefficient N, and the discount rate γ
(4) Set up the external loop over episodes: episode = 1, …, M
(5) Initialize state S as the current state, and obtain the start state s_1
(6) Set up the internal loop over time steps: t = 1, …, T
(7) Select action a_t = μ(s_t|θ^μ) + N_t
(8) Conduct action a_t, and obtain the reward r_t and the new state s_{t+1}
(9) Save the experience data (s_t, a_t, r_t, s_{t+1}) in the experience pool D
(10) Randomly select a certain number of samples (s_i, a_i, r_i, s_{i+1}) from the experience pool
(11) Calculate the target value Q: y_i = r_i + γQ′(s_{i+1}, μ′(s_{i+1}|θ^{μ′})|θ^{Q′})
(12) Calculate the squared-error loss and update the critic network: L = (1/N) Σ_i (y_i − Q(s_i, a_i|θ^Q))²
(13) Update the actor network via the gradients of the sample data: ∇_{θ^μ}J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s_i}
(14) Regularly update the parameters of the target networks: θ^{Q′} ← τθ^Q + (1 − τ)θ^{Q′}, θ^{μ′} ← τθ^μ + (1 − τ)θ^{μ′}
(15) End internal loop
(16) End external loop
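A minimal sketch of one update iteration of the loop above (steps 11–14), using linear function approximators in place of deep networks so that it runs standalone. All names, dimensions, and hyperparameter values here are illustrative assumptions, not values from the paper; a real implementation would use a deep-learning framework and a populated replay buffer.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 3, 1
gamma, tau = 0.99, 0.005          # discount rate and soft-update rate (assumed values)

# Steps 1-2: online and target parameters for actor mu and critic Q
theta_mu = rng.normal(size=(state_dim, action_dim))
theta_q = rng.normal(size=(state_dim + action_dim,))
theta_mu_targ, theta_q_targ = theta_mu.copy(), theta_q.copy()

def actor(s, w):                  # mu(s | theta_mu), squashed to [-1, 1]
    return np.tanh(s @ w)

def critic(s, a, w):              # Q(s, a | theta_q), linear in [s, a]
    return np.concatenate([s, a], axis=1) @ w

# Steps 9-10: stand-in for a minibatch (s_i, a_i, r_i, s_{i+1}) drawn from the pool
N = 32
s = rng.normal(size=(N, state_dim))
a = rng.uniform(-1, 1, size=(N, action_dim))
r = rng.normal(size=N)
s_next = rng.normal(size=(N, state_dim))

# Step 11: target value y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))
y = r + gamma * critic(s_next, actor(s_next, theta_mu_targ), theta_q_targ)

# Step 12: critic loss L = (1/N) sum_i (y_i - Q(s_i, a_i))^2, plain SGD step
q_pred = critic(s, a, theta_q)
loss = float(np.mean((y - q_pred) ** 2))
x = np.concatenate([s, a], axis=1)
theta_q -= 1e-3 * (-2.0 / N) * (x.T @ (y - q_pred))

# Step 13: actor update via the sampled policy gradient (chain rule)
mu_s = actor(s, theta_mu)
dq_da = theta_q[state_dim:]       # grad_a Q is constant for a linear critic
theta_mu += 1e-3 * (s.T @ ((1 - mu_s ** 2) * dq_da)) / N   # ascent on J

# Step 14: soft target update theta' <- tau*theta + (1 - tau)*theta'
theta_q_targ = tau * theta_q + (1 - tau) * theta_q_targ
theta_mu_targ = tau * theta_mu + (1 - tau) * theta_mu_targ

print(loss)
```

In the full algorithm these four steps sit inside the inner loop over time steps, with the minibatch resampled from the replay buffer at every step.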