| Input: |
| Output: |
(1) | Randomly initialize the actor network $\mu(s\mid\theta^{\mu})$ and the critic network $Q(s,a\mid\theta^{Q})$ with parameters $\theta^{\mu}$ and $\theta^{Q}$ |
(2) | Initialize the target networks $\mu'$ and $Q'$, and copy the online network parameters to them: $\theta^{\mu'} \leftarrow \theta^{\mu}$, $\theta^{Q'} \leftarrow \theta^{Q}$ |
(3) | Initialize the experience replay buffer $D$, the exploration noise $\mathcal{N}$, and the discount factor $\gamma$ |
(4) | Outer loop over episodes $m = 1, 2, \ldots, M$ |
(5) | Initialize the environment and obtain the initial state $s_1$ |
(6) | Inner loop over time steps $t = 1, 2, \ldots, T$ |
(7) | Select the action $a_t = \mu(s_t\mid\theta^{\mu}) + \mathcal{N}_t$ |
(8) | Execute the action $a_t$, and observe the reward $r_t$ and the new state $s_{t+1}$ |
(9) | Store the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer $D$ |
(10) | Randomly sample a minibatch of $N$ transitions $(s_i, a_i, r_i, s_{i+1})$ from $D$ |
(11) | Calculate the target value: $y_i = r_i + \gamma\, Q'\big(s_{i+1}, \mu'(s_{i+1}\mid\theta^{\mu'})\mid\theta^{Q'}\big)$ |
(12) | Update the critic network by minimizing the mean squared error loss: $L = \frac{1}{N}\sum_{i}\big(y_i - Q(s_i, a_i\mid\theta^{Q})\big)^{2}$ |
(13) | Update the actor network with the sampled policy gradient: $\nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_{i}\nabla_{a}Q(s,a\mid\theta^{Q})\big|_{s=s_i,\,a=\mu(s_i)}\,\nabla_{\theta^{\mu}}\mu(s\mid\theta^{\mu})\big|_{s=s_i}$ |
(14) | Regularly update the target network parameters (soft update): $\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau\theta^{\mu} + (1-\tau)\theta^{\mu'}$ |
(15) | End the inner loop |
(16) | End the outer loop (a minimal implementation sketch follows below) |
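As a concrete illustration of steps (1)–(16), the sketch below implements the same actor-critic update in PyTorch. The network architectures, the stand-in environment dynamics and reward, the Gaussian exploration noise, and all hyperparameter values ($\gamma$, $\tau$, noise scale, learning rates, batch size, $M$, $T$) are assumptions made for the example, not values taken from this work.

```python
# Minimal DDPG-style sketch of steps (1)-(16).  Network sizes, the stand-in
# environment, and all hyperparameters below are illustrative assumptions.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM = 3, 1                      # assumed dimensions
GAMMA, TAU, NOISE_STD = 0.99, 0.005, 0.1          # assumed gamma, tau, noise scale
BATCH_SIZE, BUFFER_SIZE = 64, 100_000

class Actor(nn.Module):                           # deterministic policy mu(s | theta^mu)
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACTION_DIM), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):                          # action-value function Q(s, a | theta^Q)
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

# Steps (1)-(2): online networks and target copies
actor, critic = Actor(), Critic()
actor_t, critic_t = Actor(), Critic()
actor_t.load_state_dict(actor.state_dict())
critic_t.load_state_dict(critic.state_dict())
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

buffer = deque(maxlen=BUFFER_SIZE)                # step (3): replay buffer D

def select_action(state):
    # Step (7): a_t = mu(s_t) + exploration noise (Gaussian noise assumed here)
    with torch.no_grad():
        a = actor(state)
    return (a + NOISE_STD * torch.randn_like(a)).clamp(-1.0, 1.0)

def update():
    if len(buffer) < BATCH_SIZE:
        return
    # Step (10): sample a random minibatch of transitions
    s, a, r, s2 = (torch.stack(x) for x in zip(*random.sample(buffer, BATCH_SIZE)))

    # Step (11): y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))
    with torch.no_grad():
        y = r + GAMMA * critic_t(s2, actor_t(s2))

    # Step (12): critic update, mean squared error against the target value
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Step (13): actor update, ascend Q(s, mu(s)) via the sampled policy gradient
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Step (14): soft update of both target networks
    for net, net_t in ((actor, actor_t), (critic, critic_t)):
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1 - TAU).add_(TAU * p.data)

# Steps (4)-(9): interaction loop with a stand-in environment (fake dynamics/reward)
for episode in range(10):                         # M = 10 episodes (illustrative)
    state = torch.zeros(STATE_DIM)                # step (5): initial state
    for t in range(50):                           # T = 50 steps (illustrative)
        action = select_action(state)
        next_state = state + 0.1 * torch.randn(STATE_DIM)      # placeholder transition
        reward = torch.tensor([-next_state.norm().item()])     # placeholder reward
        buffer.append((state, action, reward, next_state))     # step (9): store in D
        update()                                               # steps (10)-(14)
        state = next_state
```

In a real application, the placeholder transition and reward would be replaced by the actual environment, and the Gaussian exploration noise could be swapped for the Ornstein-Uhlenbeck process commonly used with DDPG.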