Research Article
Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm
Algorithm 1: Bootstrapped and aggregated multi-DDPG (BAMDDPG).
Randomly initialize the main critic networks Q_i(s, a | θ_i^Q) and main actor networks μ_i(s | θ_i^μ) with weights θ_i^Q and θ_i^μ
Initialize the target networks Q_i' and μ_i' with weights θ_i^{Q'} ← θ_i^Q and θ_i^{μ'} ← θ_i^μ
Initialize the centralized experience replay buffer R
for episode = 1, M do
    Initialize an Ornstein–Uhlenbeck process for action exploration
    if #Env == 1 do
        Alternately select μ_i and Q_i among the multiple DDPGs to interact with the environment
    else do
        Select all μ_i and Q_i; each DDPG is bound to one environment
    end if
    for t = 1, T do
        for each selected DDPG do
            Receive state s_t from its bound environment
            Execute action a_t and observe reward r_t and new state s_{t+1}
            Store experience (s_t, a_t, r_t, s_{t+1}) in R
        end for
        for each DDPG do
            Update θ_i^Q, θ_i^μ, θ_i^{Q'}, and θ_i^{μ'} according to equations (4)–(6)
        end for
    end for
end for
Get the final policy π by aggregating the sub-policies π_i
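To make the training-loop structure of Algorithm 1 concrete, the following is a minimal Python sketch. It keeps only the structural elements listed above: multiple sub-DDPGs, a single centralized replay buffer, alternating interaction when only one environment is available, and aggregation of the sub-policies at the end. The SubDDPG, ToyEnv, and ReplayBuffer classes, the exploration-noise recursion, and the action-averaging aggregation rule are illustrative assumptions rather than the paper's definitions, and the network updates of equations (4)–(6) are not reproduced (the update method is a placeholder).

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Centralized experience replay buffer R shared by all sub-DDPGs."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        batch = random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
        return [np.array(x) for x in zip(*batch)]


class SubDDPG:
    """Placeholder sub-DDPG: a linear 'actor' stands in for the actor/critic
    networks, and `update` stands in for the updates of equations (4)-(6)."""

    def __init__(self, state_dim, action_dim, rng):
        self.w = rng.normal(scale=0.1, size=(action_dim, state_dim))  # actor weights

    def act(self, state, noise=0.0):
        return np.tanh(self.w @ state) + noise  # deterministic action + exploration noise

    def update(self, batch):
        pass  # real critic/actor/target-network updates (equations (4)-(6)) go here


class ToyEnv:
    """Tiny stand-in environment, used only to make the sketch executable."""

    state_dim, action_dim = 4, 2

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.state = np.zeros(self.state_dim)

    def reset(self):
        self.state = self.rng.normal(size=self.state_dim)
        return self.state

    def step(self, action):
        self.state = self.state + 0.1 * self.rng.normal(size=self.state_dim)
        return self.state, -float(np.linalg.norm(action))  # next state, reward


def train_bamddpg(envs, n_agents=3, episodes=10, steps=200, batch_size=64, seed=0):
    rng = np.random.default_rng(seed)
    state_dim, action_dim = envs[0].state_dim, envs[0].action_dim
    agents = [SubDDPG(state_dim, action_dim, rng) for _ in range(n_agents)]
    buffer = ReplayBuffer()  # single buffer: every sub-DDPG reads and writes it

    for episode in range(episodes):
        noise = np.zeros(action_dim)  # crude stand-in for an Ornstein-Uhlenbeck process
        if len(envs) == 1:
            # one environment: alternate which sub-DDPG interacts in this episode
            active = [(agents[episode % n_agents], envs[0])]
        else:
            # one environment per sub-DDPG
            active = list(zip(agents, envs))

        states = [env.reset() for _, env in active]
        for _ in range(steps):
            noise = 0.9 * noise + rng.normal(scale=0.2, size=action_dim)
            for i, (agent, env) in enumerate(active):
                action = agent.act(states[i], noise)
                next_state, reward = env.step(action)
                buffer.store(states[i], action, reward, next_state)
                states[i] = next_state
            for agent in agents:  # every sub-DDPG learns from the shared experience
                agent.update(buffer.sample(batch_size))

    # Aggregate the sub-policies into the final policy; averaging their
    # actions is an assumption here, not necessarily the paper's rule.
    return lambda s: np.mean([agent.act(s) for agent in agents], axis=0)


if __name__ == "__main__":
    policy = train_bamddpg([ToyEnv(seed=i) for i in range(3)])
    print(policy(np.zeros(4)))  # aggregated action for an example state
```

Replacing SubDDPG with a full DDPG implementation and ToyEnv with a real environment recovers the structure of Algorithm 1; averaging the sub-policy actions is only one possible aggregation rule.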