Research Article
Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm
Algorithm 1: Bootstrapped and aggregated multi-DDPG (BAMDDPG).
Randomly initialize the main critic networks Q_i(s, a | θ_i^Q) and main actor networks μ_i(s | θ_i^μ) with weights θ_i^Q and θ_i^μ
Initialize the target networks Q_i' and μ_i' with weights θ_i^{Q'} ← θ_i^Q and θ_i^{μ'} ← θ_i^μ
Initialize the centralized experience replay buffer R
for episode = 1, M do
    Initialize an Ornstein–Uhlenbeck process for action exploration
    if #Env == 1 do
        Alternately select μ_i and Q_i among the multiple DDPGs to interact with the environment
    else do
        Select all μ_i and Q_i; each DDPG is bound to one environment
    end if
    for t = 1, T do
        for each selected DDPG do
            Receive state s_t from its bound environment
            Execute action a_t and observe reward r_t and new state s_{t+1}
            Store experience (s_t, a_t, r_t, s_{t+1}) in R
        end for
        for each DDPG do
            Update θ_i^Q, θ_i^μ, θ_i^{Q'}, and θ_i^{μ'} according to equations (4)–(6)
        end for
    end for
end for
Get the final policy π by aggregating the sub-policies π_i
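To make the training-loop structure of Algorithm 1 concrete, the following is a minimal Python sketch. It keeps only the structural elements listed above: multiple sub-DDPGs, a single centralized replay buffer, alternating interaction when only one environment is available, and aggregation of the sub-policies at the end. The SubDDPG, ToyEnv, and ReplayBuffer classes, the exploration-noise recursion, and the action-averaging aggregation rule are illustrative assumptions rather than the paper's definitions, and the network updates of equations (4)–(6) are not reproduced (the update method is a placeholder).

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Centralized experience replay buffer R shared by all sub-DDPGs."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        batch = random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
        return [np.array(x) for x in zip(*batch)]


class SubDDPG:
    """Placeholder sub-DDPG: a linear 'actor' stands in for the actor/critic
    networks, and `update` stands in for the updates of equations (4)-(6)."""

    def __init__(self, state_dim, action_dim, rng):
        self.w = rng.normal(scale=0.1, size=(action_dim, state_dim))  # actor weights

    def act(self, state, noise=0.0):
        return np.tanh(self.w @ state) + noise  # deterministic action + exploration noise

    def update(self, batch):
        pass  # real critic/actor/target-network updates (equations (4)-(6)) go here


class ToyEnv:
    """Tiny stand-in environment, used only to make the sketch executable."""

    state_dim, action_dim = 4, 2

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.state = np.zeros(self.state_dim)

    def reset(self):
        self.state = self.rng.normal(size=self.state_dim)
        return self.state

    def step(self, action):
        self.state = self.state + 0.1 * self.rng.normal(size=self.state_dim)
        return self.state, -float(np.linalg.norm(action))  # next state, reward


def train_bamddpg(envs, n_agents=3, episodes=10, steps=200, batch_size=64, seed=0):
    rng = np.random.default_rng(seed)
    state_dim, action_dim = envs[0].state_dim, envs[0].action_dim
    agents = [SubDDPG(state_dim, action_dim, rng) for _ in range(n_agents)]
    buffer = ReplayBuffer()  # single buffer: every sub-DDPG reads and writes it

    for episode in range(episodes):
        noise = np.zeros(action_dim)  # crude stand-in for an Ornstein-Uhlenbeck process
        if len(envs) == 1:
            # one environment: alternate which sub-DDPG interacts in this episode
            active = [(agents[episode % n_agents], envs[0])]
        else:
            # one environment per sub-DDPG
            active = list(zip(agents, envs))

        states = [env.reset() for _, env in active]
        for _ in range(steps):
            noise = 0.9 * noise + rng.normal(scale=0.2, size=action_dim)
            for i, (agent, env) in enumerate(active):
                action = agent.act(states[i], noise)
                next_state, reward = env.step(action)
                buffer.store(states[i], action, reward, next_state)
                states[i] = next_state
            for agent in agents:  # every sub-DDPG learns from the shared experience
                agent.update(buffer.sample(batch_size))

    # Aggregate the sub-policies into the final policy; averaging their
    # actions is an assumption here, not necessarily the paper's rule.
    return lambda s: np.mean([agent.act(s) for agent in agents], axis=0)


if __name__ == "__main__":
    policy = train_bamddpg([ToyEnv(seed=i) for i in range(3)])
    print(policy(np.zeros(4)))  # aggregated action for an example state
```

Replacing SubDDPG with a full DDPG implementation and ToyEnv with a real environment recovers the structure of Algorithm 1; averaging the sub-policy actions is only one possible aggregation rule.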