Research Article

Reinforcement Learning Guided by Double Replay Memory

Algorithm 1

Double experience replay (DER).
Given:
 an off-policy RL algorithm 𝔸, where 𝔸 = DQN
 sampling strategies (𝕊1, 𝕊2) for the replay buffers H1 and H2,
  where 𝕊1 is uniform sampling and 𝕊2 is TD-error-based sampling
 an update probability p for updating the second replay buffer H2
 a size threshold k, where N2 denotes the number of transitions stored in H2
Initialize 𝔸
Initialize replay buffers H1 and H2
Observe s0 and choose a0 using 𝔸
 for episode = 1, M do
  observe the next state and store the transition in H1
  for t = 1, T do
   if N2 > k then
    with 𝕊1 and 𝕊2, sample transitions from H1 and H2 with sampling ratio λ
   else
    with 𝕊1, sample transitions from H1
   update weights according to 𝔸
   put the used transitions into H2 with probability p
   if transitions were sampled from H2 then
    update their priorities according to 𝕊2
until convergence
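The buffer logic above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the class name `DoubleReplayBuffer`, the method names, and the priority constant `1e-6` are assumptions; it keeps H1 as a plain FIFO (uniform 𝕊1) and H2 as (priority, transition) pairs sampled proportionally to TD error (𝕊2), mixing the two with ratio λ once H2 holds more than k transitions.

```python
import random
from collections import deque


class DoubleReplayBuffer:
    """Hypothetical helper illustrating Algorithm 1's buffer logic (DER)."""

    def __init__(self, capacity, k, lambda_ratio, update_prob, seed=None):
        self.h1 = deque(maxlen=capacity)   # H1: every observed transition
        self.h2 = deque(maxlen=capacity)   # H2: (priority, transition) pairs
        self.k = k                         # size threshold on H2 before mixed sampling
        self.lambda_ratio = lambda_ratio   # lambda: fraction of each batch drawn from H2
        self.update_prob = update_prob     # p: chance a used transition is copied to H2
        self.rng = random.Random(seed)

    def store(self, transition):
        # S1 over H1 is uniform, so H1 needs no priorities
        self.h1.append(transition)

    def sample(self, batch_size):
        """Return (batch, h2_indices); h2_indices marks entries drawn from H2."""
        if len(self.h2) > self.k:
            n2 = int(batch_size * self.lambda_ratio)
            batch = self.rng.sample(list(self.h1), batch_size - n2)
            # S2: TD-error-proportional sampling from H2
            weights = [p for p, _ in self.h2]
            idx = self.rng.choices(range(len(self.h2)), weights=weights, k=n2)
            batch += [self.h2[i][1] for i in idx]
            return batch, idx
        # H2 not yet past the threshold k: uniform sampling from H1 only
        return self.rng.sample(list(self.h1), min(batch_size, len(self.h1))), []

    def after_update(self, batch, td_errors, h2_indices):
        """Bookkeeping after the learner's weight update (last lines of Algorithm 1)."""
        # refresh priorities of the transitions that were sampled from H2
        tail = td_errors[len(td_errors) - len(h2_indices):]
        for i, err in zip(h2_indices, tail):
            self.h2[i] = (abs(err) + 1e-6, self.h2[i][1])
        # copy every used transition into H2 with probability p
        for transition, err in zip(batch, td_errors):
            if self.rng.random() < self.update_prob:
                self.h2.append((abs(err) + 1e-6, transition))
```

A DQN training loop would call `store` each step, `sample` before each gradient update, and `after_update` with the resulting TD errors, so that frequently revisited, high-error transitions accumulate in H2.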