Research Article
Reinforcement Learning Guided by Double Replay Memory
Algorithm 1
Double experience replay (DER).
Given:
    an off-policy RL algorithm E, where E: DQN
    sampling strategies (S1, S2) for the replay buffers,
        where S1: uniform sampling, S2: TD-error-based sampling
    a sampling ratio λ between the two buffers
    an update probability p for inserting transitions into the second replay buffer
Initialize E
Initialize replay buffers H1 and H2
Observe s0 and choose a0 using E
for episode = 1, M do
    Observe s_t, r_t and choose a_t using E
    Store the transition in H1
    for t = 1, T do
        if N2 > k then
            with (S1, S2) and sampling ratio λ,
                sample transitions from H1 and H2
        else
            with S1, sample transitions from H1
        Update the weights according to E using the sampled transitions
        Put each used transition into H2 with probability p
        if transitions were sampled from H2 then
            update their priorities according to S2
until convergence
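The buffer bookkeeping in Algorithm 1 can be made concrete with a minimal Python sketch. The class and parameter names below (DoubleReplayMemory, capacity1, capacity2, lam, p, k) are illustrative assumptions, not the paper's code; plain deques stand in for the buffers H1 and H2, and the learner that actually computes TD errors (e.g., a DQN) is omitted.

import random
from collections import deque

class DoubleReplayMemory:
    """Two replay buffers: H1 (uniform, FIFO) and H2 (TD-error prioritized)."""

    def __init__(self, capacity1, capacity2, lam=0.5, p=0.5, k=1000):
        self.h1 = deque(maxlen=capacity1)  # H1: every new transition lands here
        self.h2 = deque(maxlen=capacity2)  # H2: (|TD error| priority, transition) pairs
        self.lam = lam  # λ: fraction of each batch drawn uniformly from H1
        self.p = p      # probability of inserting a used transition into H2
        self.k = k      # minimum H2 size (N2 > k) before mixed sampling starts

    def store(self, transition):
        # New experience always enters the uniform buffer H1 first.
        self.h1.append(transition)

    def sample(self, batch_size):
        # Returns (h1_batch, h2_indices, h2_batch); the H2 parts stay empty
        # until H2 has warmed up past the threshold k.
        if len(self.h2) > self.k:
            n1 = int(self.lam * batch_size)  # S1 (uniform) share of the batch
            n2 = batch_size - n1             # S2 (prioritized) share of the batch
            h1_batch = random.sample(list(self.h1), min(n1, len(self.h1)))
            # S2: draw H2 indices with probability proportional to |TD error|
            # (with replacement here; real implementations use a sum-tree).
            weights = [pr for pr, _ in self.h2]
            h2_idx = random.choices(range(len(self.h2)), weights=weights, k=n2)
            return h1_batch, h2_idx, [self.h2[i][1] for i in h2_idx]
        return random.sample(list(self.h1), min(batch_size, len(self.h1))), [], []

    def after_update(self, used, td_errors, h2_idx, h2_td_errors):
        # Refresh priorities of transitions that came from H2 first, while
        # the sampled indices are still valid (appends below may evict items).
        for i, err in zip(h2_idx, h2_td_errors):
            self.h2[i] = (abs(err) + 1e-6, self.h2[i][1])
        # Then put each used transition into H2 with probability p,
        # keyed by its freshly computed TD error.
        for tr, err in zip(used, td_errors):
            if random.random() < self.p:
                self.h2.append((abs(err) + 1e-6, tr))

Note that random.choices samples with replacement, which is a simplification of proportional prioritization; a production implementation would typically use a sum-tree for O(log n) sampling and updates. The insertion probability p controls how aggressively the prioritized buffer H2 is refreshed with recently used transitions.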