Research Article
Reinforcement Learning Guided by Double Replay Memory
Algorithm 1
Double experience replay (DER).
Given:
    an off-policy RL algorithm E, where E: DQN
    sampling strategies (S1, S2) for the replay buffers,
        where S1: uniform sampling, S2: TD-error-based sampling
    a sampling ratio λ between the two buffers
    an update probability p for inserting transitions into the second replay buffer
Initialize E
Initialize replay buffers H1 and H2
Observe s0 and choose a0 using E
for episode = 1, M do
    Observe s_t, r_t and choose a_t using E
    Store the transition in H1
    for t = 1, T do
        if N2 > k then
            with (S1, S2) and sampling ratio λ,
                sample transitions from H1 and H2
        else
            with S1, sample transitions from H1
        Update the weights according to E using the sampled transitions
        Put each used transition into H2 with probability p
        if transitions were sampled from H2 then
            update their priorities according to S2
until convergence
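The buffer bookkeeping in Algorithm 1 can be made concrete with a minimal Python sketch. The class and parameter names below (DoubleReplayMemory, capacity1, capacity2, lam, p, k) are illustrative assumptions, not the paper's code; plain deques stand in for the buffers H1 and H2, and the learner that actually computes TD errors (e.g., a DQN) is omitted.

import random
from collections import deque

class DoubleReplayMemory:
    """Two replay buffers: H1 (uniform, FIFO) and H2 (TD-error prioritized)."""

    def __init__(self, capacity1, capacity2, lam=0.5, p=0.5, k=1000):
        self.h1 = deque(maxlen=capacity1)  # H1: every new transition lands here
        self.h2 = deque(maxlen=capacity2)  # H2: (|TD error| priority, transition) pairs
        self.lam = lam  # λ: fraction of each batch drawn uniformly from H1
        self.p = p      # probability of inserting a used transition into H2
        self.k = k      # minimum H2 size (N2 > k) before mixed sampling starts

    def store(self, transition):
        # New experience always enters the uniform buffer H1 first.
        self.h1.append(transition)

    def sample(self, batch_size):
        # Returns (h1_batch, h2_indices, h2_batch); the H2 parts stay empty
        # until H2 has warmed up past the threshold k.
        if len(self.h2) > self.k:
            n1 = int(self.lam * batch_size)  # S1 (uniform) share of the batch
            n2 = batch_size - n1             # S2 (prioritized) share of the batch
            h1_batch = random.sample(list(self.h1), min(n1, len(self.h1)))
            # S2: draw H2 indices with probability proportional to |TD error|
            # (with replacement here; real implementations use a sum-tree).
            weights = [pr for pr, _ in self.h2]
            h2_idx = random.choices(range(len(self.h2)), weights=weights, k=n2)
            return h1_batch, h2_idx, [self.h2[i][1] for i in h2_idx]
        return random.sample(list(self.h1), min(batch_size, len(self.h1))), [], []

    def after_update(self, used, td_errors, h2_idx, h2_td_errors):
        # Refresh priorities of transitions that came from H2 first, while
        # the sampled indices are still valid (appends below may evict items).
        for i, err in zip(h2_idx, h2_td_errors):
            self.h2[i] = (abs(err) + 1e-6, self.h2[i][1])
        # Then put each used transition into H2 with probability p,
        # keyed by its freshly computed TD error.
        for tr, err in zip(used, td_errors):
            if random.random() < self.p:
                self.h2.append((abs(err) + 1e-6, tr))

Note that random.choices samples with replacement, which is a simplification of proportional prioritization; a production implementation would typically use a sum-tree for O(log n) sampling and updates. The insertion probability p controls how aggressively the prioritized buffer H2 is refreshed with recently used transitions.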