Research Article
Exploration for Countering the Episodic Memory
Algorithm 2: Exploration for countering neural episodic control.
(1) Initialize replay memory
(2) Initialize a DND for each action a
(3) Initialize the horizon of the N-step Q rule
(4) for each episode do
(5)     for each time step t do
(6)         Obtain the observation from the environment, with its embedding
(7)         Estimate the value of each action a via (2) from its DND
(8)         if (4) is satisfied then
(9)             Choose
(10)        else
(11)            Choose
(12)        end if
(13)        Execute the chosen action and receive the reward
(14)        Append to
(15)        Append to
(16)        Train on a random minibatch from the replay memory
(17)    end for
(18) end for
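The control flow of the listing above can be sketched in Python. This is only an illustrative sketch under several assumptions, not the authors' implementation: `SimpleDND` is a hypothetical toy memory with inverse-distance kernel lookup, the exploration test of condition (4) is not recoverable from this section and is replaced here by a plain epsilon-greedy stand-in, memory writes are deferred to the end of each episode for simplicity, and the minibatch training of step (16) is omitted because the sketch has no learned embedding network. The chain environment and all parameter names are likewise hypothetical.

```python
import random


class SimpleDND:
    """Toy stand-in for a per-action memory: stores (key, value) pairs."""

    def __init__(self, delta=1e-3):
        self.keys, self.values, self.delta = [], [], delta

    def lookup(self, h):
        # Inverse-distance kernel-weighted average of stored values; 0.0 if empty.
        if not self.keys:
            return 0.0
        weights = [1.0 / (sum((a - b) ** 2 for a, b in zip(h, k)) + self.delta)
                   for k in self.keys]
        return sum(w * v for w, v in zip(weights, self.values)) / sum(weights)

    def append(self, h, value):
        self.keys.append(tuple(h))
        self.values.append(value)


def n_step_return(rewards, bootstrap, gamma):
    """Discounted sum of the given rewards plus a discounted bootstrap value."""
    g = sum((gamma ** i) * r for i, r in enumerate(rewards))
    return g + (gamma ** len(rewards)) * bootstrap


def run(env_reset, env_step, embed, n_actions, episodes=5, horizon=3,
        gamma=0.99, epsilon=0.1, max_steps=20, seed=0):
    rng = random.Random(seed)
    replay = []                                      # step (1)
    dnds = [SimpleDND() for _ in range(n_actions)]   # step (2)
    for _ in range(episodes):                        # step (4)
        s, trajectory = env_reset(), []
        for _ in range(max_steps):                   # step (5)
            h = embed(s)                             # step (6)
            q = [d.lookup(h) for d in dnds]          # step (7)
            if rng.random() < epsilon:               # ASSUMED stand-in for (4)
                a = rng.randrange(n_actions)
            else:                                    # greedy, random tie-break
                best = max(q)
                a = rng.choice([i for i, v in enumerate(q) if v == best])
            s, r, done = env_step(s, a)              # step (13)
            trajectory.append((h, a, r))
            if done:
                break
        # Steps (14)-(15), deferred to episode end: write N-step targets.
        for t, (h, a, _) in enumerate(trajectory):
            rewards = [x[2] for x in trajectory[t:t + horizon]]
            end = t + horizon
            # Bootstrap only if the horizon ends inside the trajectory;
            # the tail of the episode is treated as having value 0.
            bootstrap = 0.0
            if end < len(trajectory):
                bootstrap = max(d.lookup(trajectory[end][0]) for d in dnds)
            target = n_step_return(rewards, bootstrap, gamma)
            dnds[a].append(h, target)
            replay.append((h, a, target))
    return dnds, replay


# Hypothetical 5-state chain: action 1 moves right, action 0 moves left,
# and reaching state 4 ends the episode with reward 1.
def chain_reset():
    return 0


def chain_step(s, a):
    s2 = max(0, min(4, s + (1 if a == 1 else -1)))
    done = s2 == 4
    return s2, (1.0 if done else 0.0), done


def chain_embed(s):
    return (float(s),)


if __name__ == "__main__":
    dnds, replay = run(chain_reset, chain_step, chain_embed, n_actions=2)
    print(len(replay), [len(d.keys) for d in dnds])
```

Deferring the memory writes keeps the N-step bookkeeping short; an implementation that follows the listing more closely would append inside the time-step loop once each N-step return becomes available.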