Research Article
Exploration for Countering the Episodic Memory
Algorithm 2
Exploration for countering neural episodic control.
| (1) | Initialize replay memory |
| (2) | Initialize a DND for each action a |
| (3) | Initialize for horizon of the N-step Q rule |
| (4) | for episode = 1 to do |
| (5) | for t = 1 to do |
| (6) | Obtain observation from the environment with embedding |
| (7) | Estimate for each action a via (2) from |
| (8) | if Satisfy (4) then |
| (9) | Choose |
| (10) | else |
| (11) | Choose |
| (12) | end if |
| (13) | Execute action , and receive reward |
| (14) | Append to |
| (15) | Append to |
| (16) | Train a random minibatch in |
| (17) | end for |
| (18) | end for |
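The control flow of Algorithm 2 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the listing does not reproduce equations (2) and (4), so the value estimate in `DND.lookup` is assumed to be an inverse-distance weighted average over nearest neighbours, and `should_explore` is a hypothetical placeholder that simply fires with probability `epsilon`. Which branch of steps (8)-(12) explores is likewise assumed. `ChainEnv`, `embed`, `n_step_return`, and all parameter values are invented for illustration, and no network is trained in step (16); the sampled minibatch is only used to compute a monitoring loss.

```python
import random
from collections import deque


class DND:
    """Toy per-action memory mapping embedding keys to value estimates."""

    def __init__(self, capacity=500, delta=1e-3):
        self.items = deque(maxlen=capacity)  # oldest entries are dropped first
        self.delta = delta

    def lookup(self, h, k=5):
        # ASSUMPTION: inverse-distance weighted mean of the k nearest stored values.
        if not self.items:
            return 0.0
        nearest = sorted(
            (sum((x - y) ** 2 for x, y in zip(h, key)), value)
            for key, value in self.items
        )[:k]
        weights = [1.0 / (dist + self.delta) for dist, _ in nearest]
        return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

    def write(self, h, value):
        self.items.append((tuple(h), value))


class ChainEnv:
    """Hypothetical toy task: walk right along a chain to earn a reward of 1."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action 0 = left, 1 = right
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos == self.length - 1
        return self.pos, (1.0 if done else 0.0), done


def embed(obs, length):
    return (obs / length,)  # stand-in for a learned embedding network


def should_explore(rng, epsilon):
    return rng.random() < epsilon  # PLACEHOLDER for the paper's condition (4)


def n_step_return(rewards, gamma, bootstrap):
    g = bootstrap
    for r in reversed(rewards):  # fold rewards from last to first
        g = r + gamma * g
    return g


def run(episodes=30, max_steps=50, n_step=3, gamma=0.9, epsilon=0.2,
        batch_size=8, seed=0):
    rng = random.Random(seed)
    env = ChainEnv()
    replay = deque(maxlen=10_000)                      # step (1)
    dnds = [DND(), DND()]                              # step (2)
    for _ in range(episodes):                          # step (4)
        obs, pending = env.reset(), []
        for _ in range(max_steps):                     # step (5)
            h = embed(obs, env.length)                 # step (6)
            q = [m.lookup(h) for m in dnds]            # step (7)
            if should_explore(rng, epsilon):           # steps (8)-(12)
                action = rng.randrange(len(dnds))
            else:
                best = max(q)
                action = rng.choice([a for a, v in enumerate(q) if v == best])
            next_obs, reward, done = env.step(action)  # step (13)
            pending.append((obs, h, action, reward))
            h_next = embed(next_obs, env.length)
            bootstrap = 0.0 if done else max(m.lookup(h_next) for m in dnds)
            # Flush every transition whose N-step target is now available.
            while pending and (done or len(pending) >= n_step):
                rewards = [p[3] for p in pending]
                s, key, a, _ = pending.pop(0)
                target = n_step_return(rewards, gamma, bootstrap)
                dnds[a].write(key, target)             # step (14)
                replay.append((s, a, target))          # step (15)
            if len(replay) >= batch_size:              # step (16), monitoring only
                batch = rng.sample(list(replay), batch_size)
                loss = sum((dnds[a].lookup(embed(s, env.length)) - y) ** 2
                           for s, a, y in batch) / batch_size
            if done:
                break
            obs = next_obs
        # Transitions still pending when max_steps is reached are discarded.
    return dnds, replay
```

Calling `dnds, replay = run()` returns the per-action memories and the replay buffer; `dnds[1].lookup(embed(3, 5))` then gives the stored estimate for moving right from the state next to the goal. Appending a new entry on every write, rather than updating an existing key, is a simplification chosen to keep the sketch short.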