Research Article
Exploration for Countering the Episodic Memory
Algorithm 1: Exploration for countering model-free episodic control.
(1)  for episode = 1 to do
(2)    for t = 1 to do
(3)      Obtain observation from the environment
(4)      Let
(5)      Estimate and Q for each action a via (3)
(6)      if Satisfy (4) then
(7)        Choose
(8)      else
(9)        Choose
(10)     end if
(11)     Execute action , and receive reward
(12)   end for
(13)   for t = to 1 do
(14)     Update and according to (2)
(15)   end for
(16) end for
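The control flow of Algorithm 1 (a forward pass that selects actions, followed by a backward pass that updates stored values) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions because equations (2), (3), and (4) are not reproduced here: the value estimate is a plain table lookup with a default of 0, the update keeps the larger of the stored value and the newly observed discounted return (the usual model-free episodic control rule), the condition of step (6) is passed in as a hypothetical callable `should_explore`, and the branch taken when that condition holds is assumed to be a uniformly random action. The chain environment (`chain_reset`, `chain_step`) is likewise hypothetical and exists only to make the sketch runnable.

```python
import random

GOAL = 3  # hypothetical toy chain: states 0..3, reward on reaching state 3


def chain_reset():
    """Start every episode in state 0."""
    return 0


def chain_step(state, action):
    """Action 1 moves right, any other action moves left (clamped at 0)."""
    next_state = max(0, state + (1 if action == 1 else -1))
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def run_episode(env_reset, env_step, actions, q_table, should_explore,
                gamma=1.0, max_steps=100, rng=random):
    """Run one episode: act forward in time, then update backward in time."""
    trajectory = []
    state = env_reset()
    for _ in range(max_steps):                       # steps (2)-(12)
        # Assumed stand-in for the estimate in step (5): exact table lookup.
        q_values = [q_table.get((state, a), 0.0) for a in actions]
        if should_explore(state, q_values):          # assumed form of step (6)
            action = rng.choice(actions)             # assumed exploratory choice
        else:
            action = actions[q_values.index(max(q_values))]  # greedy choice
        next_state, reward, done = env_step(state, action)   # step (11)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break

    # Steps (13)-(15): walk the episode backward, accumulating the
    # discounted return and keeping the best return seen per (state, action).
    ret = 0.0
    for s, a, r in reversed(trajectory):
        ret = r + gamma * ret
        key = (s, a)
        q_table[key] = max(q_table.get(key, ret), ret)
    return ret  # discounted return from the first step of the episode
```

Calling `run_episode(chain_reset, chain_step, [0, 1], {}, lambda s, q: True)` repeatedly with a shared table corresponds to the outer loop of step (1). Replacing `should_explore` and the table lookup with the paper's condition (4) and estimate (3) would be needed to reproduce the actual method.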