Research Article
An Empirical Investigation of Transfer Effects for Reinforcement Learning
Algorithm 1
RL_Sort: the Q-learning-based algorithm for the sorting task.
Input: S_training, Q_n[S_n, A_n]
(1)  initialize
(2)  upper_bound = n + 1
(3)  train_steps = 0
(4)  success_rate = 0.75
(5)  S_goal = [1, 2, ..., n]
(6)  repeat
(7)      end = FALSE
(8)      swap_times = 0
(9)      s = S_training
(10)     current_rate = 0
(11)     repeat
(12)         Select an action a based on ε-greedy
(13)         Perform the action a and observe s′ and the corresponding reward
(14)         swap_times = swap_times + 1
(15)         if (s′ is S_goal) then
(16)             Q_n[s, a] ← Q_n[s, a] + α × (reward_win − Q_n[s, a])
(17)             end = TRUE
(18)             Check the success rate for the latest 100 episodes and assign it to current_rate
(19)         elseif (swap_times > upper_bound) then
(20)
(21)             end = TRUE
(22)         else
(23)             if (dist(s′, S_goal) > dist(s, S_goal))
(24)
(25)             elseif (dist(s′, S_goal) < dist(s, S_goal))
(26)
(27)             else
(28)
(29)         s ← s′
(30)     until end is TRUE
(31)     train_steps = train_steps + 1
(32) until current_rate >= success_rate
(33) return Q_n, train_steps
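For concreteness, the following Python sketch reproduces the training loop of Algorithm 1 under stated assumptions. The listing leaves the penalty update (step 20) and the three distance-based intermediate updates (steps 24, 26, 28) unspecified, so the reward constants reward_lose, reward_closer, reward_farther, and reward_same, along with alpha, epsilon, the pairwise-swap action set, and the Hamming-style dist() function, are illustrative choices rather than the paper's exact settings.

import random
from collections import defaultdict, deque


def rl_sort(n, s_training, alpha=0.1, epsilon=0.1,
            reward_win=1.0, reward_lose=-1.0,
            reward_closer=0.1, reward_farther=-0.1, reward_same=0.0,
            success_rate=0.75, seed=0):
    """Train a Q-table to sort the fixed training state s_training by swapping elements."""
    random.seed(seed)
    s_goal = tuple(range(1, n + 1))
    upper_bound = n + 1
    # An action swaps the elements at two positions (i, j); assumed action set.
    actions = [(i, j) for i in range(n) for j in range(i + 1, n)]
    q = defaultdict(float)  # Q_n[s, a], initialised to 0

    def dist(s):
        # Number of misplaced elements (Hamming distance to the sorted state); assumed metric.
        return sum(x != y for x, y in zip(s, s_goal))

    recent = deque(maxlen=100)  # win/loss record of the latest 100 episodes
    train_steps = 0
    current_rate = 0.0
    while current_rate < success_rate:
        s = tuple(s_training)
        swap_times = 0
        end = False
        while not end:
            # epsilon-greedy action selection (step 12)
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: q[(s, act)])
            i, j = a
            lst = list(s)
            lst[i], lst[j] = lst[j], lst[i]
            s_next = tuple(lst)
            swap_times += 1

            if s_next == s_goal:
                # Terminal update with the winning reward (step 16).
                q[(s, a)] += alpha * (reward_win - q[(s, a)])
                recent.append(1)
                end = True
            elif swap_times > upper_bound:
                # Episode exceeded the swap budget; assumed penalty update (step 20 is blank in the listing).
                q[(s, a)] += alpha * (reward_lose - q[(s, a)])
                recent.append(0)
                end = True
            else:
                # Intermediate reward shaped by the change in distance to S_goal
                # (steps 23 to 28; the exact reward values are assumptions).
                if dist(s_next) > dist(s):
                    r = reward_farther
                elif dist(s_next) < dist(s):
                    r = reward_closer
                else:
                    r = reward_same
                best_next = max(q[(s_next, b)] for b in actions)
                q[(s, a)] += alpha * (r + best_next - q[(s, a)])  # discount of 1 assumed
            s = s_next
        train_steps += 1
        # Success rate over the latest 100 episodes (steps 18 and 32).
        if len(recent) == 100:
            current_rate = sum(recent) / 100
    return q, train_steps


# Example: learn to sort a fixed 5-element training state.
# q_table, episodes = rl_sort(5, [3, 1, 4, 5, 2])

Because every episode restarts from the same training state, the loop terminates once the greedy policy sorts that state in at least 75 of the latest 100 episodes, mirroring the success_rate threshold in the listing.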