Research Article
An Empirical Investigation of Transfer Effects for Reinforcement Learning
Algorithm 1
RL_Sort: the Q-learning-based algorithm for the sorting task.
Input: S_training, Q_n[S_n, A_n]
(1)  initialize
(2)  upper_bound = n + 1
(3)  train_steps = 0
(4)  success_rate = 0.75
(5)  S_goal = [1, 2, ..., n]
(6)  repeat
(7)      end = FALSE
(8)      swap_times = 0
(9)      s = S_training
(10)     current_rate = 0
(11)     repeat
(12)         Select an action a based on ε-greedy
(13)         Perform the action a and observe s′ and the corresponding reward
(14)         swap_times = swap_times + 1
(15)         if (s′ is S_goal) then
(16)             Q_n[s, a] ← Q_n[s, a] + α × (reward_win − Q_n[s, a])
(17)             end = TRUE
(18)             Check the success rate for the latest 100 episodes and assign it to current_rate
(19)         elseif (swap_times > upper_bound) then
(20)
(21)             end = TRUE
(22)         else
(23)             if (dist(s′, S_goal) > dist(s, S_goal))
(24)
(25)             elseif (dist(s′, S_goal) < dist(s, S_goal))
(26)
(27)             else
(28)
(29)         s ← s′
(30)     until end is TRUE
(31)     train_steps = train_steps + 1
(32) until current_rate >= success_rate
(33) return Q_n, train_steps
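For concreteness, the following Python sketch reproduces the training loop of Algorithm 1 under stated assumptions. The listing leaves the penalty update (step 20) and the three distance-based intermediate updates (steps 24, 26, 28) unspecified, so the reward constants reward_lose, reward_closer, reward_farther, and reward_same, along with alpha, epsilon, the pairwise-swap action set, and the Hamming-style dist() function, are illustrative choices rather than the paper's exact settings.

import random
from collections import defaultdict, deque


def rl_sort(n, s_training, alpha=0.1, epsilon=0.1,
            reward_win=1.0, reward_lose=-1.0,
            reward_closer=0.1, reward_farther=-0.1, reward_same=0.0,
            success_rate=0.75, seed=0):
    """Train a Q-table to sort the fixed training state s_training by swapping elements."""
    random.seed(seed)
    s_goal = tuple(range(1, n + 1))
    upper_bound = n + 1
    # An action swaps the elements at two positions (i, j); assumed action set.
    actions = [(i, j) for i in range(n) for j in range(i + 1, n)]
    q = defaultdict(float)  # Q_n[s, a], initialised to 0

    def dist(s):
        # Number of misplaced elements (Hamming distance to the sorted state); assumed metric.
        return sum(x != y for x, y in zip(s, s_goal))

    recent = deque(maxlen=100)  # win/loss record of the latest 100 episodes
    train_steps = 0
    current_rate = 0.0
    while current_rate < success_rate:
        s = tuple(s_training)
        swap_times = 0
        end = False
        while not end:
            # epsilon-greedy action selection (step 12)
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: q[(s, act)])
            i, j = a
            lst = list(s)
            lst[i], lst[j] = lst[j], lst[i]
            s_next = tuple(lst)
            swap_times += 1

            if s_next == s_goal:
                # Terminal update with the winning reward (step 16).
                q[(s, a)] += alpha * (reward_win - q[(s, a)])
                recent.append(1)
                end = True
            elif swap_times > upper_bound:
                # Episode exceeded the swap budget; assumed penalty update (step 20 is blank in the listing).
                q[(s, a)] += alpha * (reward_lose - q[(s, a)])
                recent.append(0)
                end = True
            else:
                # Intermediate reward shaped by the change in distance to S_goal
                # (steps 23 to 28; the exact reward values are assumptions).
                if dist(s_next) > dist(s):
                    r = reward_farther
                elif dist(s_next) < dist(s):
                    r = reward_closer
                else:
                    r = reward_same
                best_next = max(q[(s_next, b)] for b in actions)
                q[(s, a)] += alpha * (r + best_next - q[(s, a)])  # discount of 1 assumed
            s = s_next
        train_steps += 1
        # Success rate over the latest 100 episodes (steps 18 and 32).
        if len(recent) == 100:
            current_rate = sum(recent) / 100
    return q, train_steps


# Example: learn to sort a fixed 5-element training state.
# q_table, episodes = rl_sort(5, [3, 1, 4, 5, 2])

Because every episode restarts from the same training state, the loop terminates once the greedy policy sorts that state in at least 75 of the latest 100 episodes, mirroring the success_rate threshold in the listing.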