Mobile Information Systems

Research Article

Stabilizing Transmission Capacity in Millimeter Wave Links by Q-Learning-Based Scheme

The reward table update process for UE i.

	Run at the edge computing facility
	Input: the initialized reward table for UE i and the personalized information reported by all the UEs
	Output: the updated reward table for UE i
(1)	Find the SBS with which UE i is associated, according to the personalized information reported by UE i
(2)	If UE i is not associated with any SBS then
(3)	Determine the set of neighboring UEs according to the personalized information reported by UE i
(4)	For each neighboring UE i′ do
(5)	Determine its association state and working state according to the personalized information reported by UE i′
(6)	If UE i′ is both associated with an SBS and idle then
(7)	Record it as a candidate relay UE for UE i and store it in the set R_i
(8)	End if
(9)	End for
(10)	Extract from the set R_i each candidate that is in the same coverage area as UE i and store it in the set SR_i
(11)	If the set SR_i is not empty then
(12)	Select from the set SR_i the candidate with the highest energy reserve level; denote it as UE i′, associated with SBS j
(13)	For each do
(14)	For each do
(15)	Determine , , and according to and
(16)
(17)	=
(18)	End for
(19)	End for
(20)	Else if the set R_i is not empty then
(21)	Select from the set R_i the candidate with the highest energy reserve level; denote it as UE i′, associated with SBS j′
(22)	For each do
(23)	For each do
(24)	Determine , and according to and
(25)
(26)	=
(27)	End for
(28)	End for
(29)	End if
(30)	End if
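
The relay-selection part of the procedure above, steps (1) to (12) and (20) to (21), can be illustrated with a minimal Python sketch. It is not the authors' implementation. The record type UEReport, its field names, the area_id field used to test "same coverage area", and the function select_relay are all hypothetical names introduced here for illustration. The reward-table assignments in steps (13) to (19) and (22) to (28) are deliberately omitted, because their symbols are not given in the listing.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UEReport:
    """Hypothetical shape of the personalized information one UE reports."""
    ue_id: str
    sbs_id: Optional[str]  # associated SBS, or None if the UE is unassociated
    idle: bool             # working state: True if the UE is idle
    energy: float          # energy reserve level
    area_id: str           # assumed identifier of the UE's coverage area


def select_relay(
    ue: UEReport, neighbors: List[UEReport]
) -> Optional[Tuple[UEReport, bool]]:
    """Return (relay, same_area) for an unassociated UE, or None.

    same_area is True when the relay came from SR_i and False when it
    came from R_i only.
    """
    # Steps (1)-(2): only a UE without an associated SBS needs a relay.
    if ue.sbs_id is not None:
        return None
    # Steps (4)-(9): R_i holds neighbors that are associated and idle.
    r_i = [n for n in neighbors if n.sbs_id is not None and n.idle]
    # Step (10): SR_i holds the candidates in the same coverage area as UE i.
    sr_i = [n for n in r_i if n.area_id == ue.area_id]
    # Steps (11)-(12): prefer the highest-energy candidate in SR_i.
    if sr_i:
        return max(sr_i, key=lambda n: n.energy), True
    # Steps (20)-(21): otherwise fall back to the highest-energy one in R_i.
    if r_i:
        return max(r_i, key=lambda n: n.energy), False
    return None
```

In this sketch the returned flag tells the caller which branch of the procedure applies, so the corresponding reward-table update loop can then be run for the selected relay and its SBS.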