Research Article
Stabilizing Transmission Capacity in Millimeter Wave Links by Q-Learning-Based Scheme
Algorithm 3
The Q table training process for UE i.
| Step | Operation |
|---|---|
| | Run at the cloud computing facility |
| | Input: , , the updated reward table for UE i |
| | Output: the trained Q table for UE i |
| (1) | Initialize each entry of the Q table to 0 |
| (2) | For each episode do |
| (3) | Randomly select an initial state |
| (4) | = 0 |
| (5) | For each do |
| (6) | Compute according to formula (7) |
| (7) | Update the corresponding entry of the Q table |
| (8) | If then |
| (9) | |
| (10) | |
| (11) | End if |
| (12) | End for |
| (13) | Determine the exploration probability (e.g., 0.1) based on the exploration-exploitation policy |
| (14) | Generate a random number between 0 and 1 |
| (15) | If then |
| (16) | If can transfer to the next state (e.g., ) then |
| (17) | and go to step (4) |
| (18) | End if |
| (19) | Else |
| (20) | Randomly select an action from |
| (21) | If the selected action can transfer to the next state then |
| (22) | and go to step (4) |
| (23) | End if |
| (24) | End if |
| (25) | End for |
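Formula (7) is not reproduced in this section, so the following Python sketch only illustrates the control flow of Algorithm 3 under stated assumptions. It assumes that formula (7) is the standard tabular Q-learning update with learning rate alpha, that a reward entry of `None` marks an action that cannot transfer the current state to a next state, that a feasible action a leads to state a, and that a random draw above the exploration probability selects the greedy branch. The function name `train_q_table`, its parameters (`gamma`, `alpha`, `episodes`, `max_steps`), and the three-state reward table are hypothetical and are not taken from the article.

```python
import random
from typing import List, Optional


def train_q_table(
    reward: List[List[Optional[float]]],
    gamma: float = 0.8,      # discount factor (assumed value)
    alpha: float = 1.0,      # learning rate; 1.0 gives Q = R + gamma * max Q'
    epsilon: float = 0.1,    # exploration probability, as in step (13)
    episodes: int = 500,
    max_steps: int = 100,    # safety cap on steps per episode (assumption)
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Hypothetical tabular Q-learning sketch of Algorithm 3.

    Assumptions (not from the article): reward[s][a] is None when action a
    cannot transfer state s to a next state, and a feasible action a moves
    the system to state a.
    """
    rng = random.Random(seed)
    n = len(reward)
    q = [[0.0] * n for _ in range(n)]                  # step (1)
    for _ in range(episodes):                          # step (2)
        state = rng.randrange(n)                       # step (3)
        for _ in range(max_steps):
            best_action, best_q = None, float("-inf")  # analogue of step (4)
            for action in range(n):                    # step (5)
                r = reward[state][action]
                if r is None:                          # infeasible action: skip
                    continue
                # Steps (6)-(7): assumed form of formula (7)
                target = r + gamma * max(q[action])
                q[state][action] += alpha * (target - q[state][action])
                if q[state][action] > best_q:          # steps (8)-(11): track best action
                    best_action, best_q = action, q[state][action]
            if rng.random() > epsilon:                 # steps (14)-(16): exploit
                action = best_action
            else:                                      # steps (19)-(20): explore
                action = rng.randrange(n)
            if action is None or reward[state][action] is None:
                break                                  # cannot transfer: the episode ends
            state = action                             # move on and return to step (4)
    return q


# Usage with a hypothetical three-state reward table; state 2 has no exits.
R = [
    [None, 0.0, None],
    [0.0, None, 100.0],
    [None, None, None],
]
Q = train_q_table(R, seed=0)
print([[round(v, 1) for v in row] for row in Q])
# prints [[0.0, 80.0, 0.0], [64.0, 0.0, 100.0], [0.0, 0.0, 0.0]]
```

In this sketch, an episode ends exactly as in the listing: when the chosen action (greedy or random) cannot transfer to a next state, control falls through to the end of the episode loop. With alpha = 1 the update reduces to Q(s, a) = R(s, a) + γ max Q(s', a'), whereas a smaller alpha would instead average successive targets.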