Research Article
A Reinforcement Learning-Based Configuring Approach in Next-Generation Wireless Networks Using Software-Defined Metasurface
Algorithm 1
The process for finding the paths for multiple users in a PWE using the proposed RL algorithm.
Input: [...]: number of agents, [...]: number of states, e: acceptable error
Output: Final Q-table
[...]: the set of agents that have not reached the goal at the episode
[...]: returns the next state corresponding to the current state and the performed action
[...]: the set of states used by all of the agents
[...]: variance of the Q-table
while [...] do
    [...]
    while [...] do
        for [...] do
            generate a random number r in [0, 1]
            if [...] then
                select the action of the agent according to the Q-table
                if [...] then
                    select another action
                end
            else
                select the action of the agent randomly
                if [...] then
                    select another action randomly
                end
            end
            calculate [...] by equation (2)
            update [...] by equation (3)
            update [...]
            [...]
        end
    end
    store [...]
    calculate [...]
end
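The loop structure of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions, because the corresponding symbols and equations are not reproduced in this listing:

- the reward is a hypothetical stand-in for equation (2);
- the update is the standard Q-learning rule, assumed here to play the role of equation (3);
- the outer loop stops when the Q-table variance changes by less than e between consecutive episodes;
- each agent keeps its own table.

The names `train` and `grid_next`, the 3x3 grid, and all parameter values are likewise hypothetical.

```python
import random
from statistics import pvariance


def grid_next(s, u, w=3, h=3):
    """Hypothetical transition function: next state for action u on a w x h grid."""
    x, y = s % w, s // w
    dx, dy = [(0, -1), (0, 1), (-1, 0), (1, 0)][u]
    nx, ny = x + dx, y + dy
    # Moves that would leave the grid keep the agent in place.
    return ny * w + nx if 0 <= nx < w and 0 <= ny < h else s


def train(n_agents, n_states, n_actions, next_state, starts, goals,
          err=1e-6, alpha=0.5, gamma=0.9, explore=0.2,
          max_episodes=300, max_steps=100, seed=0):
    rng = random.Random(seed)
    # Q[a][s][u]: one table per agent (assumption; the listing names one Q-table).
    Q = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(n_agents)]
    history, prev_var = [], None
    for _ in range(max_episodes):              # outer loop over episodes
        pos = list(starts)
        used = set(pos)                        # states used by all of the agents
        active = {a for a in range(n_agents) if pos[a] != goals[a]}
        steps = 0
        while active and steps < max_steps:    # some agent has not reached its goal
            steps += 1
            for a in sorted(active):
                s = pos[a]
                order = list(range(n_actions))
                rng.shuffle(order)             # random order, also breaks ties
                if rng.random() > explore:     # exploit: rank actions by Q-value
                    order.sort(key=lambda u: -Q[a][s][u])
                blocked = used - {s}
                # "Select another action" while the next state is already used.
                free = [u for u in order if next_state(s, u) not in blocked]
                if not free:
                    continue                   # no free move: the agent waits
                u = free[0]
                s2 = next_state(s, u)
                # Assumed reward, standing in for equation (2).
                r = 1.0 if s2 == goals[a] else -0.01
                # Standard Q-learning update, assumed to match equation (3).
                Q[a][s][u] += alpha * (r + gamma * max(Q[a][s2]) - Q[a][s][u])
                used.discard(s)
                used.add(s2)
                pos[a] = s2
                if s2 == goals[a]:
                    active.discard(a)
        # Store the variance of the Q-table and test the assumed stop rule.
        var = pvariance([q for tab in Q for row in tab for q in row])
        history.append(var)
        if prev_var is not None and abs(var - prev_var) < err:
            break
        prev_var = var
    return Q, history


# Two agents on a 3x3 grid with distinct start and goal cells.
Q, history = train(2, 9, 4, grid_next, starts=[0, 2], goals=[8, 6])
```

The `max_steps` and `max_episodes` caps are added only so that the sketch always terminates, even if agents block one another. Algorithm 1 itself lists no such caps.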