Research Article

A Reinforcement Learning-Based Configuring Approach in Next-Generation Wireless Networks Using Software-Defined Metasurface

Algorithm 1

The process for finding the paths for multiple users in a PWE using the proposed RL algorithm.
Input: number of agents, number of states, e: acceptable error
Output: Final Q table
: the set of agents that have not yet reached the goal in the current episode
: returns the next state reached from the current state by performing the action
: the set of states used by all of the agents
: variance of the Q-table
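while the variance of the Q-table exceeds the acceptable error e do
while at least one agent has not reached the goal do
  for each agent that has not yet reached the goal do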
   generate a random number r in [0, 1]
   if r meets the exploitation condition then
    Select the agent's action according to the Q-table
    if the resulting next state is already in the set of states used by the agents then
     Select another action
    end
   end
   else
    Select the agent's action randomly
    if the resulting next state is already in the set of states used by the agents then
     Select another action randomly
    end
   end
   Calculate by equation (2)
   Update by equation (3)
   Update
   
  end
end
 Store the Q-table
 Calculate the variance of the Q-table
end
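
The loop structure of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the environment is a small square grid with four moves, each agent keeps its own Q-table, the agent exploits when r > epsilon, the reward (-1 per step, +10 at the goal) stands in for equation (2), the standard Q-learning update stands in for equation (3), and the stopping test uses the variance of the per-entry change between the stored and the current Q-tables. The names `next_state`, `choose_action`, `train`, `alpha`, `gamma`, `epsilon` and `tol` are hypothetical.

```python
import random
from statistics import pvariance

# Hypothetical 4-neighbour moves on a square grid: right, left, down, up.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def next_state(state, action, size):
    """Return the state reached from `state` by `action`, clamped to the grid."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    return (min(max(row + d_row, 0), size - 1),
            min(max(col + d_col, 0), size - 1))


def choose_action(q_row, state, used, size, epsilon, rng):
    """Pick an action whose next state is unused, or None if all are blocked.

    Assumption: exploit when r > epsilon, otherwise explore.
    """
    order = list(range(len(ACTIONS)))
    if rng.random() > epsilon:
        # Greedy branch: try actions from highest to lowest Q-value.
        order.sort(key=lambda a: q_row[a], reverse=True)
    else:
        # Random branch: try actions in a random order.
        rng.shuffle(order)
    for action in order:  # "select another action" when the state is taken
        if next_state(state, action, size) not in used:
            return action
    return None


def train(starts, goals, size=5, alpha=0.5, gamma=0.9, epsilon=0.2,
          tol=1e-4, max_episodes=2000, seed=0):
    """Train one Q-table per agent until the Q-table change variance < tol."""
    rng = random.Random(seed)
    states = [(r, c) for r in range(size) for c in range(size)]
    q = [{s: [0.0] * len(ACTIONS) for s in states} for _ in starts]
    episodes = 0
    for _ in range(max_episodes):
        episodes += 1
        # Store a flat copy of all Q-values before the episode.
        stored = [v for table in q for s in states for v in table[s]]
        pos = list(starts)
        used = set(starts)                # states used by all agents
        active = set(range(len(starts)))  # agents that have not reached the goal
        while active:
            for i in sorted(active):
                action = choose_action(q[i][pos[i]], pos[i], used,
                                       size, epsilon, rng)
                if action is None:        # boxed in: give up for this episode
                    active.discard(i)
                    continue
                nxt = next_state(pos[i], action, size)
                # Placeholder reward standing in for equation (2).
                reward = 10.0 if nxt == goals[i] else -1.0
                # Standard Q-learning update standing in for equation (3).
                old = q[i][pos[i]][action]
                target = reward + gamma * max(q[i][nxt])
                q[i][pos[i]][action] = old + alpha * (target - old)
                pos[i] = nxt
                used.add(nxt)
                if nxt == goals[i]:
                    active.discard(i)
        current = [v for table in q for s in states for v in table[s]]
        # Assumed convergence measure: variance of the per-entry change.
        if pvariance([c - p for c, p in zip(current, stored)]) < tol:
            break
    return q, episodes


if __name__ == "__main__":
    q_tables, n_episodes = train([(0, 0), (4, 0)], [(0, 4), (4, 4)])
    print("episodes run:", n_episodes)
```

Because every move adds a new state to the shared set of used states, each episode terminates after a finite number of steps, and the paths of different agents cannot overlap within an episode. Whether a blocked agent should wait, backtrack, or end its episode is a design choice; the sketch simply ends that agent's episode.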