Security and Communication Networks

Research Article

A Reinforcement Learning-Based Configuring Approach in Next-Generation Wireless Networks Using Software-Defined Metasurface

The process for finding the paths for multiple users in a PWE using the proposed RL algorithm.

	Input: number of agents, number of states, e: acceptable error
	Output: final Q-table

	Active agents: the set of agents that have not yet reached the goal in the current episode
	Next state: the state returned for the current state after performing the selected action
	Used states: the set of states used by all of the agents
	Variance: the variance of the Q-table

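	while the variance of the Q-table exceeds the acceptable error e do
		while some agent has not yet reached the goal do
			for each agent that has not yet reached the goal do
				generate a random number r in [0, 1]
				if r satisfies the exploitation condition then
					select the agent's action according to the Q-table
					if the resulting next state is already used by the agents then
						select another action
					end
				else
					select the agent's action randomly
					if the resulting next state is already used by the agents then
						select another action randomly
					end
				end
				calculate the reward by equation (2)
				update the Q-value by equation (3)
				update the current state, the set of used states, and the set of agents that have not reached the goal
			end
		end
		store the Q-table
		calculate the variance of the Q-table
	end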
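
The procedure above can be illustrated with a minimal Python sketch. It is not the authors' implementation: equations (2) and (3) are not reproduced in this listing, so the sketch substitutes a simple step-cost/goal reward and the standard tabular Q-learning update. The grid world, the per-agent Q-table layout, the stopping rule (mean variance of the last few stored Q-tables falling below e), and every name and parameter (`train`, `next_state`, `mean_variance`, `alpha`, `gamma`, `epsilon`, `window`, `max_episodes`, `max_steps`) are assumptions made for illustration. Instead of picking an action and re-drawing when its next state is already used, the sketch filters the admissible actions first, which has the same effect.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def next_state(state, action, size):
    """Return the state reached from `state` by `action`; off-grid moves stay put."""
    row, col = divmod(state, size)
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < size and 0 <= new_col < size:
        return new_row * size + new_col
    return state


def mean_variance(snapshots):
    """Mean over Q-table entries of each entry's variance across stored snapshots."""
    n = len(snapshots)
    total = 0.0
    for column in zip(*snapshots):
        mean = sum(column) / n
        total += sum((v - mean) ** 2 for v in column) / n
    return total / len(snapshots[0])


def train(starts, goals, size=4, e=1e-6, alpha=0.5, gamma=0.9, epsilon=0.2,
          window=5, max_episodes=2000, max_steps=100, seed=0):
    rng = random.Random(seed)
    n_agents, n_states = len(starts), size * size
    # Assumed layout: q[agent][state][action]
    q = [[[0.0] * len(ACTIONS) for _ in range(n_states)] for _ in range(n_agents)]
    history, variance, episodes = [], float("inf"), 0

    while variance > e and episodes < max_episodes:
        episodes += 1
        position = list(starts)
        used = set(starts)  # states used by any agent in this episode
        active = [a for a in range(n_agents) if starts[a] != goals[a]]
        steps = 0
        while active and steps < max_steps:
            steps += 1
            for agent in list(active):
                s = position[agent]
                # Admissible actions: those whose next state is not yet used.
                allowed = [a for a in range(len(ACTIONS))
                           if next_state(s, a, size) not in used]
                if not allowed:
                    active.remove(agent)  # assumption: a boxed-in agent gives up
                    continue
                if rng.random() > epsilon:  # exploit: follow the Q-table
                    action = max(allowed, key=lambda a: q[agent][s][a])
                else:  # explore: random admissible action
                    action = rng.choice(allowed)
                ns = next_state(s, action, size)
                # Stand-in for equation (2): assumed reward values.
                reward = 10.0 if ns == goals[agent] else -1.0
                # Stand-in for equation (3): standard Q-learning update.
                target = reward + gamma * max(q[agent][ns])
                q[agent][s][action] += alpha * (target - q[agent][s][action])
                position[agent] = ns
                used.add(ns)
                if ns == goals[agent]:
                    active.remove(agent)
        # Store the Q-table, then compute the variance over the stored tables.
        history.append([v for table in q for row in table for v in row])
        history = history[-window:]
        if len(history) == window:
            variance = mean_variance(history)
    return q, episodes


if __name__ == "__main__":
    q_table, n_episodes = train(starts=[0, 3], goals=[15, 12])
    print("episodes run:", n_episodes)
```

In this sketch two agents start in the top corners of a 4 x 4 grid and head for the diagonally opposite bottom corners, so their shortest paths cross and the used-state check comes into play. The `max_episodes` and `max_steps` bounds are safety limits added for the sketch and are not part of the listing above.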