Research Article

Event Driven Duty Cycling with Reinforcement Learning and Monte Carlo Technique for Wireless Network

Algorithm 2

Node evaluation based on RL.
Initialization;
while e is less than the total number of episodes do
while n is less than the maximum number of steps do
Take action with ε-soft greedy:
  
while the nodes are in the RC phase do
  Wake up the nodes selected by the RL process;
  Generate data packets or receive data packets;
end while
Determine the subsequent state;
n = n+1;
Observe the delay and energy consumption;
Compute reward
  
Store the transition and τ in the sample space Q;
end while
update π, V(s), ε;
e = e + 1;
end while
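
The loop structure of Algorithm 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy environment `toy_step`, the queue-length state, the two-action set, the reward form (negative sum of delay and energy), the discount factor, and the ε decay schedule are all assumptions introduced for illustration. The sketch also learns action values q(s, a) with an every-visit Monte Carlo update rather than V(s), and the list `samples` plays the role of the sample space Q in the listing.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # 0 = stay asleep, 1 = wake up (hypothetical action set)
MAX_QUEUE = 5     # hypothetical cap on buffered packets (the toy state)


def epsilon_soft_action(q, state, epsilon, rng):
    """Pick the greedy action with prob. 1 - eps + eps/|A|, others with eps/|A|."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def toy_step(state, action, rng):
    """Hypothetical environment: returns (next_state, delay, energy)."""
    arrivals = 1 if rng.random() < 0.3 else 0
    if action == 1:
        queue, energy = 0, 1.0      # awake: flush the queue, pay more energy
    else:
        queue, energy = state, 0.1  # asleep: buffered packets keep waiting
    delay = float(queue)            # waiting packets stand in for delay
    next_state = min(queue + arrivals, MAX_QUEUE)
    return next_state, delay, energy


def train(episodes=200, max_steps=50, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)     # action-value estimates q(s, a)
    visits = defaultdict(int)  # visit counts for incremental averaging
    for _ in range(episodes):
        state, samples = 0, []
        for _ in range(max_steps):
            action = epsilon_soft_action(q, state, epsilon, rng)
            next_state, delay, energy = toy_step(state, action, rng)
            reward = -(delay + energy)  # assumed reward form
            samples.append((state, action, reward))
            state = next_state
        # Every-visit Monte Carlo update from the stored transitions.
        g = 0.0
        for s, a, r in reversed(samples):
            g = r + gamma * g
            visits[(s, a)] += 1
            q[(s, a)] += (g - q[(s, a)]) / visits[(s, a)]
        epsilon = max(0.05, epsilon * 0.99)  # assumed decay schedule
    policy = {s: max(ACTIONS, key=lambda a: q[(s, a)])
              for s in range(MAX_QUEUE + 1)}
    return policy, q
```

Calling `train()` returns a greedy policy over the toy queue states together with the learned action values. Fixing `seed` makes a run repeatable, which helps when comparing different ε schedules.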