Research Article

ALBRL: Automatic Load-Balancing Architecture Based on Reinforcement Learning in Software-Defined Networking

Algorithm 1

ALBRL training algorithm.
Input:
Reward discount factor $\gamma$, target update rate $\tau$, target network parameter update frequency $C$, the number of mini-batch samples $m$, the number of iterations $T$.
Randomly initialize the actor online network parameter $\theta$ and the critic online network parameter $w$;
Initialize the actor target network and the critic target network with $\theta' \leftarrow \theta$ and $w' \leftarrow w$;
Initialize replay buffer $D$;
Initialize the data buffer of SumTree $S$ and set the priority of all leaf nodes to 0;
Initialize a random process $\mathcal{N}$ for action exploration;
Initialize state $s_1$ with the information collected from the SDN controller and acquire its feature vector $\phi(s_1)$;
1) for $t = 1$ to $T$ do
2)  Select the action weight $a_t$ according to the state $s_t$ in the actor online network;
3)  Deploy $a_t$ on the SDN controller;
4)  Recalculate paths and issue the flow table;
5)  Get the reward $r_t$, the new state $s_{t+1}$, and the termination flag $d_t$ from the SDN controller;
6)  Store $(\phi(s_t), a_t, r_t, \phi(s_{t+1}), d_t)$ in the replay buffer $D$;
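7)  Calculate the sample priority value: $p_t = |\delta_t| + \epsilon$, where $\delta_t$ is the TD error of the sample and $\epsilon$ is a small positive constant;
8)  Update all nodes of the SumTree $S$;
9)  Use the SumTree model to extract $m$ samples $(\phi(s_j), a_j, r_j, \phi(s'_j), d_j)$, $j = 1, \dots, m$, from the replay buffer $D$;
10)  Calculate the importance-sampling weight: $\omega_j = (N \cdot P(j))^{-\beta} / \max_i \omega_i$, where $P(j) = p_j / \sum_k p_k$ is the sampling probability of sample $j$, $N$ is the number of stored samples, and $\beta$ is the importance-sampling exponent;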
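11)  Calculate the target Q value of each sample $j$ with the critic target network $Q'$ and the actor target network $\pi_{\theta'}$, where $\gamma$ is the reward discount factor, $d_j$ the termination flag, and $s'_j$ the next state:
12)  $y_j = \begin{cases} r_j, & \text{if } d_j \text{ is true} \\ r_j + \gamma\, Q'(\phi(s'_j), \pi_{\theta'}(\phi(s'_j)), w'), & \text{otherwise} \end{cases}$
13)  Use the loss function, weighted by the importance-sampling weights $\omega_j$ over the $m$ samples, to update the critic online network parameter $w$:
14)  $L(w) = \frac{1}{m} \sum_{j=1}^{m} \omega_j \left( y_j - Q(\phi(s_j), a_j, w) \right)^2$
15)  Use the policy gradient to update the actor online network parameter $\theta$:
16)  $\nabla_{\theta} J(\theta) = \frac{1}{m} \sum_{j=1}^{m} \nabla_a Q(\phi(s_j), a, w) \big|_{a = \pi_{\theta}(\phi(s_j))} \nabla_{\theta} \pi_{\theta}(\phi(s_j))$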
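17)  if $t \bmod C = 0$ then
18)   update the critic target network and actor target network parameters with the target update rate $\tau$:
    $w' \leftarrow \tau w + (1 - \tau) w'$
    $\theta' \leftarrow \tau \theta + (1 - \tau) \theta'$
19)  end if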
20) end for
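
The SumTree steps of Algorithm 1 (storing a sample with its priority, updating the tree nodes, extracting a mini-batch, and computing importance-sampling weights) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `SumTree`, the helpers `td_priority` and `sample_batch`, the proportional priority $|\delta| + \epsilon$, the defaults `eps=0.01` and `beta=0.4`, and the normalisation by the largest weight in the batch are all assumptions chosen for illustration.

```python
import random


class SumTree:
    """Binary tree: leaves hold sample priorities, internal nodes hold sums."""

    def __init__(self, capacity):
        self.capacity = capacity                  # maximum number of stored samples
        self.tree = [0.0] * (2 * capacity - 1)    # internal nodes first, then leaves
        self.data = [None] * capacity             # transitions, aligned with the leaves
        self.write = 0                            # next leaf slot to overwrite
        self.size = 0                             # number of stored samples

    def total(self):
        return self.tree[0]                       # root = sum of all priorities

    def update(self, leaf, priority):
        """Set one leaf priority and propagate the change up to the root."""
        idx = leaf + self.capacity - 1
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx > 0:
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def add(self, priority, transition):
        """Store a transition in the next slot (ring buffer) with its priority."""
        self.data[self.write] = transition
        self.update(self.write, priority)
        self.write = (self.write + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def get(self, value):
        """Return (leaf, priority, transition) whose cumulative range holds value."""
        idx = 0
        while idx < self.capacity - 1:            # descend until a leaf is reached
            left = 2 * idx + 1
            if value < self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        leaf = idx - (self.capacity - 1)
        return leaf, self.tree[idx], self.data[leaf]


def td_priority(td_error, eps=0.01):
    """Assumed proportional priority: |TD error| plus a small constant."""
    return abs(td_error) + eps


def sample_batch(tree, m, beta=0.4):
    """Draw m transitions proportionally to priority and compute IS weights."""
    segment = tree.total() / m                    # split [0, total) into m equal ranges
    leaves, batch, weights = [], [], []
    for j in range(m):
        value = (j + random.random()) * segment   # one draw per range (stratified)
        leaf, priority, transition = tree.get(value)
        prob = priority / tree.total()            # P(j) = p_j / sum_k p_k
        weights.append((tree.size * prob) ** (-beta))
        leaves.append(leaf)
        batch.append(transition)
    max_w = max(weights)                          # largest weight within this batch
    weights = [w / max_w for w in weights]
    return leaves, batch, weights
```

In a training loop, each new transition would be stored with `tree.add(td_priority(delta), transition)`, and after the critic update the leaves returned by `sample_batch` would be refreshed with `tree.update(leaf, td_priority(new_delta))`. Because every internal node stores the sum of its children, both the update and the lookup take time logarithmic in the buffer capacity.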