ALBRL: Automatic Load-Balancing Architecture Based on Reinforcement Learning in Software-Defined Networking
Algorithm 1
ALBRL training algorithm.
Input:
Reward discount factor $\gamma$, target update rate $\tau$, target-network parameter update frequency $C$, the number of mini-batch samples $N$, the number of iterations $T$.
Randomly initialize the actor online-network parameter $\theta^{\mu}$ and the critic online-network parameter $\theta^{Q}$;
Initialize the actor target network $\mu'$ and the critic target network $Q'$ with $\theta^{\mu'} \leftarrow \theta^{\mu}$ and $\theta^{Q'} \leftarrow \theta^{Q}$;
Initialize the replay buffer $R$;
Initialize the data buffer of the SumTree and set the priority of all leaf nodes to 0;
Initialize a random process $\mathcal{N}$ for action exploration;
Initialize the state $s_1$ with the network information collected from the SDN controller and acquire its feature vector;
1) for $t = 1, T$ do
2) Select the action weight $w_t = \mu(s_t \mid \theta^{\mu}) + \mathcal{N}_t$ according to the state $s_t$ in the actor online network;
3) Deploy $w_t$ on the SDN controller;
4) Recalculate paths and issue the flow table;
5) Get the reward $r_t$, the new state $s_{t+1}$, and the termination flag $done_t$ from the SDN controller;
6) Store $(s_t, w_t, r_t, s_{t+1}, done_t)$ in $R$;
7) Calculate the priority value of the new sample: $p_t = \max_{i<t} p_i$;
8) Update all nodes of the SumTree;
9) Use the SumTree model to extract $N$ samples from $R$, drawing sample $j$ with probability $P(j) = p_j^{\alpha} / \sum_i p_i^{\alpha}$;
10) Calculate the importance-sampling weight: $\omega_j = \left(N \cdot P(j)\right)^{-\beta} / \max_i \omega_i$;
11) Calculate the target Q value:
12) $y_j = r_j + \gamma \, (1 - done_j) \, Q'\!\left(s_{j+1}, \mu'(s_{j+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right)$;
13) Use the loss function to update the critic online-network parameter $\theta^{Q}$:
14) $L = \frac{1}{N} \sum_{j} \omega_j \left(y_j - Q(s_j, w_j \mid \theta^{Q})\right)^2$;
15) Use the policy gradient to update the actor online-network parameter $\theta^{\mu}$:
16) $\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{j} \nabla_{w} Q(s, w \mid \theta^{Q})\big|_{s=s_j,\, w=\mu(s_j)} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_j}$;
17) if $t \bmod C = 0$ then
18) Update the critic-target-network and actor-target-network parameters: $\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau)\theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau)\theta^{\mu'}$;
19) end if
20) end for
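Steps 6-9 rely on a SumTree to store transition priorities and to draw mini-batch samples with probability proportional to priority. The algorithm listing does not reproduce the SumTree internals, so the following is a minimal Python sketch under that assumption; the class and method names (SumTree, add, update, sample) are illustrative and not taken from the source.

```python
import random

class SumTree:
    """Binary tree whose leaves hold sample priorities and whose internal
    nodes hold the sum of their children, so a priority-proportional draw
    costs O(log n). Minimal sketch; capacity equals the buffer size."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # all priorities start at 0, as in the algorithm
        self.data = [None] * capacity           # stored transitions
        self.write = 0                          # next leaf to overwrite

    def add(self, priority, transition):
        """Steps 6-8: store a transition and propagate its priority to the root."""
        leaf = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        """Step 8: change one leaf priority and refresh every ancestor sum."""
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def sample(self, batch_size):
        """Step 9: draw batch_size transitions with probability proportional
        to priority by descending the tree with a random offset per segment."""
        batch, segment = [], self.tree[0] / batch_size
        for i in range(batch_size):
            v = random.uniform(segment * i, segment * (i + 1))
            idx = 0
            while idx < self.capacity - 1:      # walk down to a leaf
                left = 2 * idx + 1
                if v <= self.tree[left]:
                    idx = left
                else:
                    v -= self.tree[left]
                    idx = left + 1
            batch.append((idx, self.tree[idx], self.data[idx - self.capacity + 1]))
        return batch  # (leaf index, priority, transition) triples
```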
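Steps 11-18 amount to one importance-sampling-weighted DDPG update followed by a soft target-network update. The sketch below is a non-authoritative rendering of those steps in PyTorch; the network sizes (STATE_DIM, ACTION_DIM, the hidden widths) and the helper ddpg_update are hypothetical, since the listing does not specify them.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: per-link state features and one weight per link.
STATE_DIM, ACTION_DIM = 16, 8

class Actor(nn.Module):
    """Maps a network state to a vector of link weights in (0, 1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACTION_DIM), nn.Sigmoid())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Estimates Q(s, w) for a state and an action-weight vector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, s, w):
        return self.net(torch.cat([s, w], dim=-1))

def ddpg_update(batch, is_weights, actor, critic, actor_t, critic_t,
                actor_opt, critic_opt, gamma=0.99, tau=0.005):
    """One ALBRL-style update (Steps 11-18); r, done, is_weights have shape (N, 1)."""
    s, w, r, s_next, done = batch

    # Steps 11-12: target Q value from the target networks.
    with torch.no_grad():
        y = r + gamma * (1.0 - done) * critic_t(s_next, actor_t(s_next))

    # Steps 13-14: importance-sampling-weighted critic loss.
    td_error = y - critic(s, w)
    critic_loss = (is_weights * td_error.pow(2)).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Steps 15-16: deterministic policy gradient for the actor.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Steps 17-18: soft target update (the paper applies it every C iterations;
    # here it is shown on every call for brevity).
    for net, target in ((critic, critic_t), (actor, actor_t)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)

    # |TD error| can be written back to the SumTree as the refreshed priority.
    return td_error.abs().detach().squeeze(-1)
```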