Research Article

Research on Fresh Product Logistics Transportation Scheduling Based on Deep Reinforcement Learning

Algorithm 1

Deep Q network based on pointer network
Input: State space S, action space A, discount rate γ, learning rate α, parameter update interval C
(1) Initialize experience pool D with capacity N;
(2) Randomly initialize the parameters $\theta$ of the Q network;
(3) Initialize the parameters of the target Q network, $\hat{\theta} = \theta$;
(4) repeat
(5) Initialize the initial state s;
(6) repeat
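(7) In state s, choose action a using the current Q network (for example, with an ε-greedy policy);
(8) Perform action a, observe the environment, and obtain an immediate reward r and a new state s';
(9) Put the transition (s, a, r, s') into D;
(10) Sample a transition (ss, aa, rr, ss') from D;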
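(11) $y = rr$ if $ss'$ is a termination state; otherwise $y = rr + \gamma \max_{a'} Q_{\hat{\theta}}(ss', a')$, evaluated with the target Q network;
(12) Use $(y - Q_{\theta}(ss, aa))^2$ as the loss function to train the Q network;
(13) $s \leftarrow s'$;
(14) Every C steps, copy the Q network parameters to the target Q network, $\hat{\theta} \leftarrow \theta$;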
(15) until s is the termination state;
(16) until $Q_{\theta}(s, a)$ converges for all s, a;
Output: Q network $Q_{\theta}(s, a)$
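
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the paper's implementation: the pointer-network Q function is replaced by a plain lookup table, the environment is a hypothetical five-state chain (action 1 moves right, action 0 moves left, reaching the last state yields reward 1), and the function name `train_dqn_sketch`, the ε-greedy exploration rate `epsilon`, and all default values are assumptions chosen only for illustration. The comments refer to the step numbers of Algorithm 1.

```python
import random
from collections import deque


def train_dqn_sketch(n_states=5, gamma=0.9, alpha=0.1, C=20, capacity=1000,
                     epsilon=0.3, episodes=500, seed=0):
    """Tabular stand-in for Algorithm 1 on a hypothetical chain environment."""
    rng = random.Random(seed)
    n_actions = 2                  # 0 = move left, 1 = move right (assumed)
    terminal = n_states - 1        # reaching the last state ends an episode

    D = deque(maxlen=capacity)                         # (1) experience pool
    q = [[0.0] * n_actions for _ in range(n_states)]   # (2) Q "network" (a table here)
    q_target = [row[:] for row in q]                   # (3) target Q network
    step = 0

    for _ in range(episodes):                          # (4) outer repeat
        s = 0                                          # (5) initial state
        while s != terminal:                           # (6)/(15) inner repeat
            # (7) epsilon-greedy action selection, ties broken at random
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(n_actions) if q[s][i] == best])

            # (8) environment step: reward 1 only when the terminal state is reached
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == terminal else 0.0

            D.append((s, a, r, s_next))                # (9) store the transition
            ss, aa, rr, ss_next = rng.choice(D)        # (10) sample one transition

            # (11) target value computed with the target table
            if ss_next == terminal:
                y = rr
            else:
                y = rr + gamma * max(q_target[ss_next])

            # (12) gradient step on (y - Q)^2; for a table this reduces to the
            # update below (the constant factor 2 is absorbed into alpha)
            q[ss][aa] += alpha * (y - q[ss][aa])

            s = s_next                                 # (13)
            step += 1
            if step % C == 0:                          # (14) sync the target table
                q_target = [row[:] for row in q]
    return q


if __name__ == "__main__":
    for state, values in enumerate(train_dqn_sketch()):
        print(state, [round(v, 3) for v in values])
```

In this toy setting the learned table should prefer moving right in every non-terminal state. Replacing the table with a neural network changes only steps (2), (3), (12), and (14), where the table update becomes an optimizer step and the copy becomes a parameter synchronization.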