Scientific Programming

Research Article

Research on Fresh Product Logistics Transportation Scheduling Based on Deep Reinforcement Learning

Deep Q network based on pointer network

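	Input: state space $S$, action space $A$, discount rate $\gamma$, learning rate $\alpha$, parameter update interval $C$
(1)	Initialize the experience pool $D$ with capacity $N$;
(2)	Randomly initialize the parameters $\phi$ of the Q network;
(3)	Randomly initialize the parameters of the target Q network, $\hat{\phi} = \phi$;
(4)	repeat
(5)	Initialize the starting state $s$;
(6)	repeat
(7)	In state $s$, choose action $a = \pi^{\epsilon}$ ($\epsilon$-greedy with respect to $Q_{\phi}$);
(8)	Perform action $a$, observe the environment, and obtain an immediate reward $r$ and a new state $s'$;
(9)	Store the transition $(s, a, r, s')$ in $D$;
(10)	Sample a transition $(ss, aa, rr, ss')$ from $D$;
(11)	$y = rr$ if $ss'$ is a terminal state; otherwise $y = rr + \gamma \max_{a'} Q_{\hat{\phi}}(ss', a')$;
(12)	Use $(y - Q_{\phi}(ss, aa))^2$ as the loss function to train the Q network;
(13)	$s \leftarrow s'$;
(14)	Every $C$ steps, $\hat{\phi} \leftarrow \phi$;
(15)	until $s$ is a terminal state;
(16)	until $Q_{\phi}(s, a)$ converges for all $s, a$;
	Output: Q network $Q_{\phi}(s, a)$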
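
The loop above (experience pool, sampled update, periodic target synchronization) can be illustrated with a minimal Python sketch. It is not the article's implementation: as a loudly stated assumption, plain lookup tables stand in for the pointer-network Q function and its target copy, and the environment is a hypothetical five-state chain unrelated to the fresh-product scheduling problem. The names `td_target`, `train`, `epsilon`, `capacity`, and `max_steps` are illustrative only; `gamma`, `alpha`, `C`, and `D` mirror the inputs listed in the algorithm.

```python
import random
from collections import deque


def td_target(r, s_next, done, q_target, gamma):
    """Step (11): bootstrap from the target table unless s_next is terminal."""
    if done:
        return r
    return r + gamma * max(q_target[s_next])


def train(n_states=5, gamma=0.9, alpha=0.1, C=20, capacity=1000,
          episodes=300, epsilon=0.2, max_steps=200, seed=0):
    """Hypothetical toy chain: states 0..n_states-1, action 0 = left, 1 = right.
    Entering the last state ends the episode with reward 1; all else pays 0."""
    rng = random.Random(seed)
    n_actions = 2
    # ASSUMPTION: lookup tables replace the Q network and the target Q network.
    q = [[0.0] * n_actions for _ in range(n_states)]
    q_target = [row[:] for row in q]
    D = deque(maxlen=capacity)  # experience pool; oldest entries are evicted
    step = 0
    for _ in range(episodes):
        s = 0                                           # step (5)
        for _ in range(max_steps):                      # cap on episode length
            # Step (7): epsilon-greedy choice, ties broken at random.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(n_actions) if q[s][i] == best])
            # Step (8): act, then observe the reward and the next state.
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            D.append((s, a, r, s_next, done))           # step (9)
            ss, aa, rr, ss_next, dd = rng.choice(D)     # step (10)
            y = td_target(rr, ss_next, dd, q_target, gamma)  # step (11)
            # Step (12): one gradient step on (y - Q)^2 for a single table
            # entry; the constant factor 2 is folded into alpha.
            q[ss][aa] += alpha * (y - q[ss][aa])
            s = s_next                                  # step (13)
            step += 1
            if step % C == 0:                           # step (14)
                q_target = [row[:] for row in q]
            if done:                                    # step (15)
                break
    return q


if __name__ == "__main__":
    q = train()
    # Greedy action per non-terminal state (1 means "move right").
    print([max(range(2), key=lambda a: q[s][a]) for s in range(4)])
```

Keeping `q_target` as a separate copy that is refreshed only every `C` steps is what distinguishes this loop from plain Q-learning: the bootstrap term in step (11) stays fixed between synchronizations, which is the stabilizing idea behind the target network. In a faithful implementation the table update in step (12) would be replaced by a gradient step on the pointer network's parameters.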