Research Article
Research on Fresh Product Logistics Transportation Scheduling Based on Deep Reinforcement Learning
Algorithm 1: Deep Q network based on pointer network
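| Step | Operation |
| --- | --- |
| Input | State space $S$, action space $A$, discount rate $\gamma$, learning rate $\alpha$, parameter update interval $C$ |
| (1) | Initialize experience pool $D$ with capacity $N$; |
| (2) | Randomly initialize the parameters $\phi$ of the Q network; |
| (3) | Randomly initialize the parameters $\hat{\phi} = \phi$ of the target Q network; |
| (4) | repeat |
| (5) | Initialize the initial state $s$; |
| (6) | repeat |
| (7) | In state $s$, choose action $a = \pi^{\epsilon}(s)$, where $\pi^{\epsilon}$ is the $\epsilon$-greedy policy derived from $Q_{\phi}$; |
| (8) | Perform action $a$, observe the environment, and obtain an immediate reward $r$ and a new state $s'$; |
| (9) | Put $(s, a, r, s')$ into $D$; |
| (10) | Sample $(ss, aa, rr, ss')$ from $D$; |
| (11) | $y = rr$ if $ss'$ is a terminal state, otherwise $y = rr + \gamma \max_{a'} Q_{\hat{\phi}}(ss', a')$; |
| (12) | Use $\left(y - Q_{\phi}(ss, aa)\right)^2$ as the loss function to train the Q network; |
| (13) | $s \leftarrow s'$; |
| (14) | Every $C$ steps, $\hat{\phi} \leftarrow \phi$; |
| (15) | until $s$ is a terminal state; |
| (16) | until $Q_{\phi}(s, a)$ converges for all $s$, $a$; |
| Output | Q network $Q_{\phi}(s, a)$ |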
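The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pointer network is replaced by a plain lookup table so that the example runs with the standard library only, and the loss-minimization step reduces to a single gradient step on the squared error for one table entry. The toy environment (`step`, a five-state chain), the exploration rate `epsilon`, the fixed `episodes` budget used in place of the convergence test, and all default values are assumptions introduced purely for illustration.

```python
import random
from collections import deque

N_STATES = 5        # hypothetical chain 0..4; states 0 and 4 are terminal
START = 2           # every episode starts in the middle of the chain
ACTIONS = (0, 1)    # 0 = move left, 1 = move right

def step(s, a):
    """Toy environment (an assumption): returns (reward, next_state, done)."""
    s_next = s - 1 if a == 0 else s + 1
    done = s_next in (0, N_STATES - 1)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return reward, s_next, done

def greedy(q, s):
    """Action with the largest Q value in state s."""
    return max(ACTIONS, key=lambda a: q[s][a])

def train(gamma=0.9, alpha=0.1, C=20, capacity=1000,
          epsilon=0.5, episodes=2000, seed=0):
    rng = random.Random(seed)
    D = deque(maxlen=capacity)                        # (1) experience pool
    q = [[rng.uniform(-0.01, 0.01) for _ in ACTIONS]  # (2) Q "network": a table
         for _ in range(N_STATES)]
    q_target = [row[:] for row in q]                  # (3) target copy
    t = 0
    for _ in range(episodes):                         # (4) fixed budget, not a convergence test
        s, done = START, False                        # (5)
        while not done:                               # (6)
            # (7) epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = greedy(q, s)
            r, s_next, done = step(s, a)              # (8)
            D.append((s, a, r, s_next, done))         # (9)
            # (10) sample one stored transition uniformly at random
            ss, aa, rr, ss_next, terminal = D[rng.randrange(len(D))]
            # (11) target value computed with the target table
            y = rr if terminal else rr + gamma * max(q_target[ss_next])
            # (12) one gradient step on the squared error (y - Q(ss, aa))^2 / 2
            q[ss][aa] += alpha * (y - q[ss][aa])
            s = s_next                                # (13)
            t += 1
            if t % C == 0:                            # (14) sync target table
                q_target = [row[:] for row in q]
    return q

if __name__ == "__main__":
    q = train()
    print([greedy(q, s) for s in range(1, N_STATES - 1)])
```

Keeping a separate `q_target` that is refreshed only every `C` steps mirrors steps (3), (11) and (14): the target $y$ is computed from slowly changing parameters, while only the online table is updated at each step. In a full implementation the table would be replaced by the pointer-network-based Q network and the update in step (12) by an optimizer step.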