Research Article
Solving a Joint Pricing and Inventory Control Problem for Perishables via Deep Reinforcement Learning
Algorithm 1
Perishables integrate age and quantity deep Q-network.
(1) Initialize the replay memory pool to its capacity
(2) Initialize the action-value function with random weights
(3) Initialize the target action-value function with the same weights
(4) For episode = 1 to number of episodes do
(5) Reset the environment and initialize the state
(6) for each time step do
(7) With probability ε, select a random action; otherwise select the greedy action (ε-greedy policy)
(8) Execute the action and observe the reward and the next state
(9) Store the transition (state, action, reward, next state) in the replay memory pool
(10) Set the current state to the next state
(11) Sample a minibatch of transitions from the replay memory pool
(12) Calculate the target Q-value by equation (9)
(13) Update the parameters of the Q-network by equation (10)
(14) Every C steps reset the target network weights to the current network weights
(15) end for
(16) end for
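The control flow of Algorithm 1 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several pieces are assumptions made only to keep the example self-contained: `ToyPerishableEnv` is a hypothetical two-age-class environment with made-up demand and cost parameters, the deep network is replaced by a linear Q-function over one-hot states, and equations (9) and (10), which are not reproduced here, are assumed to be the standard DQN target and a squared-error gradient step. All hyperparameter values are illustrative.

```python
import random
from collections import deque

import numpy as np


class ToyPerishableEnv:
    """Hypothetical stand-in: one product with a two-period shelf life.

    State is (fresh units, old units); an action indexes an
    (order quantity, price) pair. All numbers are illustrative.
    """
    MAX_UNITS = 3
    ACTIONS = [(q, p) for q in (0, 1, 2) for p in (1.0, 2.0)]

    def __init__(self, horizon=20, seed=0):
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t, self.fresh, self.old = 0, 0, 0
        return (self.fresh, self.old)

    def step(self, action):
        order, price = self.ACTIONS[action]
        demand = int(self.rng.poisson(3.0 / price))  # demand falls with price
        sold_old = min(self.old, demand)             # oldest units sell first
        sold_fresh = min(self.fresh, demand - sold_old)
        waste = self.old - sold_old                  # unsold old units perish
        reward = price * (sold_old + sold_fresh) - 0.5 * order - 0.2 * waste
        self.old = self.fresh - sold_fresh           # leftover fresh units age
        self.fresh = min(order, self.MAX_UNITS)      # new order arrives
        self.t += 1
        return (self.fresh, self.old), reward, self.t >= self.horizon


def train(episodes=50, capacity=500, batch_size=16, gamma=0.95,
          epsilon=0.1, lr=0.05, sync_every=25, seed=0):
    random.seed(seed)
    env = ToyPerishableEnv(seed=seed)
    width = env.MAX_UNITS + 1
    n_states, n_actions = width ** 2, len(env.ACTIONS)

    def idx(state):
        # Map (fresh, old) to the row index of its one-hot encoding.
        return state[0] * width + state[1]

    memory = deque(maxlen=capacity)                        # step (1)
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.01, size=(n_states, n_actions))  # step (2)
    theta_target = theta.copy()                            # step (3)
    step_count = 0
    for _ in range(episodes):                              # step (4)
        state, done = env.reset(), False                   # step (5)
        while not done:                                    # step (6)
            if random.random() < epsilon:                  # step (7)
                action = random.randrange(n_actions)
            else:
                action = int(np.argmax(theta[idx(state)]))
            next_state, reward, done = env.step(action)    # step (8)
            memory.append((idx(state), action, reward,
                           idx(next_state), done))         # step (9)
            state = next_state                             # step (10)
            if len(memory) >= batch_size:
                batch = random.sample(list(memory), batch_size)  # step (11)
                for s, a, r, s2, d in batch:
                    # Assumed form of equation (9): bootstrap from the target weights.
                    target = r if d else r + gamma * theta_target[s2].max()
                    # Assumed form of equation (10): squared-error gradient step.
                    theta[s, a] += lr * (target - theta[s, a])
            step_count += 1
            if step_count % sync_every == 0:               # step (14)
                theta_target = theta.copy()
    return theta, len(memory)
```

Calling `train()` returns the learned weight matrix and the number of stored transitions. The greedy action for a state such as `(1, 0)` is then the argmax over the corresponding row of the matrix, which indexes into `ToyPerishableEnv.ACTIONS`. The periodic copy in step (14) keeps the bootstrap target fixed between synchronizations, which is the purpose of the separate target action-value function in steps (3) and (12).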