Research Article
Solving a Joint Pricing and Inventory Control Problem for Perishables via Deep Reinforcement Learning
Algorithm 2
Advantage actor-critic integrating the age and quantity of perishables.
(1) Use random weights to initialize the policy network and the value network
(2) for episode = 1 to number of episodes do
(3)     Reset the environment and initialize the state
(4)     for each time step in the episode do
(5)         Take an action based on the action probability given by the policy network
(6)         Execute the action and observe the reward and the next state
(7)         Update the parameters of the value network by minimizing the loss function in equation (11)
(8)         Estimate the advantage function by equation (14)
(9)         Update the policy network parameters using the policy gradient calculated by equation (13)
(10)        Set the current state to the next state
(11)    end for
(12) end for
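To make the control flow of Algorithm 2 concrete, the following is a minimal sketch of a one-step advantage actor-critic loop in plain Python. It is not the authors' implementation. Several things are assumptions introduced only for illustration: the `ToyEnv` environment (two states, two actions, reward 1 when the action matches the state) stands in for the perishable inventory model; tabular parameters replace the policy and value networks; the value loss of equation (11) is assumed to be a squared temporal-difference error; the advantage of equation (14) is assumed to be the one-step temporal-difference estimate; and the policy update of equation (13) is assumed to be the standard log-probability gradient weighted by the advantage. The learning rates, discount factor, and episode count are arbitrary.

```python
import math
import random


class ToyEnv:
    """Hypothetical stand-in environment (not the paper's inventory model).

    Two states, two actions; the reward is 1 when the action matches the state.
    """

    def __init__(self, horizon=10, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0
        self.state = 0

    def reset(self):
        self.t = 0
        self.state = self.rng.randrange(2)
        return self.state

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        self.t += 1
        self.state = self.rng.randrange(2)  # next state is drawn at random
        done = self.t >= self.horizon
        return self.state, reward, done


def softmax(prefs):
    """Turn action preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_a2c(env, n_states=2, n_actions=2, episodes=500,
              gamma=0.9, lr_value=0.1, lr_policy=0.1, seed=0):
    rng = random.Random(seed)
    # Step (1): small random initial parameters (tabular, in place of networks).
    theta = [[rng.uniform(-0.01, 0.01) for _ in range(n_actions)]
             for _ in range(n_states)]
    value = [rng.uniform(-0.01, 0.01) for _ in range(n_states)]

    for _ in range(episodes):                       # step (2)
        state = env.reset()                         # step (3)
        done = False
        while not done:                             # step (4)
            probs = softmax(theta[state])
            # Step (5): sample an action from the policy's probabilities.
            action = rng.choices(range(n_actions), weights=probs)[0]
            # Step (6): act, then observe the reward and the next state.
            next_state, reward, done = env.step(action)
            # Step (7): one gradient step on the assumed squared TD error.
            target = reward + (0.0 if done else gamma * value[next_state])
            value[state] += lr_value * (target - value[state])
            # Step (8): assumed one-step advantage estimate.
            advantage = target - value[state]
            # Step (9): ascend advantage * grad log pi(action | state);
            # for a softmax policy that gradient is 1[b == action] - probs[b].
            for b in range(n_actions):
                grad_log = (1.0 if b == action else 0.0) - probs[b]
                theta[state][b] += lr_policy * advantage * grad_log
            state = next_state                      # step (10)
    return theta, value


theta, value = train_a2c(ToyEnv(seed=1), seed=1)
policy = [softmax(row) for row in theta]  # learned action probabilities per state
```

The sketch keeps the order of the steps as listed: the value estimate is updated first and the advantage is then computed from the updated value. In the toy environment the learned policy should favor the action that matches the current state. A network-based version would replace the two tables with function approximators and the manual updates with an optimizer.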