Research Article
Solving a Joint Pricing and Inventory Control Problem for Perishables via Deep Reinforcement Learning
Algorithm 2
Perishables integrate age and quantity advantage actor-critic.
| Step | Operation |
|---|---|
| (1) | Initialize the policy network and the value network with random weights |
| (2) | for episode = 1 to number of episodes do |
| (3) | Reset the environment and initialize the state |
| (4) | for $t = 1, \dots, T$ do |
| (5) | Take action $a_t$ based on the action probability $\pi(a_t \mid s_t)$ |
| (6) | Execute action $a_t$ and observe reward $r_t$ and next state $s_{t+1}$ |
| (7) | Update the parameters of the value network by minimizing the loss function in equation (11) |
| (8) | Estimate the advantage function by equation (14) |
| (9) | Update the policy network parameters using the gradient calculated by equation (13) |
| (10) | Set $s_t \leftarrow s_{t+1}$ |
| (11) | end for |
| (12) | end for |
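The loop in Algorithm 2 can be illustrated with a minimal tabular sketch in Python. It is not the authors' implementation: lookup tables stand in for the policy and value networks, and the toy environment (two age classes, the price and order levels, the demand model, and the cost constants) is hypothetical. Because equations (11), (13), and (14) are not reproduced here, the sketch assumes the common one-step forms: a squared temporal-difference loss for the value function, the temporal-difference error as the advantage estimate, and a softmax policy-gradient update.

```python
import math
import random

PRICES = (4.0, 6.0)                 # hypothetical price levels
ORDERS = (0, 2, 4)                  # hypothetical order quantities
ACTIONS = [(p, q) for p in PRICES for q in ORDERS]   # joint pricing/ordering actions
CAP = 4                             # assumed maximum units per age class
UNIT_COST, WASTE_COST = 2.0, 1.0    # assumed cost parameters


def step(state, action, rng):
    """Toy two-age-class environment: sell oldest units first, then age and restock."""
    old, new = state
    price, order = action
    demand = rng.randint(0, 6 if price == PRICES[0] else 3)  # lower price, more demand
    sold_old = min(old, demand)
    sold_new = min(new, demand - sold_old)
    wasted = old - sold_old                       # unsold old units expire
    reward = price * (sold_old + sold_new) - UNIT_COST * order - WASTE_COST * wasted
    next_state = (new - sold_new, min(order, CAP))  # unsold new units age by one period
    return next_state, reward


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=200, horizon=20, gamma=0.95, lr_v=0.05, lr_pi=0.01, seed=0):
    rng = random.Random(seed)
    states = [(o, n) for o in range(CAP + 1) for n in range(CAP + 1)]
    # Step (1): small random initial values (tables replace the two networks).
    value = {s: rng.uniform(-0.01, 0.01) for s in states}
    prefs = {s: [rng.uniform(-0.01, 0.01) for _ in ACTIONS] for s in states}
    returns = []
    for _ in range(episodes):                                  # step (2)
        state, total = (0, 0), 0.0                             # step (3)
        for _ in range(horizon):                               # step (4)
            probs = softmax(prefs[state])
            a = rng.choices(range(len(ACTIONS)), weights=probs)[0]   # step (5)
            next_state, reward = step(state, ACTIONS[a], rng)        # step (6)
            # Assumed one-step TD error; its square plays the role of the loss.
            td_error = reward + gamma * value[next_state] - value[state]
            value[state] += lr_v * td_error                    # step (7)
            advantage = td_error                               # step (8), assumed form
            for i, p in enumerate(probs):                      # step (9)
                grad_log = (1.0 if i == a else 0.0) - p        # d log pi(a|s) / d pref_i
                prefs[state][i] += lr_pi * advantage * grad_log
            state = next_state                                 # step (10)
            total += reward
        returns.append(total)
    return value, prefs, returns
```

Calling `train()` returns the learned value table, the action preferences, and the per-episode returns; `softmax(prefs[(0, 0)])` then gives the learned action probabilities for an empty shelf. A neural-network version would replace the two dictionaries with function approximators and the manual updates with gradient steps on the corresponding losses.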