Research Article

Solving a Joint Pricing and Inventory Control Problem for Perishables via Deep Reinforcement Learning

Algorithm 2

Advantage actor-critic for perishables, integrating inventory age and quantity.
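(1) Initialize the policy network $\pi_\theta$ and the value network $V_w$ with random weights $\theta$ and $w$
(2) for episode $e = 1$ to $E$ (number of episodes) do
(3)   Reset the environment and initialize the state $s_1$
(4)   for $t = 1, 2, \ldots, T$ do
(5)     Select action $a_t$ according to the action probabilities $\pi_\theta(a_t \mid s_t)$
(6)     Execute $a_t$ and observe the reward $r_t$ and the next state $s_{t+1}$
(7)     Update the value-network parameters $w$ by minimizing the loss function in equation (11)
(8)     Estimate the advantage function $A(s_t, a_t)$ by equation (14)
(9)     Update the policy-network parameters $\theta \leftarrow \theta + \alpha \nabla_\theta J(\theta)$, where $\nabla_\theta J(\theta)$ is calculated by equation (13)
(10)    Set $s_t \leftarrow s_{t+1}$
(11)  end for
(12) end for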
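The loop structure of Algorithm 2 can be made concrete with a small, self-contained sketch. Everything specific in it is an assumption for illustration, not the article's model: the toy environment (the hypothetical constants `MAX_AGE`, `PRICES`, `ORDERS`, `UNIT_COST`, `WASTE_COST`, a linear price-dependent Gaussian demand, and oldest-first sales), linear function approximation in place of neural networks, and the one-step TD error $\delta_t = r_t + \gamma V_w(s_{t+1}) - V_w(s_t)$ standing in for equations (11), (13), and (14), which are not reproduced in this section. The state records the on-hand quantity in each remaining-life bucket, and each action jointly sets a price and an order quantity.

```python
import math
import random

# Hypothetical toy setting (not from the article): units live MAX_AGE periods,
# the state holds on-hand quantities per remaining-life bucket (oldest first),
# and an action is a (price, order quantity) pair from small discrete grids.
MAX_AGE = 3
PRICES = [4.0, 6.0, 8.0]
ORDERS = [0, 5, 10]
ACTIONS = [(p, q) for p in PRICES for q in ORDERS]
UNIT_COST, WASTE_COST, GAMMA = 2.0, 1.0, 0.95


def features(state):
    """Bias term plus scaled quantity in each age bucket (age-and-quantity view)."""
    return [1.0] + [x / 10.0 for x in state]


def step(state, action, rng):
    """One period: receive the order, sell oldest-first, expire, then age stock."""
    price, qty = action
    stock = list(state) + [qty]  # index 0 = oldest, last = freshest
    # Assumed demand model: lower price -> higher mean demand.
    demand = max(0, round(rng.gauss(16.0 - 1.5 * price, 2.0)))
    sold = 0
    for i in range(len(stock)):
        take = min(stock[i], demand - sold)
        stock[i] -= take
        sold += take
    wasted = stock[0]  # units at the end of their shelf life expire
    reward = price * sold - UNIT_COST * qty - WASTE_COST * wasted
    return tuple(stock[1:]), reward


def policy_probs(theta, phi):
    """Softmax over linear action preferences (max-shifted for stability)."""
    prefs = [sum(t * f for t, f in zip(row, phi)) for row in theta]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def train_a2c(episodes=200, horizon=20, alpha_w=0.01, alpha_theta=0.001, seed=0):
    rng = random.Random(seed)
    n_feat = MAX_AGE  # 1 bias + (MAX_AGE - 1) age buckets
    # (1) random initial weights for the policy (theta) and value (w) models
    theta = [[rng.uniform(-0.01, 0.01) for _ in range(n_feat)] for _ in ACTIONS]
    w = [rng.uniform(-0.01, 0.01) for _ in range(n_feat)]
    returns = []
    for _ in range(episodes):                       # (2) episode loop
        state, total = (0,) * (MAX_AGE - 1), 0.0   # (3) reset environment
        for _ in range(horizon):                    # (4) period loop
            phi = features(state)
            probs = policy_probs(theta, phi)
            a = rng.choices(range(len(ACTIONS)), weights=probs)[0]  # (5)
            next_state, r = step(state, ACTIONS[a], rng)             # (6)
            v = sum(wi * f for wi, f in zip(w, phi))
            v_next = sum(wi * f for wi, f in zip(w, features(next_state)))
            # Assumption: the one-step TD error is both the value-loss residual
            # and the advantage estimate (stand-in for equations (11) and (14)).
            delta = r + GAMMA * v_next - v
            # (7) semi-gradient step on 0.5 * delta**2 with respect to w
            w = [wi + alpha_w * delta * f for wi, f in zip(w, phi)]
            # (8)-(9) policy-gradient step; for a linear softmax policy,
            # d log pi(a|s) / d theta_b = (1[b == a] - pi(b|s)) * phi
            for b, row in enumerate(theta):
                coef = alpha_theta * delta * ((1.0 if b == a else 0.0) - probs[b])
                theta[b] = [t + coef * f for t, f in zip(row, phi)]
            state, total = next_state, total + r   # (10) advance the state
        returns.append(total)
    return theta, w, returns
```

Computing $\delta_t$ once, before the value update, and reusing it as the advantage is a common simplification of steps (7) and (8); recomputing it with the updated $w$ is another valid reading of the listing. Swapping the linear models for neural networks would change only how $V_w$ and $\pi_\theta$ are evaluated and differentiated, not the loop structure.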