Research Article

Solving a Joint Pricing and Inventory Control Problem for Perishables via Deep Reinforcement Learning

Algorithm 1

Perishables integrate age and quantity deep Q-network.
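(1) Initialize replay memory pool $D$ to capacity $N$
(2) Use random weights $\theta$ to initialize the action-value function $Q$
(3) Initialize target action-value function $\hat{Q}$ with weights $\theta^{-} = \theta$
(4) For episode $= 1$ to number of episodes do
(5) Reset the environment and initialize state $s_1$
(6) for $t = 1$ to $T$ do
(7) With probability $\varepsilon$, select a random action $a_t$, otherwise select $a_t = \arg\max_{a} Q(s_t, a; \theta)$ ($\varepsilon$-greedy policy)
(8) Execute action $a_t$ and observe reward $r_t$ and next state $s_{t+1}$
(9) Store transition $(s_t, a_t, r_t, s_{t+1})$ in the replay memory pool $D$
(10) Set $s_t \leftarrow s_{t+1}$
(11) Sample a minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from replay memory pool $D$
(12) Calculate the target $Q$-value by equation (9)
(13) Update the parameters $\theta$ of network $Q$ by equation (10)
(14) Every $C$ steps reset $\hat{Q} = Q$
(15) end for
(16) end for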
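
The control flow of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `ToyEnv` is a hypothetical stand-in environment (not the paper's perishable inventory model), the Q-function is linear rather than a deep network, and, since equations (9) and (10) are not reproduced here, the target is assumed to be the standard DQN target $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^{-})$ with a gradient step on the squared error. All function names and hyperparameter values are illustrative.

```python
import random
from collections import deque

import numpy as np


class ToyEnv:
    """Hypothetical toy environment, used only to make the loop runnable."""

    def __init__(self, horizon=20, seed=0):
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t, self.stock = 0, 5.0
        return self._state()

    def _state(self):
        # Scaled stock level plus a constant bias feature.
        return np.array([self.stock / 10.0, 1.0])

    def step(self, action):
        # action is an order quantity in {0, 1, 2, 3}; demand is random.
        demand = float(self.rng.integers(0, 4))
        available = self.stock + action
        sold = min(available, demand)
        self.stock = min(available - sold, 10.0)
        reward = 2.0 * sold - 1.0 * action - 0.1 * self.stock
        self.t += 1
        return self._state(), reward, self.t >= self.horizon


def train_dqn(env, n_actions=4, state_dim=2, episodes=50, capacity=1000,
              batch_size=16, gamma=0.95, epsilon=0.1, lr=0.01, C=25, seed=0):
    random.seed(seed)
    rng = np.random.default_rng(seed)
    memory = deque(maxlen=capacity)                              # step (1)
    theta = rng.normal(scale=0.1, size=(n_actions, state_dim))   # step (2)
    theta_target = theta.copy()                                  # step (3)
    step_count = 0
    for _ in range(episodes):                                    # step (4)
        s = env.reset()                                          # step (5)
        done = False
        while not done:                                          # step (6)
            if random.random() < epsilon:                        # step (7)
                a = random.randrange(n_actions)
            else:
                a = int(np.argmax(theta @ s))
            s_next, r, done = env.step(a)                        # step (8)
            memory.append((s, a, r, s_next, done))               # step (9)
            s = s_next                                           # step (10)
            batch = random.sample(list(memory),
                                  min(batch_size, len(memory)))  # step (11)
            grad = np.zeros_like(theta)
            for sj, aj, rj, sj_next, dj in batch:
                # Step (12): assumed standard DQN target (no bootstrap at episode end).
                y = rj if dj else rj + gamma * np.max(theta_target @ sj_next)
                # Step (13): accumulate the descent direction of 0.5 * (y - Q)^2.
                grad[aj] += (y - theta[aj] @ sj) * sj
            theta += lr * grad / len(batch)
            step_count += 1
            if step_count % C == 0:                              # step (14)
                theta_target = theta.copy()
    return theta, theta_target, len(memory)


theta, theta_target, n_stored = train_dqn(ToyEnv(), episodes=5)
print(theta.shape, n_stored)
```

Keeping a separate copy `theta_target` that is refreshed only every `C` steps mirrors step (14): the targets in step (12) stay fixed between refreshes, which is the usual motivation for a target network in DQN.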