Complexity

Research Article

Solving a Joint Pricing and Inventory Control Problem for Perishables via Deep Reinforcement Learning

Perishables integrate age and quantity deep Q-network.

(1)	Initialize the replay memory pool to a fixed capacity
(2)	Use random weights to initialize the action-value function Q
(3)	Initialize the target action-value function with the same weights
(4)	For episode = 1 to number of episodes do
(5)	Reset the environment and initialize the state
(6)	for each time step of the episode do
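(7)	With probability ε, select a random action; otherwise select the action with the largest Q-value (ε-greedy policy)
(8)	Execute the action and observe the reward and the next state
(9)	Store the transition (state, action, reward, next state) in the replay memory pool
(10)	Set the current state to the next state
(11)	Sample a minibatch of transitions from the replay memory pool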
(12)	Calculate the target Q-value by equation (9)
(13)	Update the parameters of the Q-network by equation (10)
(14)	Every C steps, reset the target network weights to the current network weights
(15)	end for
(16)	end for
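
The steps of the algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the environment `ToyEnv` is a hypothetical single-item inventory toy, the deep network is replaced by a linear action-value function so that only NumPy is needed, and all hyperparameter values are placeholders. Equations (9) and (10) are not reproduced in this listing, so the sketch assumes the standard DQN target (reward plus the discounted maximum target-network Q-value of the next state) and one gradient step on the squared temporal-difference error. The `done` flag stored with each transition is also an addition for handling the end of an episode.

```python
import random
from collections import deque

import numpy as np

GAMMA = 0.95     # discount factor (assumed value)
EPSILON = 0.1    # exploration probability (assumed value)
LR = 0.01        # learning rate (assumed value)
CAPACITY = 1000  # replay memory capacity (assumed value)
BATCH = 32       # minibatch size (assumed value)
C = 20           # target-network reset period in steps (assumed value)
N_ACTIONS = 3    # hypothetical order quantities 0, 1, 2
STATE_DIM = 2    # hypothetical features: [stock / 5, constant 1.0]


class ToyEnv:
    """Hypothetical inventory toy; not the environment used in the paper."""

    def __init__(self, rng, horizon=20):
        self.rng, self.horizon = rng, horizon

    def _obs(self):
        return np.array([self.stock / 5.0, 1.0])

    def reset(self):
        self.stock, self.t = 0, 0
        return self._obs()

    def step(self, action):
        self.stock = min(self.stock + action, 5)   # receive the order
        demand = int(self.rng.integers(0, 3))      # demand in {0, 1, 2}
        sold = min(self.stock, demand)
        self.stock -= sold
        reward = 2.0 * sold - 1.0 * action - 0.5 * self.stock
        self.t += 1
        return self._obs(), reward, self.t >= self.horizon


def q_values(weights, state):
    """Linear action-value function: one weight row per action."""
    return weights @ state


def train(episodes=50, seed=0):
    rng = np.random.default_rng(seed)
    random.seed(seed)
    env = ToyEnv(rng)
    memory = deque(maxlen=CAPACITY)                                # step (1)
    weights = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM))   # step (2)
    target_weights = weights.copy()                                # step (3)
    step_count = 0
    for _ in range(episodes):                                      # step (4)
        state = env.reset()                                        # step (5)
        done = False
        while not done:                                            # step (6)
            if rng.random() < EPSILON:                             # step (7)
                action = int(rng.integers(N_ACTIONS))
            else:
                action = int(np.argmax(q_values(weights, state)))
            next_state, reward, done = env.step(action)            # step (8)
            memory.append((state, action, reward, next_state, done))  # step (9)
            state = next_state                                     # step (10)
            batch = random.sample(list(memory), min(BATCH, len(memory)))  # step (11)
            grad = np.zeros_like(weights)
            for s, a, r, s2, d in batch:
                # step (12): assumed standard DQN target
                future = 0.0 if d else GAMMA * np.max(q_values(target_weights, s2))
                # step (13): accumulate the squared-TD-error gradient direction
                grad[a] += (r + future - q_values(weights, s)[a]) * s
            weights += LR * grad / len(batch)
            step_count += 1
            if step_count % C == 0:                                # step (14)
                target_weights = weights.copy()
    return weights, target_weights, len(memory)
```

Calling `train()` returns the learned weights, the target weights, and the number of stored transitions. Replacing the linear `q_values` with a neural network and `ToyEnv` with a model of age-dependent perishable inventory would bring the sketch closer to the setting described in the paper.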