Research Article

HVAC Optimal Control with the Multistep-Actor Critic Algorithm in Large Action Spaces

Algorithm 1: DQN-based HVAC control algorithm.
(1)Initialize replay memory D to capacity N
(2)Initialize action-value function Q with random weights θ
(3)Initialize target action-value function $\hat{Q}$ with random weights $\theta^{-}$
(4)For episode = 1 to M do
(5) Reset building environment to initial state
(6) Initialize sequence $s_1 = \{x_1\}$ and preprocessed sequence $\phi_1 = \phi(s_1)$
(7) For t = 1 to T do
(8)  If t mod k == 0 then
(9)   With probability $\epsilon$ select a random action $a_t$
(10)   Otherwise select $a_t = \arg\max_{a} Q(\phi(s_t), a; \theta)$
(11)   Execute action $a_t$ in emulator and observe reward $r_t$ and image $x_{t+1}$
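(12)   Set $s_{t+1} = s_t, a_t, x_{t+1}$ and preprocess $\phi_{t+1} = \phi(s_{t+1})$
(13)   Store transition $(\phi_t, a_t, r_t, \phi_{t+1})$ in D
(14)   Sample random minibatch of transitions $(\phi_i, a_i, r_i, \phi_{i+1})$ from D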
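(15)   Set $y_i = r_i$ if the episode terminates at step $i+1$; otherwise $y_i = r_i + \gamma \max_{a'} \hat{Q}(\phi_{i+1}, a'; \theta^{-})$
(16)   Train $Q$ by a gradient descent step on $(y_i - Q(\phi_i, a_i; \theta))^2$ with respect to the network parameters θ
(17)   Every C steps reset $\hat{Q} = Q$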
(18)  End if
(19) End for
(20)End for
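
The listing above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `ToyZone` is a hypothetical one-zone thermal model standing in for the building environment, a linear function of the features replaces the deep Q-network, the action set and all hyperparameter defaults are invented for the example, and the last control step of an episode is treated as terminal. Step numbers in the comments refer to the listing.

```python
import random
from collections import deque

ACTIONS = [0.0, 0.5, 1.0]  # hypothetical cooling levels (fraction of full power)

class ToyZone:
    """Hypothetical one-zone thermal model, not the paper's building simulator."""
    def __init__(self, rng):
        self.rng = rng
        self.temp = 28.0
    def reset(self):
        self.temp = 28.0
        return self.phi()
    def phi(self):
        # Preprocessed state: bias term plus scaled deviation from a 24 C setpoint.
        return [1.0, (self.temp - 24.0) / 10.0]
    def step(self, a, k):
        # Hold action a for k simulation steps and sum the rewards (assumption).
        reward = 0.0
        for _ in range(k):
            drift = 0.3 - ACTIONS[a] + self.rng.uniform(-0.05, 0.05)
            self.temp = min(40.0, max(10.0, self.temp + drift))
            reward -= abs(self.temp - 24.0) + 0.2 * ACTIONS[a]
        return reward, self.phi()

def q_value(theta, phi, a):
    # Linear stand-in for the Q-network: Q(phi, a; theta) = theta[a] . phi
    return sum(w * x for w, x in zip(theta[a], phi))

def greedy_action(theta, phi):
    return max(range(len(ACTIONS)), key=lambda a: q_value(theta, phi, a))

def td_target(r, phi_next, done, theta_target, gamma):
    # y = r if terminal, else r + gamma * max_a' Q_hat(phi_next, a'; theta_target)
    if done:
        return r
    best = max(q_value(theta_target, phi_next, a) for a in range(len(ACTIONS)))
    return r + gamma * best

def train_dqn(M=20, T=48, k=2, N=500, batch=16, C=20,
              eps=0.1, gamma=0.9, lr=0.01, seed=0):
    rng = random.Random(seed)
    env = ToyZone(rng)
    D = deque(maxlen=N)                                          # (1)
    theta = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in ACTIONS]         # (2)
    theta_target = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in ACTIONS]  # (3)
    updates = 0
    for _ in range(M):                                           # (4)
        phi = env.reset()                                        # (5)-(6)
        for t in range(1, T + 1):                                # (7)
            if t % k != 0:                                       # (8) control steps only
                continue
            if rng.random() < eps:                               # (9)
                a = rng.randrange(len(ACTIONS))
            else:                                                # (10)
                a = greedy_action(theta, phi)
            r, phi_next = env.step(a, k)                         # (11)-(12)
            done = t + k > T  # no further control step in this episode
            D.append((phi, a, r, phi_next, done))                # (13)
            minibatch = rng.sample(list(D), min(batch, len(D)))  # (14)
            for p, aj, rj, pn, dj in minibatch:
                y = td_target(rj, pn, dj, theta_target, gamma)   # (15)
                err = y - q_value(theta, p, aj)
                # (16) one SGD step on (y - Q)^2 per sampled transition
                theta[aj] = [w + lr * err * x for w, x in zip(theta[aj], p)]
            updates += 1
            if updates % C == 0:                                 # (17)
                theta_target = [row[:] for row in theta]
            phi = phi_next
    return theta, theta_target, D
```

Calling `train_dqn()` returns the online weights, the target weights, and the replay memory. Swapping the linear `q_value` for a neural network and `ToyZone` for a real building simulator would bring the sketch closer to the setting the listing describes.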