Research Article
HVAC Optimal Control with the Multistep-Actor Critic Algorithm in Large Action Spaces
Algorithm 1
DQN-based HVAC control algorithm.
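(1) Initialize replay memory $D$ to capacity $N$
(2) Initialize action-value function $Q$ with random weights $\theta$
(3) Initialize target action-value function $\hat{Q}$ with random weights $\theta^-$
(4) For episode $= 1$ to $M$ do
(5) Reset building environment to initial state
(6) Initialize sequence $s_1 = \{x_1\}$ and preprocessed sequence $\phi_1 = \phi(s_1)$
(7) For $t = 1$ to $T$ do
(8) If $t \bmod k = 0$ then
(9) With probability $\varepsilon$ select a random action $a_t$
(10) Otherwise select $a_t = \arg\max_a Q(\phi(s_t), a; \theta)$
(11) Execute action $a_t$ in emulator and observe reward $r_t$ and image $x_{t+1}$
(12) Set $s_{t+1} = s_t, a_t, x_{t+1}$ and preprocess $\phi_{t+1} = \phi(s_{t+1})$
(13) Store transition $(\phi_t, a_t, r_t, \phi_{t+1})$ in $D$
(14) Sample random minibatch of transitions $(\phi_i, a_i, r_i, \phi_{i+1})$ from $D$
(15) Set $y_i = r_i$ if the episode terminates at step $i+1$; otherwise set $y_i = r_i + \gamma \max_{a'} \hat{Q}(\phi_{i+1}, a'; \theta^-)$
(16) Perform a gradient descent step on $\left(y_i - Q(\phi_i, a_i; \theta)\right)^2$ with respect to the network parameters $\theta$
(17) Every $C$ steps reset $\hat{Q} = Q$
(18) End if
(19) End for
(20) End for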
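The loop structure of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. A lookup table replaces the deep Q-network. The hypothetical `ToyBuilding` class (heating levels 0 to 2, comfort target 22 °C) replaces the building simulator. The preprocessing φ is the identity, and the most recent action is held between decision steps. Every parameter value is illustrative. The comments mark the corresponding step numbers from Algorithm 1.

```python
import random
from collections import deque

# Hypothetical single-zone building (illustration only): the state is the indoor
# temperature in whole degrees C, clipped to 15..30; actions are heating levels.
ACTIONS = (0, 1, 2)  # 0 = off, 1 = low, 2 = high
T_MIN, T_MAX, COMFORT = 15, 30, 22


class ToyBuilding:
    """Hypothetical stand-in for the building simulator ("emulator")."""

    def reset(self):
        self.temp = 18
        return self.temp

    def step(self, action):
        # Lose 1 degree per step to the outdoors, gain `action` degrees from heating.
        self.temp = min(T_MAX, max(T_MIN, self.temp - 1 + action))
        # Reward: negative comfort deviation minus a small energy cost.
        reward = -abs(self.temp - COMFORT) - 0.1 * action
        return reward, self.temp


def greedy(q, s):
    """argmax_a Q(s, a); ties resolve to the lowest heating level."""
    return max(ACTIONS, key=lambda a: q[s][a])


def train(M=200, T=48, k=2, N=1000, batch=16, C=20,
          gamma=0.9, eps=0.1, lr=0.1, seed=0):
    rng = random.Random(seed)
    env = ToyBuilding()
    memory = deque(maxlen=N)  # (1) replay memory D with capacity N
    # (2) a Q table stands in for the deep network with weights theta
    q = {s: [0.0] * len(ACTIONS) for s in range(T_MIN, T_MAX + 1)}
    # (3) target weights theta^-, initialized here as a copy of theta
    q_hat = {s: row[:] for s, row in q.items()}
    updates = 0
    for _ in range(M):                      # (4) episodes
        s = env.reset()                     # (5)-(6) phi is the identity here
        a = 0
        for t in range(1, T + 1):           # (7) time steps
            if t % k != 0:                  # (8) act only every k steps
                _, s = env.step(a)          # assumption: hold the last action
                continue
            if rng.random() < eps:          # (9) explore with probability eps
                a = rng.choice(ACTIONS)
            else:                           # (10) otherwise act greedily
                a = greedy(q, s)
            r, s_next = env.step(a)         # (11) execute, observe r_t, x_{t+1}
            memory.append((s, a, r, s_next))  # (12)-(13) store transition
            s = s_next
            n = min(batch, len(memory))     # (14) sample a random minibatch
            for sj, aj, rj, sj1 in rng.sample(list(memory), n):
                # (15) episodes end only at the horizon T, so every target
                # uses the bootstrapped (non-terminal) branch.
                y = rj + gamma * max(q_hat[sj1])
                # (16) gradient step on 0.5 * (y - Q)^2 for a tabular Q
                q[sj][aj] += lr * (y - q[sj][aj])
            updates += 1
            if updates % C == 0:            # (17) every C steps: Q_hat <- Q
                q_hat = {st: row[:] for st, row in q.items()}
    return q


if __name__ == "__main__":
    q = train()
    print("greedy heating level at 16 C:", greedy(q, 16))
    print("greedy heating level at 26 C:", greedy(q, 26))
```

Running the module trains for the default 200 episodes and prints the greedy heating level at two temperatures. The `t % k` test mirrors step (8): the controller decides only at every k-th simulation step, and only those decision steps generate transitions and training updates. A full DQN controller would differ mainly by replacing the table with a neural network trained by minibatch gradient descent and the toy model with a real building simulator.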