Research Article
HVAC Optimal Control with the Multistep-Actor Critic Algorithm in Large Action Spaces
Algorithm 2
Multistep-Actor Critic algorithm.
(1) Build DQN, constructed by basic actions
(2) Train DQN, value-network and transition model
(3) For (every 30 minutes)
(4)     Construct search tree based on transition model and basic actions
(5)     While action a is not [0, 0, 0, 0]
(6)         Choose best action by DQN, and then add this choice to the search tree
(7)         Expand search tree based on this choice and transition model
(8)     Choose k states close to the original state by KNN from the state set
(9)     Based on the value-network, find the best state that yields the lowest energy consumption
(10)    Change the set point
(11) End for
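Steps (4) to (10) of Algorithm 2 can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in rather than the authors' implementation: `q_fn` plays the role of the trained DQN, `transition_fn` the transition model, and `value_fn` the value-network's energy estimate. The sketch also assumes a single greedy branch of the search tree, squared Euclidean distance for the KNN step, a `max_depth` guard on the while loop, and that the original state itself stays in the candidate set.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[int, ...]
NO_OP: Action = (0, 0, 0, 0)  # the terminating action [0, 0, 0, 0] of step (5)


def squared_distance(a: State, b: State) -> float:
    """Squared Euclidean distance (assumed metric for the KNN step)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def plan_set_point(
    state: State,
    basic_actions: Sequence[Action],
    q_fn: Callable[[State, Action], float],          # stand-in for the DQN
    transition_fn: Callable[[State, Action], State],  # stand-in for the transition model
    value_fn: Callable[[State], float],               # stand-in for the value-network (energy)
    k: int = 3,
    max_depth: int = 20,  # assumption: safety bound, not part of Algorithm 2
) -> State:
    """One 30-minute decision: steps (4)-(9) of Algorithm 2."""
    visited: List[State] = [state]  # states along the expanded branch
    current = state
    for _ in range(max_depth):
        # Step (6): pick the basic action with the highest Q-value.
        action = max(basic_actions, key=lambda a: q_fn(current, a))
        if action == NO_OP:
            break  # Step (5): stop once the no-change action is chosen.
        # Step (7): expand the tree with the predicted next state.
        current = transition_fn(current, action)
        visited.append(current)
    # Step (8): keep the k visited states closest to the original state.
    nearest = sorted(visited, key=lambda s: squared_distance(s, state))[:k]
    # Step (9): among those, choose the lowest predicted energy consumption.
    return min(nearest, key=value_fn)


# --- Toy stand-ins, purely illustrative ---
TARGET: State = (24.0, 22.0, 22.0, 21.0)  # hypothetical energy-optimal set points


def unit_actions(n: int = 4) -> List[Action]:
    """No-op plus a +1/-1 step on each of the n set points."""
    actions: List[Action] = [NO_OP]
    for i in range(n):
        for step in (1, -1):
            actions.append(tuple(step if j == i else 0 for j in range(n)))
    return actions


def toy_transition(s: State, a: Action) -> State:
    return tuple(x + d for x, d in zip(s, a))


def toy_energy(s: State) -> float:
    return squared_distance(s, TARGET)


def toy_q(s: State, a: Action) -> float:
    return -toy_energy(toy_transition(s, a))


start: State = (22.0, 22.0, 22.0, 22.0)
best = plan_set_point(start, unit_actions(), toy_q, toy_transition, toy_energy, k=3)
print(best)  # (24.0, 22.0, 22.0, 22.0)
```

In this toy run the greedy loop reaches the target in three basic actions, but with `k=3` the KNN filter drops the state farthest from the starting point, so the chosen set point stops one step short. Raising `k` to 4 lets the final state through. This shows one way the KNN step could limit how far a single 30-minute update moves the set points.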