HVAC Optimal Control with the Multistep-Actor Critic Algorithm in Large Action Spaces

<table class="algorithm-group"><tr><td><table class="algorithm" id="alg2">
<tr><td>(1)</td><td>Build the DQN from the basic actions</td></tr>
<tr><td>(2)</td><td>Train the DQN, the value network, and the transition model</td></tr>
<tr><td>(3)</td><td>For every 30 minutes</td></tr>
<tr><td>(4)</td><td> Construct the search tree based on the transition model and the basic actions</td></tr>
<tr><td>(5)</td><td> While action <i>a</i> is not <i>a</i><sub><i>Z</i></sub> (<i>a</i><sub><i>Z</i></sub> = [0, 0, 0, 0])</td></tr>
<tr><td>(6)</td><td>  Choose the best action with the DQN and add this choice to the search tree</td></tr>
<tr><td>(7)</td><td>  Expand the search tree based on this choice and the transition model</td></tr>
<tr><td>(8)</td><td> Choose the <i>k</i> states closest to the original state by KNN from the state set</td></tr>
<tr><td>(9)</td><td> Based on the value network, find the best state, i.e., the one that yields the lowest energy consumption</td></tr>
<tr><td>(10)</td><td> Change the set point</td></tr>
<tr><td>(11)</td><td>End for</td></tr>
</table></td></tr></table>

<div>Multistep-Actor Critic algorithm.</div>
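The planning part of the listing (steps 4 to 9) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The trained DQN, transition model, and value network are replaced by hypothetical callables (`q_value`, `transition`, `energy`), and the "state set" used for the KNN step is assumed to be the states visited while expanding the search tree. The `max_depth` guard, the four-dimensional toy state, and the `TARGET` set point are also assumptions added only so the example runs on its own.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[int, ...]

# The all-zero action [0, 0, 0, 0] that terminates the while loop in step 5.
ZERO_ACTION: Action = (0, 0, 0, 0)


def plan_set_point(
    state: State,
    basic_actions: Sequence[Action],
    q_value: Callable[[State, Action], float],    # stand-in for the trained DQN
    transition: Callable[[State, Action], State],  # stand-in for the transition model
    energy: Callable[[State], float],              # stand-in for the value network
    k: int = 3,
    max_depth: int = 20,                           # assumed safety bound, not in the listing
) -> State:
    """Run one 30-minute planning step and return the chosen state."""
    visited: List[State] = [state]  # assumed "state set": states reached in the tree
    current = state
    for _ in range(max_depth):
        # Step 6: pick the basic action with the highest Q-value.
        action = max(basic_actions, key=lambda a: q_value(current, a))
        if action == ZERO_ACTION:
            break
        # Step 7: expand the tree along the chosen action.
        current = transition(current, action)
        visited.append(current)
    # Step 8: keep the k visited states closest to the original state.
    nearest = sorted(visited, key=lambda s: math.dist(s, state))[:k]
    # Step 9: among those, take the state with the lowest predicted energy.
    return min(nearest, key=energy)


# Toy stand-ins (hypothetical): energy grows with distance from TARGET.
TARGET: State = (22.0, 22.0, 22.0, 22.0)
BASIC_ACTIONS: List[Action] = [ZERO_ACTION] + [
    tuple(d if j == i else 0 for j in range(4)) for i in range(4) for d in (1, -1)
]


def toy_transition(s: State, a: Action) -> State:
    return tuple(x + d for x, d in zip(s, a))


def toy_energy(s: State) -> float:
    return sum((x - t) ** 2 for x, t in zip(s, TARGET))


def toy_q(s: State, a: Action) -> float:
    # Greedy one-step lookahead: prefer actions that lower the toy energy.
    return -toy_energy(toy_transition(s, a))


if __name__ == "__main__":
    start = (24.0, 22.0, 21.0, 22.0)
    best = plan_set_point(start, BASIC_ACTIONS, toy_q, toy_transition, toy_energy)
    print(best, toy_energy(best))
```

In this sketch the KNN step limits how far the new set point may move from the current state in a single control interval, so a larger `k` lets the controller jump further along the planned trajectory.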

Mathematical Problems in Engineering


Algorithm 2: HVAC Optimal Control with the Multistep-Actor Critic Algorithm in Large Action Spaces