Intercept Guidance of Maneuvering Targets with Deep Reinforcement Learning

<table class="table-group" id="tab2"><tr><td><table class="table">
<tr><td class="thead-hr" colspan="2"><hr/></td></tr>
<tr class="thead"><td class="align_left">Parameter</td><td class="align_center">Value</td></tr>
<tr><td class="thead-hr" colspan="2"><hr/></td></tr>
<tr><td class="align_left">Number of hidden layers</td><td class="align_center">2</td></tr>
<tr><td class="align_left">BATCH_SIZE</td><td class="align_center">32</td></tr>
<tr><td class="align_left">Replay buffer size</td><td class="align_center">50000</td></tr>
<tr><td class="align_left">Actor learning rate</td><td class="align_center">10<sup>-5</sup></td></tr>
<tr><td class="align_left">Critic learning rate</td><td class="align_center">2 × 10<sup>-5</sup></td></tr>
<tr><td class="align_left">Policy noise</td><td class="align_center">0.2</td></tr>
<tr><td class="align_left">Noise bound</td><td class="align_center">0.5</td></tr>
<tr><td class="align_left">Soft update factor <i>τ</i></td><td class="align_center">0.01</td></tr>
<tr><td class="align_left">Discounting factor <i>γ</i></td><td class="align_center">0.95</td></tr>
<tr><td class="align_left">Delay steps</td><td class="align_center">5</td></tr>
<tr><td class="align_left">Gradient optimizer</td><td class="align_center">Adam</td></tr>
<tr class="table-tr"><td colspan="2"><hr class="tbody-hr"/></td></tr>
</table></td></tr></table>

<div>Hyperparameters for the TD3 algorithm.</div>
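As a rough illustration of how the values in Table 2 might be carried into code, the sketch below collects them in a configuration object and shows where three of them would typically enter a TD3 update. The numeric values come from the table; everything else is an assumption. The field names and the helpers `soft_update`, `smoothing_noise`, and `should_update_actor` are hypothetical. The readings of "Policy noise" (standard deviation of the target-smoothing noise), "Noise bound" (its clipping limit), and "Delay steps" (critic updates per actor update) follow the usual TD3 formulation and are not confirmed by the table itself.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TD3Config:
    """Values copied from Table 2; the field names are illustrative only."""
    hidden_layers: int = 2
    batch_size: int = 32
    replay_buffer_size: int = 50_000
    actor_lr: float = 1e-5
    critic_lr: float = 2e-5
    policy_noise: float = 0.2   # assumed: std of target-smoothing noise
    noise_bound: float = 0.5    # assumed: clipping limit for that noise
    tau: float = 0.01           # soft update factor
    gamma: float = 0.95         # discounting factor
    delay_steps: int = 5        # assumed: critic updates per actor update
    optimizer: str = "Adam"


def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


def smoothing_noise(cfg, rng):
    """Draw Gaussian noise with std policy_noise, clipped to +/- noise_bound."""
    eps = rng.gauss(0.0, cfg.policy_noise)
    return max(-cfg.noise_bound, min(cfg.noise_bound, eps))


def should_update_actor(step, cfg):
    """Delayed policy update: True on every delay_steps-th critic update."""
    return step % cfg.delay_steps == 0


cfg = TD3Config()
new_target = soft_update([0.0, 1.0], [1.0, 1.0], cfg.tau)
```

Plain Python lists stand in for network weights here so the sketch stays free of any deep learning framework. In a real implementation the same three helpers would operate on the actor and critic parameter tensors.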

International Journal of Aerospace Engineering


Table 2
