Learning to Drive in the NGSIM Simulator Using Proximal Policy Optimization

<table class="table-group" id="tab2"><tr><td><table class="table">
<tr><td class="thead-hr" colspan="3"><hr/></td></tr>
<tr class="thead"><td class="align_left">Symbols</td><td class="align_center">Meaning</td><td class="align_center">Values</td></tr>
<tr><td class="thead-hr" colspan="3"><hr/></td></tr>
<tr><td class="align_left"><span id="M68"><i>N</i><sub>1</sub></span></td><td class="align_center">The total training episodes</td><td class="align_center">200</td></tr>
<tr><td class="align_left"><span id="M69"><i>N</i><sub>2</sub></span></td><td class="align_center">The total simulation steps (batch size)</td><td class="align_center">2048</td></tr>
<tr><td class="align_left"><span id="M70"><i>l</i><sub><i>r</i></sub></span></td><td class="align_center">Learning rate</td><td class="align_center">0.0003</td></tr>
<tr><td class="align_left"><span id="M71"><i>K</i></span></td><td class="align_center">The number of repetitions of PPO training</td><td class="align_center">10</td></tr>
<tr><td class="align_left"><span id="M72"><i>N</i><sub>3</sub></span></td><td class="align_center">Minibatch size</td><td class="align_center">256</td></tr>
<tr class="table-tr"><td colspan="3"><hr class="tbody-hr"/></td></tr>
</table></td></tr></table>
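
The hyperparameters in Table 2 can be collected into a small configuration object. The following is a minimal sketch, not the authors' implementation: the class name, field names, and helper methods are hypothetical, and it assumes the common PPO convention in which each batch of simulation steps is split into equal, non-overlapping minibatches and every one of the K repetitions visits each minibatch once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOHyperparameters:
    """Values from Table 2; field names are illustrative, not the authors'."""

    total_episodes: int = 200      # N_1: total training episodes
    batch_size: int = 2048         # N_2: total simulation steps (batch size)
    learning_rate: float = 3e-4    # l_r: learning rate (0.0003)
    ppo_epochs: int = 10           # K: repetitions of PPO training
    minibatch_size: int = 256      # N_3: minibatch size

    def minibatches_per_epoch(self) -> int:
        # Assumption: the batch is split into equal, non-overlapping minibatches.
        if self.batch_size % self.minibatch_size != 0:
            raise ValueError("batch_size must be divisible by minibatch_size")
        return self.batch_size // self.minibatch_size

    def gradient_steps_per_batch(self) -> int:
        # Assumption: each of the K passes visits every minibatch exactly once.
        return self.ppo_epochs * self.minibatches_per_epoch()


config = PPOHyperparameters()
print(config.minibatches_per_epoch())     # 2048 / 256 = 8
print(config.gradient_steps_per_batch())  # 10 * 8 = 80
```

Under these assumptions, the tabulated values give 8 minibatches per pass and 80 gradient steps per collected batch. If the training loop samples minibatches differently, those derived numbers would change.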

Journal of Advanced Transportation

Table 2: Learning to Drive in the NGSIM Simulator Using Proximal Policy Optimization