Automatic Traffic State Recognition Based on Video Features Extracted by an Autoencoder

<table class="table-group" id="tab6"><tr><td><table class="table"><tr><td class="thead-hr" colspan="5"><hr/></td></tr><tr class="thead"><td class="align_left">Model</td><td class="align_center">Model structure</td><td class="align_center">Accuracy rate (%)</td><td class="align_center">Recall rate (%)</td><td class="align_center"><svg height="11.8174pt" id="M264" style="vertical-align:-3.1815pt" version="1.1" viewbox="-0.0498162 -8.6359 11.8575 11.8174" width="11.8575pt" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.013,0,0,-0.013,0,0)"><path d="M584 650H137L131 622C214 614 217 612 200 521L125 127C109 41 101 35 23 28L17 0H288L294 28C201 35 193 42 209 128L242 309H348C440 309 442 300 443 226H471L510 422H482C452 354 449 348 357 348H251L295 575C302 609 304 615 338 615H426C502 615 517 604 526 581C534 560 536 524 537 492L565 494C574 554 583 631 584 650Z"></path></g><g transform="matrix(.0091,0,0,-0.0091,6.786,3.132)"><path d="M389 0V32C297 38 291 46 291 118V635C234 613 175 595 109 583V556L161 554C203 552 207 547 207 497V118C207 46 201 38 110 32V0H389Z"></path></g></svg> (%)</td></tr><tr><td class="thead-hr" colspan="5"><hr/></td></tr><tr><td class="align_left">AlexNet</td><td class="align_center">5 convolutional layers, 3 pooling layers, 3 fully connected layers, and 1 classification layer</td><td class="align_center">94.5</td><td class="align_center">93.6</td><td class="align_center">94.0</td></tr><tr><td class="align_left">LeNet</td><td class="align_center">3 convolutional layers, 2 pooling layers, 1 fully connected layer, and 1 classification layer</td><td class="align_center">82.3</td><td class="align_center">62.4</td><td class="align_center">71.0</td></tr><tr><td class="align_left">GoogLeNet</td><td class="align_center">22 network layers, including convolutional layers and pooling layers, and 1 classification layer</td><td class="align_center">36.8</td><td class="align_center">35.2</td><td 
class="align_center">36.0</td></tr><tr><td class="align_left">VGG16</td><td class="align_center">13 convolutional layers, 3 fully connected layers, 5 pooling layers, and 1 classification layer</td><td class="align_center">11.1</td><td class="align_center">33.3</td><td class="align_center">16.7</td></tr><tr><td class="align_left"><svg height="10.1524pt" id="M265" style="vertical-align:-0.04990005pt" version="1.1" viewbox="-0.0498162 -10.1025 15.352 10.1524" width="15.352pt" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.013,0,0,-0.013,0,0)"><path d="M686 28C612 35 607 44 591 112C563 234 541 360 519 489L489 666L457 658L147 121C100 40 89 36 24 28L17 0H240L250 28C168 34 159 41 190 101L262 237H482C495 180 503 137 510 91C517 47 514 35 441 28L433 0H677L686 28ZM475 280H285L429 541H431L475 280Z"></path></g><g transform="matrix(.0091,0,0,-0.0091,9.135,-5.741)"><path d="M486 158C486 177 478 202 466 220C413 228 386 236 336 262C386 288 413 297 466 304C478 323 486 347 485 366C470 376 444 381 422 380C389 338 368 319 321 288C323 345 329 372 349 422C339 442 322 461 305 470C289 461 271 442 262 422C281 372 287 345 290 288C243 319 222 338 189 380C167 381 142 376 125 366C125 347 133 322 145 304C198 296 225 288 275 262C225 236 198 227 145 220C133 201 125 177 126 158C141 148 167 143 189 144C222 186 243 205 290 236C288 179 282 152 262 102C272 82 289 63 306 54C322 63 340 82 350 102C330 152 324 179 321 236C368 205 390 186 422 144C444 143 470 148 486 158Z"></path></g></svg> Classifier</td><td class="align_center">5 encoding hidden layers + common classifiers (linear classification, SVM, DNN, and so on)</td><td class="align_center">94.5–97.1</td><td class="align_center">94.5–97.1</td><td class="align_center">94.4–97.1</td></tr><tr><td class="align_left"><span class="nowrap"><svg height="10.1524pt" id="M266" style="vertical-align:-0.04990005pt" version="1.1" viewbox="-0.0498162 -10.1025 15.352 10.1524" width="15.352pt" 
xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.013,0,0,-0.013,0,0)"><path d="M686 28C612 35 607 44 591 112C563 234 541 360 519 489L489 666L457 658L147 121C100 40 89 36 24 28L17 0H240L250 28C168 34 159 41 190 101L262 237H482C495 180 503 137 510 91C517 47 514 35 441 28L433 0H677L686 28ZM475 280H285L429 541H431L475 280Z"></path></g><g transform="matrix(.0091,0,0,-0.0091,9.135,-5.741)"><path d="M486 158C486 177 478 202 466 220C413 228 386 236 336 262C386 288 413 297 466 304C478 323 486 347 485 366C470 376 444 381 422 380C389 338 368 319 321 288C323 345 329 372 349 422C339 442 322 461 305 470C289 461 271 442 262 422C281 372 287 345 290 288C243 319 222 338 189 380C167 381 142 376 125 366C125 347 133 322 145 304C198 296 225 288 275 262C225 236 198 227 145 220C133 201 125 177 126 158C141 148 167 143 189 144C222 186 243 205 290 236C288 179 282 152 262 102C272 82 289 63 306 54C322 63 340 82 350 102C330 152 324 179 321 236C368 205 390 186 422 144C444 143 470 148 486 158Z"></path></g></svg>-</span>k-means</td><td class="align_center">5 encoding hidden layers + k-means clustering</td><td class="align_center">95.4</td><td class="align_center">95.3</td><td class="align_center">95.3</td></tr><tr class="table-tr"><td colspan="5"><hr class="tbody-hr"/></td></tr></table></td></tr></table>

<div>Test results of the CNN models and the models proposed in this paper.</div>
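The F1 column in Table 6 can be cross-checked against the other two metric columns. The sketch below assumes that the "Accuracy rate" column plays the role of precision, which the reported numbers suggest but the table does not state, and that F1 is the usual harmonic mean of precision and recall. The names `ROWS`, `f1_score`, and `check_rows` are illustrative and do not come from the paper. The row that reports ranges (the "Classifier" row) is left out because a range cannot be recomputed from a single pair of values.

```python
# Rows copied from Table 6: (model, "accuracy rate" %, recall %, reported F1 %).
# Assumption: the "accuracy rate" column is treated as precision.
ROWS = [
    ("AlexNet", 94.5, 93.6, 94.0),
    ("LeNet", 82.3, 62.4, 71.0),
    ("GoogLeNet", 36.8, 35.2, 36.0),
    ("VGG16", 11.1, 33.3, 16.7),
    ("autoencoder + k-means", 95.4, 95.3, 95.3),  # row label paraphrased
]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_rows(rows, tolerance=0.1):
    """Recompute F1 for each row and compare it with the reported value.

    Returns a list of (model, recomputed F1, reported F1, agrees) tuples.
    The tolerance allows for rounding of the published figures to one decimal.
    """
    results = []
    for model, precision, recall, reported in rows:
        recomputed = f1_score(precision, recall)
        agrees = abs(recomputed - reported) <= tolerance
        results.append((model, recomputed, reported, agrees))
    return results


if __name__ == "__main__":
    for model, recomputed, reported, agrees in check_rows(ROWS):
        status = "consistent" if agrees else "differs"
        print(f"{model}: recomputed {recomputed:.2f}, reported {reported:.1f} ({status})")
```

Under that assumption, every single-valued row agrees with its reported F1 to within the rounding tolerance, including VGG16, where a recall of 33.3% against a much lower first column yields the low F1 shown in the table.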

Mathematical Problems in Engineering


Table 6
