Research Article

An Image-Based Deep Learning Approach with Improved DETR for Power Line Insulator Defect Detection

Figure 1

The model structure of the Transformer [15]. The three arrows branching from a single input illustrate the self-attention mechanism: the Q, K, and V in Equation (1) are generated from the same input by three different linear projections, which is why it is called “Self-Attention.” The two arrows drawn from the encoder represent the K and V generated from the encoder output, which, together with the Q from the previous decoder layer, serve as inputs to the multi-head attention; this variant is called “Cross-Attention.”
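To make the two attention variants in Figure 1 concrete, the sketch below contrasts them in plain NumPy. This is an illustration, not the paper's implementation: the single-head form, all variable names, and the shapes are assumptions, and it takes Equation (1) to be the standard scaled dot-product attention of [15].

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
d_model = 8
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

# Self-attention: Q, K, and V are three projections of the SAME input x.
x = rng.standard_normal((4, d_model))        # 4 tokens of one sequence
self_out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)

# Cross-attention: K and V come from the encoder output, Q from the decoder.
enc_out = rng.standard_normal((4, d_model))  # encoder memory
dec_in = rng.standard_normal((2, d_model))   # queries from the previous decoder layer
cross_out = scaled_dot_product_attention(dec_in @ W_q, enc_out @ W_k, enc_out @ W_v)

print(self_out.shape, cross_out.shape)       # (4, 8) (2, 8)
```

The only difference between the two calls is where the projection inputs come from: the same tensor for self-attention, versus encoder memory (for K and V) and decoder state (for Q) in cross-attention, exactly as the arrows in Figure 1 indicate.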