Improved U-Net-Like Network for Visual Saliency Detection Based on Pyramid Feature Attention

<div>Detailed structure of the context-aware pyramid feature extraction module. The context-aware feature extraction module takes as input the high-level features output by the encoder of the U-Net-like backbone and consists of three convolutional layers with <span id="M6">3 × 3</span> dilated filters, each with a different dilation rate, and one <span id="M7">1 × 1</span> convolutional layer. In this figure, CFE denotes the context-aware feature extraction module.</div>
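The caption gives only the layer types of the CFE module, so the block below is a minimal, dependency-free sketch of that layout under stated assumptions, not the authors' implementation. It assumes hypothetical dilation rates of 3, 5 and 7 (the caption does not state the rates), assumes the four branch outputs are concatenated along the channel axis, omits normalization and activation layers, and uses random weights as stand-ins for trained parameters. The names `conv2d`, `random_weights` and `cfe` are illustrative only.

```python
import random
from typing import List, Sequence

# A feature map is stored as nested lists indexed [channel][row][col].
Tensor = List[List[List[float]]]
# Conv weights are indexed [out_channel][in_channel][k_row][k_col].
Weights = List[List[List[List[float]]]]


def conv2d(x: Tensor, w: Weights, dilation: int = 1) -> Tensor:
    """Stride-1 2D convolution (cross-correlation) with 'same' zero padding.

    A k x k kernel with dilation d samples the input at offsets
    (u - k//2) * d, so its receptive field spans d*(k-1)+1 pixels per axis
    without adding parameters. Out-of-range samples count as zero.
    """
    c_in, h, width = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    r = k // 2
    out: Tensor = []
    for kernel in w:  # one output plane per output channel
        plane = [[0.0] * width for _ in range(h)]
        for i in range(h):
            for j in range(width):
                acc = 0.0
                for c in range(c_in):
                    for u in range(k):
                        ii = i + (u - r) * dilation
                        if not 0 <= ii < h:
                            continue  # zero padding (rows)
                        for v in range(k):
                            jj = j + (v - r) * dilation
                            if 0 <= jj < width:  # zero padding (columns)
                                acc += kernel[c][u][v] * x[c][ii][jj]
                plane[i][j] = acc
        out.append(plane)
    return out


def random_weights(c_out: int, c_in: int, k: int, rng: random.Random) -> Weights:
    """Small Gaussian weights standing in for trained parameters (hypothetical)."""
    return [[[[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(k)]
             for _ in range(c_in)] for _ in range(c_out)]


def cfe(x: Tensor, w1x1: Weights, w_dilated: Sequence[Weights],
        rates: Sequence[int] = (3, 5, 7)) -> Tensor:
    """Sketch of a context-aware feature extraction block.

    One 1x1 branch plus one 3x3 dilated branch per rate. The default rates
    are hypothetical, and concatenating the branch outputs along the channel
    axis is an assumption of this sketch.
    """
    branches = [conv2d(x, w1x1, dilation=1)]
    for w, rate in zip(w_dilated, rates):
        branches.append(conv2d(x, w, dilation=rate))
    return [plane for branch in branches for plane in branch]


if __name__ == "__main__":
    rng = random.Random(0)
    c_in, c_branch, size = 4, 2, 8
    x = [[[rng.random() for _ in range(size)] for _ in range(size)]
         for _ in range(c_in)]
    w1 = random_weights(c_branch, c_in, 1, rng)
    wd = [random_weights(c_branch, c_in, 3, rng) for _ in range(3)]
    y = cfe(x, w1, wd)
    print(len(y), len(y[0]), len(y[0][0]))  # -> 8 8 8
```

Because every branch uses "same" padding, the three dilated branches see progressively larger neighbourhoods (7, 11 and 15 pixels per axis for rates 3, 5 and 7) while producing maps of identical spatial size, which is what allows their outputs to be stacked channel-wise.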

Wireless Communications and Mobile Computing



Figure 2: Improved U-Net-Like Network for Visual Saliency Detection Based on Pyramid Feature Attention