Designing Compact Convolutional Filters for Lightweight Human Pose Estimation

<div>Ablation experiments on reduced downsampling combined with lightweight upsampling blocks on the MSCOCO val dataset. V1 denotes the model that uses C5 as the input for upsampling, with the first 16 layers of MobileNetV3 as the downsampling part and three deconvolution layers as the upsampling part. V2 denotes the model that uses C4 as the input for upsampling, with the first 13 layers of MobileNetV3 as the downsampling part, followed by three <svg height="8.46388pt" id="M198" style="vertical-align:-0.3499298pt" version="1.1" viewbox="-0.0498162 -8.11395 26.097 8.46388" width="26.097pt" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.013,0,0,-0.013,0,0)"><path d="M153 550H386L412 615L406 623H120L82 318C104 327 142 338 184 338C294 338 347 275 347 187C347 112 305 39 221 39C160 39 119 71 97 89C88 97 80 96 71 90C59 80 50 67 49 57C48 45 52 36 66 23C80 9 123 -12 169 -12C221 -11 288 15 342 59C403 109 431 165 431 225C431 308 366 395 238 395C212 395 165 379 127 364L153 550Z"></path></g><g transform="matrix(.013,0,0,-0.013,9.146,0)"><path d="M528 54L331 254L528 455L492 493L294 291L96 493L60 455L257 254L60 54L96 16L294 217L492 16L528 54Z"></path></g><g transform="matrix(.013,0,0,-0.013,19.682,0)"><path d="M153 550H386L412 615L406 623H120L82 318C104 327 142 338 184 338C294 338 347 275 347 187C347 112 305 39 221 39C160 39 119 71 97 89C88 97 80 96 71 90C59 80 50 67 49 57C48 45 52 36 66 23C80 9 123 -12 169 -12C221 -11 288 15 342 59C403 109 431 165 431 225C431 308 366 395 238 395C212 395 165 379 127 364L153 550Z"></path></g></svg> bottleneck layers with a stride of 1, and finally two deconvolution layers identical to those of V1 as the upsampling part.</div>
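To make the difference between V1 and V2 concrete, the sketch below traces feature-map sizes through both variants. It is a hypothetical illustration, not the authors' code, and rests on assumptions the caption does not state: a 256×192 input, C4 at stride 16 and C5 at stride 32, 5×5 bottlenecks whose depthwise convolution uses padding 2, and deconvolutions with kernel 4, stride 2, padding 1 (a common SimpleBaseline-style choice). The helper names `conv_out`, `deconv_out`, and `trace` are invented for this example.

```python
# Hypothetical shape trace for the V1 and V2 variants in Table 6.
# ASSUMPTIONS (not from the caption): 256x192 input, C4 = stride 16,
# C5 = stride 32, 5x5 bottlenecks keep resolution (stride 1, padding 2),
# each deconvolution uses kernel 4, stride 2, padding 1.

def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a transposed convolution (output_padding = 0)."""
    return (size - 1) * stride - 2 * padding + kernel


def trace(input_hw, backbone_stride, n_bottlenecks, n_deconvs):
    """Return (stage name, (H, W)) after the backbone and each later layer."""
    h, w = input_hw[0] // backbone_stride, input_hw[1] // backbone_stride
    stages = [("backbone", (h, w))]
    # In an inverted-residual bottleneck, only the 5x5 depthwise conv can
    # change spatial size; the 1x1 expand/project convs leave it unchanged.
    for i in range(n_bottlenecks):
        h, w = conv_out(h, 5, 1, 2), conv_out(w, 5, 1, 2)
        stages.append((f"bottleneck5x5_{i + 1}", (h, w)))
    # Each deconvolution doubles the spatial size under the assumed settings.
    for i in range(n_deconvs):
        h, w = deconv_out(h, 4, 2, 1), deconv_out(w, 4, 2, 1)
        stages.append((f"deconv_{i + 1}", (h, w)))
    return stages


INPUT = (256, 192)
v1 = trace(INPUT, backbone_stride=32, n_bottlenecks=0, n_deconvs=3)  # C5 input
v2 = trace(INPUT, backbone_stride=16, n_bottlenecks=3, n_deconvs=2)  # C4 input

for name, stages in (("V1", v1), ("V2", v2)):
    print(name, " -> ".join(f"{stage}{hw}" for stage, hw in stages))
```

Under these assumptions both variants end at a 64×48 output: V1 upsamples from 8×6 through three deconvolutions, while V2 starts from a 16×12 map, keeps it at 16×12 through the three stride-1 bottlenecks, and needs only two deconvolutions.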

Wireless Communications and Mobile Computing


Table 6
