Research Article

Designing Compact Convolutional Filters for Lightweight Human Pose Estimation

Table 6

Ablation experiments on reduced downsampling combined with lightweight upsampling blocks, on the MSCOCO val dataset. V1 denotes the model that uses C5 as the input for upsampling: the first 16 layers of MobileNetV3 form the downsampling part, and three deconvolution layers form the upsampling part. V2 denotes the model that uses C4 as the input for upsampling: the first 13 layers of MobileNetV3 form the downsampling part, followed by three bottleneck layers with a stride of 1, and finally two deconvolution layers of the same type as in V1 form the upsampling part.

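| Model | Input size | FLOPs | #Params | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| V1 | | 604M | 2.5M | 65.22 | 87.05 | 73.05 | 62.2 | 71.47 | 71.45 |
| V2 | | 684M | 2.1M | 66.23 | 87.23 | 74.19 | 63.21 | 72.48 | 72.38 |
| V1 | | 1.33G | 2.5M | 68.44 | 87.71 | 75.41 | 64.89 | 75.01 | 74.47 |
| V2 | | 1.5G | 2.1M | 68.97 | 87.72 | 75.47 | 65.32 | 75.7 | 74.85 |
| Ours | | 557M | 1.5M | 66.23 | 87.38 | 74.25 | 63.13 | 72.52 | 72.4 |
| Ours | | 1.23G | 1.5M | 69.03 | 87.72 | 75.95 | 65.52 | 75.55 | 74.98 |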
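
The cost columns of the table can be read as relative savings. The following is a minimal sketch, not part of the original experiments, that computes the percentage reduction in FLOPs and parameters of "Ours" against V1 and V2. The names `ROWS`, `reduction`, and `compare` are illustrative, the two settings per model are taken in the order the rows appear, and it is assumed that 1G = 1000M when converting FLOPs.

```python
# FLOPs and parameter counts (both in millions) copied from Table 6.
# Assumption: 1G FLOPs is treated as 1000M FLOPs.
# Each model lists its two rows in the order they appear in the table.
ROWS = {
    "V1": [(604.0, 2.5), (1330.0, 2.5)],
    "V2": [(684.0, 2.1), (1500.0, 2.1)],
    "Ours": [(557.0, 1.5), (1230.0, 1.5)],
}


def reduction(baseline, ours):
    """Return the relative reduction of `ours` against `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline


def compare(baseline_name, setting):
    """Compare 'Ours' with a baseline for one setting (0 = first row, 1 = second row).

    Returns a pair: (FLOPs reduction in percent, parameter reduction in percent).
    """
    base_flops, base_params = ROWS[baseline_name][setting]
    our_flops, our_params = ROWS["Ours"][setting]
    return reduction(base_flops, our_flops), reduction(base_params, our_params)


if __name__ == "__main__":
    for name in ("V1", "V2"):
        for setting in (0, 1):
            flops_cut, params_cut = compare(name, setting)
            print(f"Ours vs {name} (setting {setting}): "
                  f"FLOPs -{flops_cut:.1f}%, params -{params_cut:.1f}%")
```

The sketch only covers the cost columns, since these are the ones whose headers are given in the table.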