Research Article

Multi-Scale Guided Attention Network for Crowd Counting

Figure 3

The architecture of upsampled embedded block (UEB). The bilinear interpolation method is used for upsampling to restore the high-level features to the same size as the low-level features. convolution is responsible for learning weights, extracting the high-level semantics for better-integrated information, and the fusion features could be further fused from top to bottom. The specific fusion features of Conv3_3, Conv4_3, and Conv5_3 from the VGG16 backbone network are used.