Research Article
PointTransformer: Encoding Human Local Features for Small Target Detection
Figure 2
The overall architecture of our proposed model. The attention backbone (A) uses a trained pose estimation model to reconstruct local features with a transformer encoder. The gated position embedding module (G) uses the location information of human skeletal points to enhance local feature learning. The head-layer module (H) reconstructs the output layer by weighting the position-encoded feature maps.
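As a rough illustration of how a gated position embedding such as module (G) might combine skeletal point locations with local features, the following minimal NumPy sketch adds a sinusoidal encoding of each keypoint's (x, y) position to its feature vector, scaled by a learned sigmoid gate. This is an assumption for illustration only and is not taken from the paper: the sinusoidal encoding, the additive gating form, and all names (`sinusoidal_embedding`, `gated_position_embedding`, `w_gate`, `b_gate`) are hypothetical, and the actual module may differ.

```python
import numpy as np


def sinusoidal_embedding(coords, dim):
    """Encode (N, 2) keypoint coordinates as (N, dim) sinusoidal features.

    Hypothetical choice of encoding; dim must be divisible by 4 so that
    x and y each receive dim/4 sine and dim/4 cosine channels.
    """
    assert dim % 4 == 0, "dim must be divisible by 4"
    n_freq = dim // 4
    freqs = 1.0 / (10000.0 ** (np.arange(n_freq) / n_freq))   # (n_freq,)
    angles = coords[:, :, None] * freqs[None, None, :]        # (N, 2, n_freq)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return emb.reshape(coords.shape[0], dim)                  # (N, dim)


def gated_position_embedding(features, coords, w_gate, b_gate):
    """Add a position embedding to local features through a sigmoid gate.

    features: (N, dim) local features, one row per skeletal point.
    coords:   (N, 2) keypoint locations, assumed normalized to [0, 1].
    w_gate:   (dim, dim) gate weights (hypothetical learned parameter).
    b_gate:   (dim,) gate bias (hypothetical learned parameter).
    """
    pos = sinusoidal_embedding(coords, features.shape[1])
    # The gate is computed from the features themselves, so each channel
    # can decide how much positional information to admit.
    gate = 1.0 / (1.0 + np.exp(-(features @ w_gate + b_gate)))
    return features + gate * pos


# Usage with random stand-in data: 17 keypoints and 64-dim features
# (both sizes are arbitrary choices for this sketch).
rng = np.random.default_rng(0)
features = rng.normal(size=(17, 64))
coords = rng.uniform(size=(17, 2))
out = gated_position_embedding(
    features, coords, 0.01 * rng.normal(size=(64, 64)), np.zeros(64)
)
```

With a strongly negative gate bias the gate closes and the features pass through almost unchanged, which is one reason a gated form can be preferable to always adding the position embedding at full strength.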