Research Article

Integrating Temporal and Spatial Attention for Video Action Recognition

Table 3

Comparison with other methods on the HMDB51 dataset.

| Model | Pretraining dataset | Accuracy (%) | GFLOPs |
| --- | --- | --- | --- |
| Res3D [22] | Sports-1M | 54.9 | - |
| T3D [24] | Kinetics-400 | 59.2 | - |
| R(2 + 1)D [25] | Sports-1M | 66.6 | 41.69 |
| TSM [26] | Kinetics-400 | 73.6 | 32.88 |
| I3D RGB [27] | ImageNet + Kinetics-400 | 74.8 | 108 |
| T-CNN [12] | Kinetics-400 | 73.3 | 15.78 |
| T-CNN + spatial | Kinetics-400 | 75.2 | 52.3 |