Research Article
Integrating Temporal and Spatial Attention for Video Action Recognition
Table 3
Comparison with other methods on the HMDB51 dataset.
| Model | Pretraining dataset | Accuracy (%) | GFLOPs |
| --- | --- | --- | --- |
| Res3D [22] | Sports-1M | 54.9 | — |
| T3D [24] | Kinetics-400 | 59.2 | — |
| R(2+1)D [25] | Sports-1M | 66.6 | 41.69 |
| TSM [26] | Kinetics-400 | 73.6 | 32.88 |
| I3D RGB [27] | ImageNet + Kinetics-400 | 74.8 | 108 |
| T-CNN [12] | Kinetics-400 | 73.3 | 15.78 |
| T-CNN + spatial | Kinetics-400 | 75.2 | 52.3 |