Research Article
A Video Classification Method Based on Spatiotemporal Detail Attention and Feature Fusion
Table 3
Comparison of the proposed algorithm with other methods on Kinetics-400.
| Method | Top-1 (%) | Top-5 (%) | GFLOPs |
| --- | --- | --- | --- |
| I3D [15] | 72.1 | 90.3 | 108 |
| Two-stream I3D [15] | 75.7 | 92.0 | 216 |
| S3D-G [26] | 77.2 | 93.0 | — |
| Nonlocal R50 [47] | 76.5 | 92.6 | — |
| Nonlocal R101 [47] | 77.7 | 93.3 | — |
| R(2+1)D Flow [25] | 67.5 | 87.2 | 152 |
| STC [48] | 68.7 | 88.5 | — |
| ARTNet [49] | 69.2 | 88.3 | 23.5 |
| S3D [26] | 69.4 | 89.1 | 66.4 |
| ECO [50] | 70.0 | 89.4 | 216 |
| R(2+1)D [25] | 73.9 | 90.9 | 152 |
| TSN [1] | 71.3 | 91.5 | 33 |
| TSM [2] | 75.1 | 91.8 | 65 |
| SlowFast, R101 [3] | 78.9 | 93.5 | 213 |
| SlowFast, R101+NL [3] | 79.8 | 93.9 | 234 |
| VCM-SDD, R101_NP | 77.4 | 93.1 | 46.8 |
| VCM-SDD, R101 | 78.5 | 93.5 | 46.8 |
| VCM-SDD, R101_NP | 79.3 | 93.9 | 46.8 |
| VCM-SDD, R101 | 80.1 | 94.4 | 46.8 |