Research Article
Multimodal Feature Learning for Video Captioning
Table 3
Performance comparison with other state-of-the-art models on the MSVD dataset.
| Models | B@1 | B@2 | B@3 | B@4 | CIDEr |
| --- | --- | --- | --- | --- | --- |
| SCN [11] | - | - | - | 51.1 | 77.7 |
| LSTM-TSA [12] | 82.8 | 72.0 | 62.8 | 52.8 | 74.0 |
| hLSTMat [10] | 82.9 | 72.2 | 63.0 | 53.0 | 73.8 |
| SeFLA | 84.8 | 70.8 | 60.0 | 50.0 | 94.3 |