Research Article
Multimodal Feature Learning for Video Captioning
Table 4
Performance comparison with other state-of-the-art models on the MSR-VTT dataset.
| Models | BLEU@4 |
| --- | --- |
| MP-LSTM (V) [1] | 34.8 |
| MP-LSTM (C) [1] | 35.4 |
| MP-LSTM (V + C) [1] | 35.8 |
| SA (V) [2] | 35.6 |
| SA (C) [2] | 36.1 |
| SA (V + C) [2] | 36.6 |
| hLSTMt [10] | 37.4 |
| hLSTMat [10] | 38.3 |
| SeFLA | 41.8 |