Research Article

Research on Video Captioning Based on Multifeature Fusion

Table 1

Comparison of experimental results on the MSR-VTT dataset for models trained with different experimental parameters and different modal information fusion methods.

                           Score
Number of layers  Feature  BLEU4                METEOR               ROUGE-L              CIDEr
                           Coordinated  Joint   Coordinated  Joint   Coordinated  Joint   Coordinated  Joint

1                          0.306        0.299   0.255        0.251   0.517        0.518   0.391        0.400
                           0.359        0.352   0.214        0.200   0.603        0.598   0.397        0.395
                           0.401        0.410   0.290        0.287   0.619        0.586   0.422        0.410

2                          0.334        0.325   0.235        0.220   0.520        0.499   0.394        0.396
                           0.386        0.381   0.243        0.244   0.609        0.587   0.424        0.422
                           0.443        0.430   0.327        0.319   0.612        0.600   0.521        0.517

3                          0.325        0.319   0.227        0.231   0.542        0.539   0.389        0.391
                           0.379        0.377   0.246        0.237   0.597        0.585   0.463        0.459
                           0.393        0.390   0.292        0.293   0.599        0.571   0.497        0.469