Research Article

Visual-Text Reference Pretraining Model for Image Captioning

Table 5

Comparison of the results generated by VTR-PTM with different initialization methods on MS COCO.

Approach              B@1   B@2   B@3   B@4   M     R     C      S
VTR-PTM from scratch  80.3  63.2  50.9  37.7  28.5  56.9  123.4  25.8
VTR-PTM from BERT     81.5  66.4  52.7  38.6  29.5  58.3  125.6  27.8
VTR-PTM from UNILM    82.9  67.3  53.4  40.9  30.9  61.5  130.2  28.5