Research Article

Visual-Text Reference Pretraining Model for Image Captioning

Table 1

Comparison with state-of-the-art single-model approaches on the MSCOCO Karpathy test split. B@N, M, R, C, and S denote BLEU-N, METEOR, ROUGE-L, CIDEr, and SPICE, respectively.

Approach          B@1    B@2    B@3    B@4    M      R      C      S

ADAPTIVE [24]     74.8   58.4   44.4   33.6   26.4   55.0   104.2  19.7
UP-DOWN [45]      80.2   64.1   49.1   36.3   27.7   56.9   120.1  21.4
CAVP [46]         80.1   64.7   50.0   38.6   28.3   58.9   126.3  21.6
SGAE [26]         80.6   65.0   50.1   39.0   28.4   58.9   129.1  22.2
ORT [47]          80.8   -      -      38.6   28.7   58.4   128.3  22.6
AOANET [48]       81.0   65.8   -      38.9   29.2   58.8   129.8  22.4
U-VLP [7]         -      -      -      39.5   29.3   -      129.3  23.2
NG-SAN [29]       80.8   65.4   50.8   39.9   29.3   59.2   132.1  23.3
ASG [27]          -      -      -      23.0   24.5   50.1   204.2  42.1
VTR-PTM (ours)    82.9   67.3   53.4   40.9   30.9   61.5   130.2  28.5