Research Article

Visual-Text Reference Pretraining Model for Image Captioning

Table 4

Comparison of results from the visual reference network (VRN) of VTR-PTM with single-channel (SC) and dual-channel (DC) encoding on MSCOCO. B@n: BLEU-n; M: METEOR; R: ROUGE-L; C: CIDEr; S: SPICE.

Approach   B@1   B@2   B@3   B@4   M     R     C      S
VRN-SC     81.9  67.1  53.2  40.7  30.3  61.0  129.7  28.2
VRN-DC     82.9  67.3  53.4  40.9  30.9  61.5  130.2  28.5
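As an arithmetic check on Table 4, the Python sketch below computes the per-metric gain of the dual-channel variant over the single-channel one. The scores are copied from the table. The names `gains`, `METRICS`, `VRN_SC`, and `VRN_DC` are hypothetical helpers introduced here for illustration and are not part of VTR-PTM.

```python
# Scores copied from Table 4 (MSCOCO); metric labels follow the table header.
METRICS = ["B@1", "B@2", "B@3", "B@4", "M", "R", "C", "S"]
VRN_SC = [81.9, 67.1, 53.2, 40.7, 30.3, 61.0, 129.7, 28.2]  # single-channel
VRN_DC = [82.9, 67.3, 53.4, 40.9, 30.9, 61.5, 130.2, 28.5]  # dual-channel


def gains(baseline, variant, names):
    """Return {metric: variant - baseline}, rounded to one decimal like the table.

    Rounding removes floating-point noise such as 0.19999999999998863.
    """
    if not (len(baseline) == len(variant) == len(names)):
        raise ValueError("score lists must align with metric names")
    return {name: round(v - b, 1) for name, b, v in zip(names, baseline, variant)}


if __name__ == "__main__":
    for name, d in gains(VRN_SC, VRN_DC, METRICS).items():
        print(f"{name}: {d:+.1f}")
```

With these numbers the dual-channel variant is ahead on every metric. The largest gain is on B@1 (+1.0), and the smallest (+0.2) is on B@2 through B@4.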