Research Article

Visual-Text Reference Pretraining Model for Image Captioning

Figure 1

The left side shows the pretraining process of VTR-PTM; the right side shows the trained model being fine-tuned for image captioning.