Research Article

Visual-Text Reference Pretraining Model for Image Captioning

Figure 5

Visualization of the generated captions and their corresponding visual regions on Visual Genome.