Research Article
Visual-Text Reference Pretraining Model for Image Captioning
Table 1
Comparison with state-of-the-art single-model approaches on the MSCOCO Karpathy test split. B@N, M, R, C, and S denote BLEU-N, METEOR, ROUGE-L, CIDEr, and SPICE, respectively.
| Approach | B@1 | B@2 | B@3 | B@4 | M | R | C | S |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ADAPTIVE [24] | 74.8 | 58.4 | 44.4 | 33.6 | 26.4 | 55.0 | 104.2 | 19.7 |
| UP-DOWN [45] | 80.2 | 64.1 | 49.1 | 36.3 | 27.7 | 56.9 | 120.1 | 21.4 |
| CAVP [46] | 80.1 | 64.7 | 50.0 | 38.6 | 28.3 | 58.9 | 126.3 | 21.6 |
| SGAE [26] | 80.6 | 65.0 | 50.1 | 39.0 | 28.4 | 58.9 | 129.1 | 22.2 |
| ORT [47] | 80.8 | — | — | 38.6 | 28.7 | 58.4 | 128.3 | 22.6 |
| AOANET [48] | 81.0 | 65.8 | — | 38.9 | 29.2 | 58.8 | 129.8 | 22.4 |
| U-VLP [7] | — | — | — | 39.5 | 29.3 | — | 129.3 | 23.2 |
| NG-SAN [29] | 80.8 | 65.4 | 50.8 | 39.9 | 29.3 | 59.2 | 132.1 | 23.3 |
| ASG [27] | — | — | — | 23.0 | 24.5 | 50.1 | 204.2 | 42.1 |
| VTR-PTM (ours) | 82.9 | 67.3 | 53.4 | 40.9 | 30.9 | 61.5 | 130.2 | 28.5 |