Research Article
RDMMFET: Representation of Dense Multimodality Fusion Encoder Based on Transformer
Table 1
Statistics of data sets used for pretraining. The columns from MS COCO to All are grouped under the heading "Questions".
| Image (K) | MS COCO (K) [33] | VG (M) [34] | VQA v2.0 (K) | GQA (M) [35] | VG-QA (M) [36] | All (M) |
| 180 | 617 | 5.39 | 658 | 1.07 | 1.44 | 9.18 |
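As a quick consistency check on Table 1, the per-data-set counts can be summed and compared with the reported All (M) value. The sketch below assumes that the single data row lines up as Image = 180 K, MS COCO = 617 K, VG = 5.39 M, VQA v2.0 = 658 K, GQA = 1.07 M, VG-QA = 1.44 M, and All = 9.18 M, with "Questions" acting as a group heading rather than a column of its own. The variable names are illustrative only.

```python
# Values copied from Table 1, converted to millions (M).
# Assumption: "Questions" is a group heading, so the row has seven values.
counts_m = {
    "MS COCO": 617 / 1000,   # reported in thousands (K)
    "VG": 5.39,
    "VQA v2.0": 658 / 1000,  # reported in thousands (K)
    "GQA": 1.07,
    "VG-QA": 1.44,
}

reported_all_m = 9.18  # "All (M)" column

# Sum the five per-data-set counts and compare with the reported total.
total_m = sum(counts_m.values())
difference_m = abs(total_m - reported_all_m)

print(f"sum of data sets: {total_m:.3f} M")
print(f"reported total:   {reported_all_m:.2f} M")
print(f"difference:       {difference_m:.3f} M")
```

Under this alignment the sum agrees with the reported total to within the rounding of the table entries, which supports reading the row this way.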