Research Article

RDMMFET: Representation of Dense Multimodality Fusion Encoder Based on Transformer

Table 1

Statistics of data sets used for pretraining.

Image (K) | Questions
          | MS COCO (K) [33] | VG (M) [34] | VQA v2.0 (K) | GQA (M) [35] | VG-QA (M) [36] | All (M)
180       | 617              | 5.39        | 658          | 1.07         | 1.44           | 9.18
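Table 1 mixes thousands (K) and millions (M), so the All (M) figure is easiest to read once every text source is converted to millions: 0.617 + 5.39 + 0.658 + 1.07 + 1.44 ≈ 9.18 M. The image count (180K) is not part of that total. The short Python sketch that follows performs this check. It assumes that All (M) is a plain sum of the five listed sources, which matches the reported numbers but is not stated explicitly in the table. The names `COUNTS`, `to_millions`, and `total_millions` are hypothetical and used only for illustration.

```python
# Per-source text counts from Table 1, each with the unit given in its header.
# Assumption (not stated in the table): "All (M)" is the plain sum of these sources.
COUNTS: dict[str, tuple[float, str]] = {
    "MS COCO": (617, "K"),
    "VG": (5.39, "M"),
    "VQA v2.0": (658, "K"),
    "GQA": (1.07, "M"),
    "VG-QA": (1.44, "M"),
}

# Divisor that converts a count in the given unit into millions.
DIVISOR_TO_MILLIONS = {"K": 1000, "M": 1}


def to_millions(value: float, unit: str) -> float:
    """Convert a count given in thousands (K) or millions (M) into millions."""
    return value / DIVISOR_TO_MILLIONS[unit]


def total_millions(counts: dict[str, tuple[float, str]]) -> float:
    """Sum all sources after normalizing each one to millions."""
    return sum(to_millions(value, unit) for value, unit in counts.values())


if __name__ == "__main__":
    total = total_millions(COUNTS)
    print(f"Total: {total:.3f} M (Table 1 reports 9.18 M)")
```

Running the script prints a total of about 9.175 M. That value rounds to the 9.18 M reported in the All column, which supports reading All (M) as the combined size of the five sources.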