Research Article

RDMMFET: Representation of Dense Multimodality Fusion Encoder Based on Transformer

Table 3

Comparison with the latest models on the VQA v2.0 data set.

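| Label | Method | Test-dev | Test-std |
|---|---|---|---|
| No pretraining | DFAF [8] | 70.22 | 70.34 |
| | MCAN [9] | 70.63 | 70.90 |
| | MUAN [38] | 70.82 | 71.10 |
| Pretraining | ViLBERT [23] | 70.55 | 70.92 |
| | VisualBert [27] | 70.80 | 71.00 |
| | VL-BERT(base) [28] | 71.16 | - |
| | VL-BERT(large) [28] | 71.79 | 72.22 |
| | LXMERT [24] | 72.42 | 72.54 |
| | RDMMFET (ours) | 72.59 | 72.67 |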