Research Article

RDMMFET: Representation of Dense Multimodality Fusion Encoder Based on Transformer

Table 2

Ablation studies on VQA v2.0 test-dev over the number of iterations and the number of layers in each encoder.

Module                      Setting      Accuracy (%)

Number of iterations        Epoch = 3    72.46
                            Epoch = 4    72.59
                            Epoch = 5    72.46
                            Epoch = 6    72.22
Number of encoder layers                 72.37
                                         72.35
                                         72.47
                                         72.44
                                         72.25
                                         72.37
                                         72.59