Research Article

Multiple Context Learning Networks for Visual Question Answering

Table 3

Results of fine-tuning BERT on the VQA v2.0 validation set.

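| Model       | lr×   | All   | Y/N   | Num   | Other |
|-------------|-------|-------|-------|-------|-------|
| N = 1, BERT | 0.001 | 65.21 | 82.61 | 45.74 | 57.13 |
| N = 1, BERT | 0.01  | 66.12 | 84.08 | 45.97 | 57.80 |
| N = 1, BERT | 0.1   | 66.61 | 85.09 | 46.14 | 57.98 |
| N = 2, BERT | 0.1   | 67.52 | 85.18 | 49.09 | 58.97 |
| N = 3, BERT | 0.1   | 67.86 | 85.35 | 49.75 | 59.27 |
| N = 4, BERT | 0.1   | 67.80 | 85.73 | 49.94 | 58.94 |
| N = 3, LSTM | -     | 66.86 | 84.62 | 48.85 | 58.34 |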