Research Article

JGRCAN: A Visual Question Answering Co-Attention Network via Joint Grid-Region Features

Table 2

Performance comparisons with different feature combinations.

Features        Accuracy (%)
                All     Y/N     Number  Other
Grid            70.56   86.27   52.21   61.85
Region          70.63   86.82   53.06   60.72
Grid + region   70.87   86.97   52.88   61.45

Y/N, Number, and Other are the three question types. Y/N covers questions that are answered only with yes or no, e.g., “Is it...?” or “Does it...?” Number covers counting questions that are answered only with a number, e.g., “How many...?” Other covers the remaining questions, such as those about color or type, e.g., “What color is...?” and “What food is...?”
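The comparison in Table 2 can be read off with a few lines of Python. The sketch below is illustrative only: the accuracy values are copied from Table 2, while the names `table2`, `best_per_column`, and `gain_over_best_single` are hypothetical helpers introduced here and are not part of the JGRCAN implementation.

```python
# Accuracy (%) values copied from Table 2; the dictionary layout is an assumption.
table2 = {
    "Grid":          {"All": 70.56, "Y/N": 86.27, "Number": 52.21, "Other": 61.85},
    "Region":        {"All": 70.63, "Y/N": 86.82, "Number": 53.06, "Other": 60.72},
    "Grid + region": {"All": 70.87, "Y/N": 86.97, "Number": 52.88, "Other": 61.45},
}


def best_per_column(table):
    """Return, for each question type, the feature set with the highest accuracy."""
    columns = next(iter(table.values())).keys()
    return {col: max(table, key=lambda feat: table[feat][col]) for col in columns}


def gain_over_best_single(table, combined="Grid + region"):
    """Return the accuracy difference (in percentage points) between the
    combined features and the better of the single-feature rows."""
    singles = [feat for feat in table if feat != combined]
    return {
        col: round(table[combined][col] - max(table[f][col] for f in singles), 2)
        for col in table[combined]
    }


if __name__ == "__main__":
    print(best_per_column(table2))
    print(gain_over_best_single(table2))
```

Run on these values, the combined grid and region features lead on All and Y/N, Region leads on Number, and Grid leads on Other.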