Research Article

JGRCAN: A Visual Question Answering Co-Attention Network via Joint Grid-Region Features

Table 1

Convolution dimensional transformation process.

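| | Input | Conv + MaxPool | RB × 3 | RB × 4 | RB × 3 | RB × 23 | AvgPool |
|---|---|---|---|---|---|---|---|
| Layer structure | | 3 × 3, Stride: 2 | | | | | 2 × 2, Stride: 2 |
| Shape | 3 × 448 × 448 | 64 × 112 × 112 | 256 × 112 × 112 | 512 × 56 × 56 | 1024 × 28 × 28 | 2048 × 14 × 14 | 2048 × 7 × 7 |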

RB stands for residual block; RB × 3 means that three residual blocks are used.
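The shapes in Table 1 can be checked with the usual output-size formula for convolution and pooling layers, floor((size + 2 × padding − kernel) / stride) + 1. The following is a minimal sketch rather than the authors' implementation. Only the 3 × 3, stride-2 and 2 × 2, stride-2 settings and the channel counts come from Table 1; the 7 × 7 stem convolution, the padding values, and the per-stage strides are assumptions borrowed from the standard ResNet layout, and the names `out_size`, `LAYERS`, and `trace_shapes` are hypothetical.

```python
def out_size(size, kernel, stride, padding=0):
    """Spatial output size of a conv/pool layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, out_channels, kernel, stride, padding)
# Assumed values: 7x7 stem conv, all paddings, and the stage strides.
# From Table 1: 3x3 stride-2 max pooling, 2x2 stride-2 average pooling,
# and the output channel counts.
LAYERS = [
    ("Conv",       64,   7, 2, 3),  # assumed ResNet-style stem
    ("MaxPool",    64,   3, 2, 1),  # 3 x 3, stride 2 (padding assumed)
    ("RB stage 1", 256,  3, 1, 1),  # stage modelled by its strided 3x3 conv
    ("RB stage 2", 512,  3, 2, 1),
    ("RB stage 3", 1024, 3, 2, 1),
    ("RB stage 4", 2048, 3, 2, 1),
    ("AvgPool",    2048, 2, 2, 0),  # 2 x 2, stride 2
]


def trace_shapes(channels=3, size=448):
    """Return a list of (layer name, (C, H, W)) for a square input."""
    shapes = [("Input", (channels, size, size))]
    for name, out_ch, kernel, stride, padding in LAYERS:
        size = out_size(size, kernel, stride, padding)
        shapes.append((name, (out_ch, size, size)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(f"{name:<11} {shape}")
```

Under these assumptions, the trace reproduces the Shape row of Table 1: the stem convolution and max pooling together reduce 448 to 112, each strided stage halves the grid, and the final average pooling maps 14 × 14 to 7 × 7.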