Research Article
JGRCAN: A Visual Question Answering Co-Attention Network via Joint Grid-Region Features
Table 1
Convolution dimensional transformation process.
| | Input | Conv + MaxPool | RB1 × 32 | RB × 4 | RB × 3 | RB × 23 | AvgPool |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Layer structure | — | 3 × 3, Stride: 2 | | | | | 2 × 2, Stride: 2 |
| Shape | 3 × 448 × 448 | 64 × 112 × 112 | 256 × 112 × 112 | 512 × 56 × 56 | 1024 × 28 × 28 | 2048 × 14 × 14 | 2048 × 7 × 7 |
RB denotes a residual block; RB × 3 means that three residual blocks are used.
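The shape progression in Table 1 can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the authors' implementation: the `STAGES` list and the `trace_shapes` helper are hypothetical names, and the per-stage downsampling factors are inferred from consecutive shapes in the table (for example, 448 to 112 implies an overall factor of 4 for the Conv + MaxPool stage).

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)

# (stage name, output channels, spatial downsampling factor).
# Assumption: factors are inferred from the consecutive shapes listed in
# Table 1; they are not taken from the authors' code.
STAGES = [
    ("Conv + MaxPool", 64, 4),
    ("RB1 × 32", 256, 1),
    ("RB × 4", 512, 2),
    ("RB × 3", 1024, 2),
    ("RB × 23", 2048, 2),
    ("AvgPool", 2048, 2),
]


def trace_shapes(input_shape: Shape) -> List[Tuple[str, Shape]]:
    """Return the feature-map shape after each stage, starting from the input."""
    _, height, width = input_shape
    trace = [("Input", input_shape)]
    for name, out_channels, factor in STAGES:
        # Each stage sets the channel count and divides the spatial size.
        height //= factor
        width //= factor
        trace.append((name, (out_channels, height, width)))
    return trace


if __name__ == "__main__":
    for name, shape in trace_shapes((3, 448, 448)):
        print(f"{name:>14}: {' × '.join(map(str, shape))}")
```

Under these assumptions, a 3 × 448 × 448 image ends as a 2048 × 7 × 7 map, so flattening the spatial axes would give 49 positions, each with a 2048-dimensional feature vector.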