Research Article
Toward Backdoor Attacks for Image Captioning Model in Deep Neural Networks
Table 1
The architecture for CNN model. The max pooling is 3 × 3 and stride 2.
| Layer | Output shape | Input shape |
| Conv.1 | 112,112 | 7, 7, 64, stride 2 | Conv.2_x | 56, 56 | | Conv.3_x | 28, 28 | | Con.4_x | 14, 14 | | Conv.5_x | 7, 7 | | FC | 1, 1 | 1000 |
|
|