Research Article

Toward Backdoor Attacks for Image Captioning Model in Deep Neural Networks

Table 1

The architecture for CNN model. The max pooling is 3 × 3 and stride 2.

LayerOutput shapeInput shape

Conv.1112,1127, 7, 64, stride 2
Conv.2_x56, 56
Conv.3_x28, 28
Con.4_x14, 14
Conv.5_x7, 7
FC1, 11000