Security and Communication Networks

Research Article

Toward Backdoor Attacks for Image Captioning Model in Deep Neural Networks

Table 1

The architecture for CNN model. The max pooling is 3 × 3 and stride 2.


Layer	Output shape	Input shape

Conv.1	112,112	7, 7, 64, stride 2
Conv.2_x	56, 56
Conv.3_x	28, 28
Con.4_x	14, 14
Conv.5_x	7, 7
FC	1, 1	1000