Research Article

Semisupervised Deep Features of Time-Frequency Maps for Multimodal Emotion Recognition

Table 4

Structure of Inception-v3.

Type | Patch size/stride (or remarks) | Input size
---|---|---
Convolution | 3 × 3/2 | 299 × 299 × 3
Convolution | 3 × 3/1 | 149 × 149 × 32
Convolution (padded) | 3 × 3/1 | 147 × 147 × 32
Maximum pooling | 3 × 3/2 | 147 × 147 × 64
Convolution | 3 × 3/1 | 73 × 73 × 64
Convolution | 3 × 3/2 | 71 × 71 × 80
Convolution | 3 × 3/1 | 35 × 35 × 192
3 × Inception | As in Figure 3(a) | 35 × 35 × 288
5 × Inception | As in Figure 3(b) | 17 × 17 × 768
2 × Inception | As in Figure 3(c) | 8 × 8 × 1280
Maximum pooling | 8 × 8 | 8 × 8 × 2048
Linear | Logits (unnormalized log-probabilities) | 1 × 1 × 2048
Softmax | Classifier | 1 × 1 × …
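As a rough consistency check on the "Input size" column, the spatial sizes produced by the first six rows can be recomputed with the standard output-size formula ⌊(n + 2p − k)/s⌋ + 1, where n is the input width, k the kernel size, s the stride, and p the padding. The sketch below is illustrative only. It assumes that the row marked "padded" uses 1 pixel of padding, which preserves its size, and that every other row is unpadded. The names `conv_output_size`, `STEM`, and `trace_stem` are hypothetical helpers, not part of the authors' code.

```python
def conv_output_size(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a square convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1


# First six rows of Table 4 as (type, kernel, stride, padding).
# Assumption: only the "padded" row uses padding (1 pixel, size-preserving);
# all other rows are treated as unpadded ("valid").
STEM = [
    ("Convolution", 3, 2, 0),
    ("Convolution", 3, 1, 0),
    ("Convolution (padded)", 3, 1, 1),
    ("Maximum pooling", 3, 2, 0),
    ("Convolution", 3, 1, 0),
    ("Convolution", 3, 2, 0),
]


def trace_stem(size: int = 299) -> list:
    """Return the spatial size after each row in STEM, starting from size x size."""
    sizes = []
    for _name, kernel, stride, padding in STEM:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    for (name, k, s, _p), out in zip(STEM, trace_stem()):
        print(f"{name:<22} {k} x {k}/{s} -> {out} x {out}")
```

Under these assumptions the traced sizes are 149, 147, 147, 73, 71, and 35, which match the input sizes listed for rows 2 through 7. The later grid reductions inside the Inception stages (35 → 17 → 8) depend on the block designs in Figure 3, so the sketch deliberately does not trace them.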