Research Article

Spoken Language Identification Using Deep Learning

Table 2. Architecture of the 2D ConvNet model.

| Layers | Filters, kernels, and stride | Padding | Output | No. of parameters |
|---|---|---|---|---|
| First block | | | | |
| Conv2D | (32, 7, 7) | Valid | (None, 994, 34, 32) | 1600 |
| BatchNorm | | | (None, 994, 34, 32) | 128 |
| MaxPool2D | (3, 3), s = 2 | Same | (None, 497, 17, 32) | 0 |
| Second block | | | | |
| Conv2D | (64, 5, 5) | Same | (None, 497, 17, 64) | 51264 |
| BatchNorm | | | (None, 497, 17, 64) | 256 |
| MaxPool2D | (3, 3), s = 2 | Same | (None, 249, 9, 64) | 0 |
| Third block | | | | |
| Conv2D | (128, 3, 3) | Same | (None, 249, 9, 128) | 73856 |
| BatchNorm | | | (None, 249, 9, 128) | 512 |
| MaxPool2D | (3, 3), s = 2 | Same | (None, 125, 5, 128) | 0 |
| Fourth block | | | | |
| Conv2D | (256, 3, 3) | Same | (None, 125, 5, 256) | 295168 |
| BatchNorm | | | (None, 125, 5, 256) | 1024 |
| MaxPool2D | (3, 3), s = 2 | Same | (None, 63, 3, 256) | 0 |
| Fifth block | | | | |
| Conv2D | (512, 3, 3) | Same | (None, 63, 3, 512) | 1180160 |
| BatchNorm | | | (None, 63, 3, 512) | 2048 |
| MaxPool2D | (3, 3), s = 2 | Same | (None, 32, 2, 512) | 0 |
| Flatten layer | | | (None, 32768) | 0 |
| BatchNorm | | | (None, 32768) | 131072 |
| Dense layer | 256 | | (None, 256) | 8388864 |
| BatchNorm | | | (None, 256) | 1024 |
| Dropout | 0.5 | | (None, 256) | 0 |
| Dense layer | 3 | | (None, 3) | 771 |
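
The output shapes and parameter counts in Table 2 follow from standard convolution arithmetic, and the short Python sketch below recomputes them as a consistency check. It is not the authors' code. Several values are assumptions inferred from the table rather than stated in it: an input of shape (1000, 40, 1), which is what a 7 × 7 "valid" convolution with 1600 parameters and a (994, 34, 32) output implies; a convolution stride of 1; and four parameters per feature for each BatchNorm layer (scale, offset, moving mean, moving variance), which matches the counts listed. The names `conv_out`, `BLOCKS`, and `summarize` are hypothetical helpers introduced only for this illustration.

```python
import math

def conv_out(size, kernel, stride, padding):
    """Output length along one axis for a convolution or pooling window."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1  # "valid" padding

# (filters, square kernel size, convolution padding) for the five blocks in Table 2.
BLOCKS = [
    (32, 7, "valid"),
    (64, 5, "same"),
    (128, 3, "same"),
    (256, 3, "same"),
    (512, 3, "same"),
]

def summarize(height=1000, width=40, channels=1, dense_units=256, classes=3):
    """Return (layer, output shape, parameter count) rows.

    The default input shape is an assumption inferred from the table.
    """
    rows = []
    h, w, c = height, width, channels
    for filters, k, pad in BLOCKS:
        # Convolution with stride 1 (assumed): weights k*k*c_in*c_out plus one bias per filter.
        h, w = conv_out(h, k, 1, pad), conv_out(w, k, 1, pad)
        rows.append(("Conv2D", (h, w, filters), k * k * c * filters + filters))
        c = filters
        # Batch normalization: 4 parameters per channel.
        rows.append(("BatchNorm", (h, w, c), 4 * c))
        # 3 x 3 max pooling, stride 2, "same" padding: no parameters.
        h, w = conv_out(h, 3, 2, "same"), conv_out(w, 3, 2, "same")
        rows.append(("MaxPool2D", (h, w, c), 0))
    flat = h * w * c
    rows.append(("Flatten", (flat,), 0))
    rows.append(("BatchNorm", (flat,), 4 * flat))
    rows.append(("Dense", (dense_units,), flat * dense_units + dense_units))
    rows.append(("BatchNorm", (dense_units,), 4 * dense_units))
    rows.append(("Dropout", (dense_units,), 0))
    rows.append(("Dense", (classes,), dense_units * classes + classes))
    return rows

if __name__ == "__main__":
    for name, shape, params in summarize():
        print(f"{name:<10} {str((None, *shape)):<24} {params}")
    print("Total parameters:", sum(p for _, _, p in summarize()))
```

Under these assumptions, every row produced by `summarize()` agrees with the corresponding row of Table 2. Most of the parameters sit in the first dense layer, because the flattened feature map has 32 × 2 × 512 = 32768 entries.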