Research Article

Spoken Language Identification Using Deep Learning

Table 5: Dataset description.

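| Dataset | Spoken language identification [30] | Language identification dataset [31] | Common Voice Kaggle dataset [32] | Mozilla Common Voice dataset [33] |
| --- | --- | --- | --- | --- |
| Number of languages | 3 | 22 | 164 (combined with [33]) | (combined with [32]) |
| Total samples | Train = 73080 (420 min); Test = 540 (90 min) | 22000 | 35478523842 (combined with [33]) | (combined with [32]) |
| Type | Audio | Text | Audio | Audio and TSV |
| Length | 10 seconds | 7 to 10 sentences in each line | Less than 10 seconds | Less than 10 seconds |
| Extension | FLAC | CSV | MP3 | MP3 and TSV |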