Research Article

Spoken Language Identification Using Deep Learning

Table 1

Review of previous studies and their results.

| Year | Model basis | Features | Languages | Acc. | Remarks | Ref. |
|------|-------------|----------|-----------|------|---------|------|
| 2021 | PLDA, logistic regression | i-vector, x-vector | Javanese, Sundanese, Minang | 96% | PLDA and logistic classifiers are used with x-vector and i-vector feature extraction. | [26] |
| 2021 | CNN, ResNet50, RNN | MFCC | Iba, Kab, Sun, Ind, Eus, Jav, Tam, Tel, Kan, Hin, Tha, Rus, Cnh, Eng, Por, Mar | 53% | Three systems with different specifications, named Lipsia, Anlirika, and NTR, were submitted. | [27] |
| 2021 | Self-attentive pooling decoder | Not defined | En, Fr, Es, De, Ru, It | 92.50% | A self-attentive pooling layer is used for the language identification task. | [28] |
| 2021 | CNN-LSTM | MFCC | Iba, Kab, Sun, Ind, Eus, Jav, Tam, Tel, Kan, Hin, Tha, Cnh, Eng, Por, Mar, Rus | 74% | A CNN-LSTM combination is used to predict the language. | [29] |
| 2020 | TDNN, optimal transport | MFCC | Russian, Kazakh, Mandarin, Korean, Japanese, Cantonese, Vietnamese, Tibetan, Indonesian, Uyghur | Not defined | An unsupervised joint distribution adaptation neural network model is used for spoken language identification. | [17] |
| 2020 | CRNN, ResNet50, DenseNet121 | Log-Mel | Three different datasets with different languages | 89% | Different pretrained models are used with a triplet entropy loss to improve generalization. | [16] |
| 2020 | CNN | Log-Mel | Slovene, Russian, Slovak, Belarusian, Macedonian, Ukrainian, Croatian, Bulgarian, Czech, Serbian, Polish | 97.35% | Two CNN-based neural models, a baseline and a robust model, are built for spoken language identification. | [18] |
| 2020 | CapsNet | Log-Mel | Arabic, Bengali, Chinese Mandarin, English, Hindi, Turkish, Spanish, Japanese, Punjabi, Portuguese | 98.20% | A capsule network with an encoder and decoder works well for spoken language identification. | [20] |
| 2020 | CNN-LSTM | Log-Mel | Gujarati, Tamil, Telugu | 79.02% | A CNN-LSTM system with a CTC loss function at the output layer is used for spoken language identification. | [19] |
| 2020 | Context-aware model | Log-Mel | Prs, Amh, Fas, Hat, Hau, Eng, Cmn, Fra, Rus, Hin, Ukr, Spa, Pus, Urd, Yue, Bos, Vie, Hrv, Tur, Kat, Por, Kor | 97% | The context-aware model works well on language pairs and gives better accuracy. | [25] |
| 2019 | ConvNets | MFCC | Fr, It, En, Ru, Es, De | 95.40% | 2D ConvNets with attention and a GRU give good results and better accuracy. | [14] |
| 2019 | ResNet50 | MFCC | Fr, It, En, Ru, Es, De | 89% | A pretrained ResNet50 model with a cyclic learning rate is used for language identification. | [8] |
| 2018 | SVM-HMM model | Not defined | Es, Fr, En, De | 70% | HMMs are used to translate speech into vector sequences, followed by a deep neural network. | [30] |
| 2017 | Inception-v3, CRNN | MFCC | Es, De, En, Fr | 96% | A pipeline of Inception-v3-based transfer learning and a Bi-LSTM extracts convolutional and temporal attributes. | [24] |
| 2010 | Gaussian mixture model | Perceptual linear prediction | Tel, Dut, Hi, En, Ben, Fr, Es, De, Ru, It | 88.80% | Gaussian mixture models are used with the RPLP approach, which is processed using PLP and MFCC features. | [3] |
| 2009 | CNN-TDNN | MFCC | Fr, De, En | 91.20% | Log-Mel images are used as features for language identification, coupled with an SGD-trained neural network. | [2] |
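Most systems in Table 1 take MFCC or log-Mel features as input. The following is a minimal NumPy-only sketch of how both feature types are derived from a raw waveform; it is not the pipeline of any cited paper, and the parameter values (16 kHz sample rate, 512-point FFT, 10 ms hop, 40 mel bands, 13 coefficients) are illustrative defaults, not values taken from the studies above.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_and_mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    # 1. Frame the signal and apply a Hann window.
    frames = np.array([signal[s:s + n_fft] * np.hanning(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Mel filterbank + log  ->  log-Mel spectrogram (frames x n_mels).
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # 4. DCT-II of the log-Mel energies, keeping the first n_mfcc
    #    coefficients  ->  MFCCs (frames x n_mfcc).
    dct_basis = np.cos(np.pi / n_mels
                       * (np.arange(n_mels) + 0.5)[:, None]
                       * np.arange(n_mfcc)[None, :])
    mfcc = log_mel @ dct_basis
    return log_mel, mfcc

# One second of a 440 Hz tone as stand-in audio.
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440.0 * t)
log_mel, mfcc = log_mel_and_mfcc(audio, sr=sr)
print(log_mel.shape, mfcc.shape)  # (frames, 40) and (frames, 13)
```

The log-Mel spectrogram keeps a two-dimensional time-frequency "image", which is why the CNN-based systems in the table feed it directly to convolutional layers, while the final DCT step decorrelates the mel bands into the compact per-frame MFCC vectors favored by the sequence models.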