| Year | Model basis | Features | Languages | Acc. | Remarks | Ref. |
|------|-------------|----------|-----------|------|---------|------|
| 2021 | PLDA logistic regression | i-vector, x-vector | Javanese, Sundanese, Minang | 96% | PLDA and logistic-regression classifiers are used with x-vector and i-vector features. | [26] |
| 2021 | CNN ResNet50 RNN | MFCC | Iba, Kab, Sun, Ind, Eus, Jav, Tam, Tel, Kan, Hin, Tha, Rus, Cnh, Eng, Por, Mar | 53% | Three systems (Lipsia, Anlirika, and NTR) with different specifications were submitted. | [27] |
| 2021 | Self-attentive pooling decoder | Not defined | En, Fr, Es, De, Ru, It | 92.50% | A self-attentive pooling layer is used for the language identification task. | [28] |
| 2021 | CNN LSTM | MFCC | Iba, Kab, Sun, Ind, Eus, Jav, Tam, Tel, Kan, Hin, Tha, Cnh, Eng, Por, Mar, Rus | 74% | A CNN-LSTM combination is used to predict the language. | [29] |
| 2020 | TDNN optimal transport | MFCC | Russian, Kazakh, Mandarin, Korean, Japanese, Cantonese, Vietnamese, Tibetan, Indonesian, Uyghur | Not defined | An unsupervised joint-distribution-adaptation neural network is used for spoken language identification. | [17] |
| 2020 | CRNN ResNet50 DenseNet121 | Log-Mel | Three different datasets with different languages | 89% | Several pretrained models with a triplet entropy loss are used to improve generalization. | [16] |
| 2020 | CNN | Log-Mel | Slovene, Russian, Slovak, Belarusian, Macedonian, Ukrainian, Croatian, Bulgarian, Czech, Serbian, Polish | 97.35% | Two CNN models, a baseline and a robust variant, are built for spoken language identification. | [18] |
| 2020 | CapsNet | Log-Mel | Arabic, Bengali, Chinese Mandarin, English, Hindi, Turkish, Spanish, Japanese, Punjabi, Portuguese | 98.20% | A capsule network with encoder and decoder performs well on spoken language identification. | [20] |
| 2020 | CNN-LSTM | Log-Mel | Gujarati, Tamil, Telugu | 79.02% | A CNN-LSTM system with a CTC loss function at the output layer is used for spoken language identification. | [19] |
| 2020 | Context-aware model | Log-Mel | Prs, Amh, Fas, Hat, Hau, Eng, Cmn, Fra, Rus, Hin, Ukr, Spa, Pus, Urd, Yue, Bos, Vie, Hrv, Tur, Kat, Por, Kor | 97% | The context-aware model works well on language pairs and yields better accuracy. | [25] |
| 2019 | ConvNets | MFCC | Fr, It, En, Ru, Es, De | 95.40% | 2D ConvNets with attention and a GRU give good results and better accuracy. | [14] |
| 2019 | ResNet50 | MFCC | Fr, It, En, Ru, Es, De | 89% | A pretrained ResNet50 model with a cyclic learning rate is used for language identification. | [8] |
| 2018 | SVM-HMM model | Not defined | Es, Fr, En, De | 70% | HMMs translate speech into vector sequences, followed by a deep neural network. | [30] |
| 2017 | Inceptionv3 CRNN | MFCC | Es, De, En, Fr | 96% | A pipeline of Inception-v3-based transfer learning and a bi-LSTM extracts convolutional and temporal attributes. | [24] |
| 2010 | Gaussian mixture model | Perceptual linear prediction | Tel, Dut, Hi, En, Ben, Fr, Es, De, Ru, It | 88.80% | Gaussian mixture models are used with the RPLP approach, processing PLP and MFCC features. | [3] |
| 2009 | CNN-TDNN | MFCC | Fr, De, En | 91.20% | Log-Mel images are used as features for language identification, coupled with an SGD-trained neural network. | [2] |
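Most systems in the table consume MFCC or log-Mel features rather than raw audio. The sketch below illustrates how the two feature types relate: a log-Mel spectrogram is the log-compressed output of a mel filterbank on the power spectrum, and MFCCs are the low-order DCT coefficients of those log-Mel energies. This is a minimal NumPy illustration only; the cited works likely use established toolkits, and all parameter choices here (16 kHz sampling, 40 mel bands, 13 coefficients) are assumptions, not values taken from the table.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_and_mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    # Frame the signal, apply a Hann window, take the power spectrum.
    frames = [signal[s:s + n_fft] * np.hanning(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(np.array(frames), n_fft)) ** 2
    # Log-Mel spectrogram: mel filterbank followed by log compression.
    logmel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # MFCC: DCT-II of the log-Mel energies, keeping the lowest coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return logmel, logmel @ dct.T

# Example: one second of a 440 Hz tone at 16 kHz (illustrative input).
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
logmel, mfcc = log_mel_and_mfcc(sig)
print(logmel.shape, mfcc.shape)  # (frames, 40) and (frames, 13)
```

Either representation can then feed the CNN, LSTM, or TDNN front ends listed above; the log-Mel form preserves a 2D time-frequency layout suited to convolutional models, while MFCCs give a compact, decorrelated vector per frame.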