Research Article
Spoken Language Identification Using Deep Learning
Table 8
Performance measures (all values in %) obtained from various language identification techniques.
| Models | Dataset used | Train acc. | Test acc. | Validation acc. | Sensitivity | Specificity | F1 score | Precision | Recall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CNN | Spoken language identification | 100 | 98.63 | 98.90 | 98.90 | 99 | 99.90 | 99.90 | 99.90 |
| Naïve Bayes | Language identification dataset | 94 | 94 | 93 | 100 | 99 | 93.60 | 94.90 | 93.60 |
| Word embedding | Language identification dataset | 95.50 | 92.20 | 93.20 | 98 | 99 | 92.50 | 93.50 | 92.50 |
| Logistic regression | Spoken language identification | 64.32 | 61.95 | 63.55 | 64.23 | 63.55 | 64.23 | 64.23 | 64.23 |
| Naïve Bayes | Spoken language identification | 50.25 | 49.75 | 51.20 | 52.32 | 51.78 | 51.20 | 51.20 | 51.20 |
| SVM | Common Voice Kaggle | 82.88 | 83.32 | 82.65 | 75.95 | 75.55 | 76.95 | 76.95 | 76.95 |
| Random forest classifier | Common Voice Kaggle | 72.42 | 71.90 | 71.50 | 66.32 | 67.24 | 67.23 | 67.23 | 67.23 |
| VGG16 | Mozilla Common Voice | 81.30 | 80.21 | 81.05 | 81.25 | 80.52 | 79.83 | 80.02 | 80.02 |
| ResNet50 | Spoken language identification | 86.30 | 80.20 | 84.32 | 85.36 | 84.23 | 84.65 | 82.75 | 82.75 |
| CapsNet [20] | Spoken language identification | 91.80 | 88.76 | 90.72 | 88 | 89 | 89 | 89 | 89 |
| 2D ConvNet bidirectional GRU [14] | Spoken language identification | 68.85 | 65.23 | 67.82 | 68 | 66 | 66 | 66 | 66 |
| Acoustic model [25] | Spoken language identification | 75.69 | 73.23 | 74.23 | 75 | 75 | 75 | 75 | 75 |
| CNN LSTM [19] | Spoken language identification | 83.25 | 80.52 | 82.45 | 83 | 82 | 82 | 82 | 82 |
| Logistic regression I-vector [26] | Mozilla Common Voice | 84.30 | 80.23 | 82.52 | 78.95 | 80.05 | 82.36 | 82.36 | 82.36 |
| LSTM-CNN [29] | Common Voice Kaggle | 70.21 | 68.33 | 69.96 | 67.23 | 68.33 | 69.54 | 69.54 | 69.54 |
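For readers who want to see how measures of the kind reported in Table 8 are typically derived, the minimal sketch below computes macro-averaged sensitivity, specificity, precision, recall, and F1 score from a multiclass confusion matrix. This is an illustrative implementation under standard metric definitions, not the evaluation code used in the paper; the function name `classification_metrics` and the dummy labels are hypothetical.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Macro-averaged classification metrics (hypothetical helper)."""
    # Build the confusion matrix: rows = true class, cols = predicted class.
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1

    tp = np.diag(cm).astype(float)     # correct predictions per class
    fp = cm.sum(axis=0) - tp           # predicted as class c but wrong
    fn = cm.sum(axis=1) - tp           # samples of class c that were missed
    tn = cm.sum() - (tp + fp + fn)     # everything else

    recall = tp / (tp + fn)            # identical to sensitivity per class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)

    # Macro average over classes, reported as percentages as in Table 8.
    per_class = {"sensitivity": recall, "specificity": specificity,
                 "precision": precision, "recall": recall, "f1": f1}
    return {name: 100 * values.mean() for name, values in per_class.items()}

# Usage with dummy labels for a 3-language problem:
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 0, 2, 1]
print(classification_metrics(y_true, y_pred, n_classes=3))
```

Note that under this per-class, macro-averaged convention, recall and sensitivity are the same quantity; papers that report them separately sometimes compute one of the two with a different averaging scheme.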