Computational Intelligence and Neuroscience

Research Article

Spoken Language Identification Using Deep Learning

Performance measures (all values in %) obtained from various language identification techniques.


Models	Dataset used	Train acc.	Test acc.	Validation acc.	Sensitivity	Specificity	F1 score	Precision	Recall

CNN	Spoken language identification	100	98.63	98.90	98.90	99	99.90	99.90	99.90
Naïve Bayes	Language identification dataset	94	94	93	100	99	93.60	94.90	93.60
Word embedding	Language identification dataset	95.50	92.20	93.20	98	99	92.50	93.50	92.50
Logistic regression	Spoken language identification	64.32	61.95	63.55	64.23	63.55	64.23	64.23	64.23
Naïve Bayes	Spoken language identification	50.25	49.75	51.20	52.32	51.78	51.20	51.20	51.20
SVM	Common Voice Kaggle	82.88	83.32	82.65	75.95	75.55	76.95	76.95	76.95
Random forest classifier	Common Voice Kaggle	72.42	71.90	71.50	66.32	67.24	67.23	67.23	67.23
VGG16	Mozilla Common Voice	81.30	80.21	81.05	81.25	80.52	79.83	80.02	80.02
ResNet50	Spoken language identification	86.30	80.20	84.32	85.36	84.23	84.65	82.75	82.75
CapsNet [20]	Spoken language identification	91.80	88.76	90.72	88	89	89	89	89
2D ConvNet bidirectional GRU [14]	Spoken language identification	68.85	65.23	67.82	68	66	66	66	66
Acoustic model [25]	Spoken language identification	75.69	73.23	74.23	75	75	75	75	75
CNN LSTM [19]	Spoken language identification	83.25	80.52	82.45	83	82	82	82	82
Logistic regression I-vector [26]	Mozilla Common Voice	84.30	80.23	82.52	78.95	80.05	82.36	82.36	82.36
LSTM-CNN [29]	Common Voice Kaggle	70.21	68.33	69.96	67.23	68.33	69.54	69.54	69.54
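
For readers who want to relate the columns above to one another, the following is a minimal sketch of how such measures are commonly derived from a multiclass confusion matrix using one-vs-rest counts and macro averaging. The averaging scheme, the function names (`per_class_counts`, `macro_metrics`), and the three-language confusion matrix are assumptions made for illustration; the source does not state how its values were computed.

```python
def per_class_counts(cm):
    """Return (tp, fp, fn, tn) for each class, treated one-vs-rest.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    counts = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true k, predicted other
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true other
        tn = total - tp - fn - fp
        counts.append((tp, fp, fn, tn))
    return counts


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def macro_metrics(cm):
    """Macro-averaged measures, in percent (assumed averaging scheme)."""
    counts = per_class_counts(cm)
    n = len(counts)
    total = sum(sum(row) for row in cm)

    recall = [safe_div(tp, tp + fn) for tp, fp, fn, tn in counts]
    specificity = [safe_div(tn, tn + fp) for tp, fp, fn, tn in counts]
    precision = [safe_div(tp, tp + fp) for tp, fp, fn, tn in counts]
    f1 = [safe_div(2 * p * r, p + r) for p, r in zip(precision, recall)]

    def mean(xs):
        return 100.0 * sum(xs) / n

    return {
        "accuracy": 100.0 * safe_div(sum(cm[k][k] for k in range(n)), total),
        "sensitivity": mean(recall),   # sensitivity is another name for recall
        "specificity": mean(specificity),
        "precision": mean(precision),
        "recall": mean(recall),
        "f1": mean(f1),
    }


# Hypothetical confusion matrix for three languages (illustrative data only).
cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]

for name, value in macro_metrics(cm).items():
    print(f"{name}: {value:.2f}")
```

Under these definitions sensitivity and recall are the same quantity, so they coincide in the sketch. Rows of the table where the two columns differ were presumably produced with a different convention, which the source does not specify.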