|
S/N | Reference | Database name | Language | Approach used | Recognized emotions | Results |
|
1 | Koolagudi et al. [79] | IITKGP-SESC | Telugu | SVM and GMM with energy and pitch parameters | Happy, anger, fear, disgust, sarcastic, sad, neutral, surprise | 63.75% average accuracy obtained |
|
2 | Sultana et al. [3] | SUBESCO and RAVDESS | Bangla and English | DCNN and BLSTM network integrated with a TDF layer | Happy, calm, sad, surprise, fearful, disgust, angry, neutral | Weighted accuracies of 86.9% on SUBESCO and 82.7% on RAVDESS |
|
3 | Kumar and Yadav [80] | IITKGP-SEHSC | Hindi | Deep LSTM with GMFCC and DMFCC features | Happy, fear, angry, sad, neutral | Average accuracy of 91.2% for male speech and 87.6% for female speech |
|
4 | Mohanty and Swain [15] | Oriya emotional speech database | Oriya | Fuzzy K-means | Anger, sadness, astonishment, fear, happiness, neutral | 65.16% recognition rate using mean pitch, first two formants, jitter, shimmer, and energy as feature vectors |
|
5 | Samantaray et al. [48] | MESDNEI | Assamese | SVM with dynamic, quality, derived, and prosodic features | Happy, anger, fear, disgust, surprise, sad, neutral | 82.26% average accuracy rate for speaker-independent case |
|
6 | Bhavan et al. [81] | EmoDB, RAVDESS and IITKGP-SEHSC | German, English and Hindi | Bagged ensemble of SVMs using MFCCs and spectral centroids | Happy, sad, calm, angry, surprise, fear, disgust, neutral | Accuracy of 92.45% on EmoDB, 75.69% on RAVDESS, and 84.11% on IITKGP-SEHSC |
|
7 | Swain et al. [82] | Self-created database using utterances from two native languages of Odisha: Cuttacki and Sambalpuri | Oriya | SVM using MFCC as feature vector | Happiness, fear, anger, disgust, sadness, surprise, neutral | 82.14% recognition accuracy for SVM classifier |
|
8 | Zaheer et al. [30] | SEMOUR+ | Urdu | Ensemble classifier, CNN combined with VGG-19 model | Anger, disgust, happiness, surprise, boredom, sadness, fearful, neutral | The proposed model achieved 56% speaker-independent recognition rate |
|
9 | Wankhade et al. [47] | Speech emotional database containing dialogues from different Bollywood movies | Hindi | SVM classifier with MFCC and MEDC feature set | Angry, happy, sad, neutral | 71.66% recognition rate using SVM classifier |
|
10 | Ali et al. [83] | Self-created speech emotional corpus recorded in 5 regional languages of Pakistan | Urdu, Sindhi, Pashto, Punjabi, and Balochi | Learning classifiers (AdaBoostM1, J48, classification via regression, decision stump) with prosodic features | Happiness, sad, anger, neutral | 40% classification accuracy with pitch feature |
|
11 | Ancilin and Milton [84] | Urdu | Urdu | SVM classifier with mel frequency magnitude coefficient (MFMC) | Happy, sad, anger, neutral | 95.25% emotion recognition rate using MFMC |
|
12 | Farhad et al. [85] | Urdu | Urdu | Neural network, random forest and meta iterative classifiers with pitch and MFCC features | Happy, sad, angry | With an accuracy of 78.75%, random forest outperforms other classifiers |
|
13 | Darekar and Dhande [86] | Marathi database | Marathi | Adaptive ANN combining cepstral, non-negative matrix factorization (NMF) and pitch features | Happy, sad, angry, fear, neutral, surprised | Proposed model obtains 80% accuracy combining the 3 features |
|
14 | Koolagudi et al. [87] | IITKGP-SESC | Telugu | SVM and GMM models with epoch parameters | Happy, anger, fear, sadness, disgust, neutral | Average recognition rates are 58% and 61% for SVM and GMM, respectively |
|
15 | Kandali et al. [49] | Self-created acted emotional speech database by 27 speakers | Assamese | GMM classifier with MFCC features | Happy, sad, disgust, fear, angry, surprise, neutral | Highest mean classification score is 76.5% |
|
16 | Dhar and Guha [88] | Abeg: self-collected Bangla emotional speech dataset | Bangla | Logistic regression model with MFCC and LPC features | Happy, angry, neutral | Proposed model achieved 92% accuracy combining MFCC and LPC features |
|
17 | Jacob [89] | Hindi emotional speech database containing 2240 wav files collected from 10 speakers | Hindi | ANN model with jitter and shimmer features | Happy, sad, anger, fear, surprise, disgust, neutral | 83.3% overall accuracy obtained combining jitter and shimmer features |
|
18 | Fernandes and Mannepalli [90] | Acted emotional speech database containing 1400 utterances by 10 actors | Tamil | LSTM and BiLSTM with MFCC, MFCC delta, spectral kurtosis, bark spectrum, and spectral skewness features | Happy, anger, sad, fear, boredom, disgust, neutral | 84% accuracy rate obtained using LSTM and BiLSTM with dropout layers |
|
19 | Rajisha et al. [91] | Acted emotional dataset created by the authors | Malayalam | ANN and SVM classifier with MFCC, short-time energy, and pitch features | Happy, anger, sad, neutral | 88.4% recognition rate obtained using ANN and 78.2% with SVM |
|
20 | Kannadaguli and Bhat [92] | Self-created database containing 2800 emotional recordings | Kannada | Bayesian and HMM models with MFCC features | Happy, excited, angry, sad | Average emotion error rate of 25.5% for the Bayesian approach and 0.2% for the HMM approach |
|