Research Article
Acoustic Model with Multiple Lexicon Types for Indonesian Speech Recognition
Table 1
The methods in the literature, dataset used, and their performance [26–30].
| Related work | Method | Dataset | Performance (%WER) |
| --- | --- | --- | --- |
| Enhancement of automatic speech recognition by deep neural networks [26] | DNN-HMM, data augmentation | The 34 hours speech of English diverse dataset | 16.85% |
| Self-supervised speech enhancement for Arabic speech recognition in real-world environment [27] | Denoising auto encoder, HMM | The Arabic mobile parallel speech multi-dialect speech corpus | 30.17% |
| Effect of pitch enhancement in Punjabi children’s speech recognition system under disparate acoustic conditions [28] | Pitch enhancement, DNN-HMM | The Punjabi adult/child speech dataset | 10.98%∼12.24% |
| A hybrid speech enhancement algorithm for voice assistance application [29] | Noise suppression, HMM | The 8.5 hours English medical speech dataset (RAVDESS) | 17.5%∼22.9% |
| Dual application of speech enhancement for automatic speech recognition [30] | RNN transducer, data augmentation | The social media English video dataset | 8.3%∼13.4% |