Applied Computational Intelligence and Soft Computing

Research Article

Acoustic Model with Multiple Lexicon Types for Indonesian Speech Recognition

Methods, datasets, and reported performance of related work in the literature [26–30].


Related work	Method	Dataset	Performance (WER, %)

Enhancement of automatic speech recognition by deep neural networks [26]	DNN-HMM, data augmentation	A 34-hour diverse English speech dataset	16.85%
Self-supervised speech enhancement for Arabic speech recognition in real-world environment [27]	Denoising autoencoder, HMM	The Arabic mobile parallel speech multi-dialect speech corpus	30.17%
Effect of pitch enhancement in Punjabi children’s speech recognition system under disparate acoustic conditions [28]	Pitch enhancement, DNN-HMM	The Punjabi adult/child speech dataset	10.98%–12.24%
A hybrid speech enhancement algorithm for voice assistance application [29]	Noise suppression, HMM	An 8.5-hour English medical speech dataset (RAVDESS)	17.5%–22.9%
Dual application of speech enhancement for automatic speech recognition [30]	RNN transducer, data augmentation	The social media English video dataset	8.3%–13.4%
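
The performance column above is reported as word error rate (WER), conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, assuming whitespace tokenization and no text normalization; the function name `word_error_rate` is hypothetical, and the works cited in the table may have used different scoring tools or normalization rules.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent using word-level Levenshtein distance.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

For instance, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words. Because insertions are counted against the reference length, WER can exceed 100%.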