Research Article
Acoustic Model with Multiple Lexicon Types for Indonesian Speech Recognition
Table 4
The Indonesian audio dataset after the first validation (approximately 10,000 utterances).
| Category | Number of utterances (before validation) | Total duration in hours (before validation) | Number of utterances (after validation) | Total duration in hours (after validation) |
| --- | --- | --- | --- | --- |
| Autos and vehicles | 2,234 | 1.3289 | 29 | 0.0214 |
| Comedy | 6,017 | 4.0768 | 71 | 0.0498 |
| Education | 47,477 | 38.4886 | 5,973 | 4.5254 |
| Entertainment | 39,155 | 25.2972 | 470 | 0.2994 |
| Film and animation | 6,475 | 4.5591 | 75 | 0.0567 |
| Gaming | 124 | 0.0987 | 1 | 0.0000 |
| Howto and style | 28,726 | 41.8189 | 450 | 0.5933 |
| Music | 1,409 | 2.5783 | 2 | 0.0011 |
| News and politics | 19,295 | 16.8386 | 303 | 0.2502 |
| People and blogs | 32,205 | 22.7193 | 2,432 | 1.7630 |
| Pets and animals | 57 | 0.0433 | 1 | 0.0000 |
| Science and technology | 31,492 | 22.9787 | 515 | 0.3849 |
| Sports | 137 | 0.1131 | 1 | 0.0000 |
| Travel and events | 78 | 0.0633 | 1 | 0.0008 |
| Uncategorized | 410 | 0.2515 | 9 | 0.0069 |
| Total | 215,291 | 181.2543 | 10,333 | 7.9529 |