Research Article

Acoustic Model with Multiple Lexicon Types for Indonesian Speech Recognition

Table 4

The dataset of Indonesian audio after first validation (approx. size 10,000 utterances).

| Category | Utterances before validation | Duration before validation (hours) | Utterances after validation | Duration after validation (hours) |
|---|---|---|---|---|
| Autos and vehicles | 2,234 | 1.3289 | 29 | 0.0214 |
| Comedy | 6,017 | 4.0768 | 71 | 0.0498 |
| Education | 47,477 | 38.4886 | 5,973 | 4.5254 |
| Entertainment | 39,155 | 25.2972 | 470 | 0.2994 |
| Film and animation | 6,475 | 4.5591 | 75 | 0.0567 |
| Gaming | 124 | 0.0987 | 1 | 0.0000 |
| Howto and style | 28,726 | 41.8189 | 450 | 0.5933 |
| Music | 1,409 | 2.5783 | 2 | 0.0011 |
| News and politics | 19,295 | 16.8386 | 303 | 0.2502 |
| People and blogs | 32,205 | 22.7193 | 2,432 | 1.7630 |
| Pets and animals | 57 | 0.0433 | 1 | 0.0000 |
| Science and technology | 31,492 | 22.9787 | 515 | 0.3849 |
| Sports | 137 | 0.1131 | 1 | 0.0000 |
| Travel and events | 78 | 0.0633 | 1 | 0.0008 |
| Uncategorized | 410 | 0.2515 | 9 | 0.0069 |
| Total | 215,291 | 181.2543 | 10,333 | 7.9529 |
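
As a reading aid, the arithmetic behind the Total row of Table 4 can be checked with a short script. The following is a minimal sketch, not part of the original study: the per-category values are transcribed from the table, while the names `ROWS`, `column_sums`, and `retention` are hypothetical helpers introduced here for illustration. The "retention" figure (utterances after validation divided by utterances before) is a derived quantity that the table itself does not report.

```python
# Rows transcribed from Table 4:
# (category, utterances_before, hours_before, utterances_after, hours_after)
ROWS = [
    ("Autos and vehicles", 2234, 1.3289, 29, 0.0214),
    ("Comedy", 6017, 4.0768, 71, 0.0498),
    ("Education", 47477, 38.4886, 5973, 4.5254),
    ("Entertainment", 39155, 25.2972, 470, 0.2994),
    ("Film and animation", 6475, 4.5591, 75, 0.0567),
    ("Gaming", 124, 0.0987, 1, 0.0000),
    ("Howto and style", 28726, 41.8189, 450, 0.5933),
    ("Music", 1409, 2.5783, 2, 0.0011),
    ("News and politics", 19295, 16.8386, 303, 0.2502),
    ("People and blogs", 32205, 22.7193, 2432, 1.7630),
    ("Pets and animals", 57, 0.0433, 1, 0.0000),
    ("Science and technology", 31492, 22.9787, 515, 0.3849),
    ("Sports", 137, 0.1131, 1, 0.0000),
    ("Travel and events", 78, 0.0633, 1, 0.0008),
    ("Uncategorized", 410, 0.2515, 9, 0.0069),
]

# Total row as printed in Table 4.
TOTAL = (215291, 181.2543, 10333, 7.9529)


def column_sums(rows):
    """Sum the four numeric columns over all category rows."""
    return tuple(sum(row[i] for row in rows) for i in range(1, 5))


def retention(rows):
    """Fraction of utterances kept by validation, per category (derived, not in the table)."""
    return {name: after / before for name, before, _, after, _ in rows}


if __name__ == "__main__":
    sums = column_sums(ROWS)
    # Compare each computed sum with the printed Total row (small float tolerance).
    for computed, printed in zip(sums, TOTAL):
        status = "ok" if abs(computed - printed) < 1e-6 else "MISMATCH"
        print(f"{computed:>12.4f}  vs  {printed:>12.4f}  {status}")
    # List categories from highest to lowest retention.
    for name, share in sorted(retention(ROWS).items(), key=lambda kv: -kv[1]):
        print(f"{name:<24s} {share:6.2%}")
```

Running the script compares each column sum against the printed totals and lists the categories by the share of utterances that survived the first validation.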