Research Article
Multitask Learning with Local Attention for Tibetan Speech Recognition
Table 3
Dialect ID recognition accuracy (%) of two-task models.
| Architecture | Model | Lhasa-Ü-Tsang | Changdu-Kham | Amdo Pastoral |
| --- | --- | --- | --- | --- |
| DialectID model | | 97.88 | 92.24 | 97.9 |
| WaveNet-CTC with dialect ID | D-S | 98.57 | 95.23 | 99.6 |
| WaveNet-CTC with dialect ID | S-D | 99.01 | 97.61 | 99.41 |
| Attention (5)-WaveNet-CTC | D-S | 100 | 89.28 | 94.52 |
| Attention (5)-WaveNet-CTC | S-D | 0 | 0 | 0 |
| WaveNet-Attention (5)-CTC | D-S | 100 | 98.8 | 99.41 |
| WaveNet-Attention (5)-CTC | S-D | 100 | 94.04 | 98.06 |