Research Article
Multitask Learning with Local Attention for Tibetan Speech Recognition
Table 4
Speaker ID recognition accuracy (%) of two-task models.
| Architecture | Model | Lhasa-Ü-Tsang | Changdu-Kham | Amdo Pastoral |
| SpeakerID model | | 67.75 | 93.13 | 95.31 |
| WaveNet-CTC with speaker ID | S-S1 | 68.32 | 92.85 | 97.48 |
| WaveNet-CTC with speaker ID | S-S2 | 71.15 | 95.23 | 96.12 |
| Attention (5)-WaveNet-CTC | S-S1 | 0 | 0 | 0 |
| Attention (5)-WaveNet-CTC | S-S2 | 60.64 | 77.38 | 85.85 |
| WaveNet-Attention (5)-CTC | S-S1 | 70.35 | 92.85 | 97.48 |
| WaveNet-Attention (5)-CTC | S-S2 | 69.40 | 100 | 96.70 |