Research Article

Multitask Learning with Local Attention for Tibetan Speech Recognition

Table 6

Dialect ID recognition accuracy (%) of three-task models.

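| Architecture | Model | Lhasa-Ü-Tsang | Changdu-Kham | Amdo Pastoral |
|---|---|---|---|---|
| DialectID model | | 97.88 | 92.24 | 97.9 |
| WaveNet-CTC with dialect ID and speaker ID | D-S-S1 | 98.01 | 98.8 | 99.41 |
| | D-S-S2 | 99.73 | 96.42 | 99.61 |
| | S-D-S | 99.25 | 95.23 | 99.03 |
| Attention (5)-WaveNet-CTC | S-D-S | 100 | 76.19 | 91.27 |
| | D-S-S1 | 100 | 90.47 | 94.18 |
| | D-S-S2 | 100 | 82.14 | 93.02 |
| WaveNet-Attention (5)-CTC | S-D-S | 100 | 89.28 | 93.79 |
| | D-S-S1 | 100 | 85.71 | 93.79 |
| | D-S-S2 | 100 | 95.23 | 94.18 |
| WaveNet-Attention (7)-CTC | S-D-S | 0 | 85.71 | 91.66 |
| | D-S-S1 | 0 | 89.98 | 93.88 |
| | D-S-S2 | 0 | 89.28 | 95.34 |
| WaveNet-Attention (10)-CTC | S-D-S | 0 | 85.71 | 95.54 |
| | D-S-S1 | 0 | 94.04 | 93.99 |
| | D-S-S2 | 0 | 0 | 0 |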