Research Article

Multitask Learning with Local Attention for Tibetan Speech Recognition

Table 7

Speaker ID recognition accuracy (%) of three-task models.

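| Architecture | Model | Lhasa-Ü-Tsang | Changdu-Kham | Amdo pastoral |
|---|---|---|---|---|
| Speaker ID model | | 67.75 | 93.13 | 95.31 |
| WaveNet-CTC with dialect ID and speaker ID | S-D-S | 72.91 | 98.8 | 96.12 |
| | D-S-S1 | 70.21 | 95.23 | 93.6 |
| | D-S-S2 | 70.35 | 96.42 | 96.89 |
| Attention (5)-WaveNet-CTC | S-D-S | 61.08 | 83.33 | 89.53 |
| | D-S-S1 | 62.12 | 83.33 | 87.01 |
| | D-S-S2 | 61.99 | 84.52 | 90.11 |
| WaveNet-Attention (5)-CTC | S-D-S | 61.99 | 85.71 | 92.05 |
| | D-S-S1 | 62.53 | 82.14 | 91.08 |
| | D-S-S2 | 61.18 | 89.28 | 92.44 |
| WaveNet-Attention (7)-CTC | S-D-S | 60.91 | 85.71 | 91.66 |
| | D-S-S1 | 62.04 | 84.31 | 92.01 |
| | D-S-S2 | 58.49 | 86.90 | 90.69 |
| WaveNet-Attention (10)-CTC | S-D-S | 58.49 | 84.52 | 92.05 |
| | D-S-S1 | 59.43 | 83.33 | 91.27 |
| | D-S-S2 | 63.47 | 92.85 | 97.86 |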