Research Article
A Voice Cloning Method Based on the Improved HiFi-GAN Model
Table 6
MOS of cloned speech naturalness for different models.
| Metric | Settings | LibriSpeech | VCTK | THchs-30 |
| --- | --- | --- | --- | --- |
| MOS (CI) | Multispeaker TTS | 3.93 ± 0.06 | 3.57 ± 0.07 | 3.64 ± 0.05 |
| | Multispeaker TTS + x-vector | 4.02 ± 0.08 | 3.72 ± 0.09 | 3.78 ± 0.07 |
| | WaveGlow + d-vector | 3.85 ± 0.06 | 3.49 ± 0.08 | 3.47 ± 0.06 |
| | WaveGlow + x-vector | 3.93 ± 0.07 | 3.74 ± 0.08 | 3.69 ± 0.08 |
| | HiFi-GAN + d-vector | 4.21 ± 0.10 | 3.86 ± 0.06 | 3.92 ± 0.07 |
| | HiFi-GAN + x-vector | 4.30 ± 0.07 | 4.15 ± 0.07 | 4.13 ± 0.09 |
| | Improved HiFi-GAN + d-vector | 4.28 ± 0.09 | 4.06 ± 0.05 | 4.11 ± 0.04 |
| | Improved HiFi-GAN + x-vector | 4.36 ± 0.06 | 4.28 ± 0.08 | 4.28 ± 0.06 |