Research Article

A Voice Cloning Method Based on the Improved HiFi-GAN Model

Table 6

MOS of cloned-speech naturalness for different models.

Metric    Settings                        LibriSpeech    VCTK           THchs-30

MOS (CI)  Multispeaker TTS                3.93 ± 0.06    3.57 ± 0.07    3.64 ± 0.05
          Multispeaker TTS + x-vector     4.02 ± 0.08    3.72 ± 0.09    3.78 ± 0.07
          WaveGlow + d-vector             3.85 ± 0.06    3.49 ± 0.08    3.47 ± 0.06
          WaveGlow + x-vector             3.93 ± 0.07    3.74 ± 0.08    3.69 ± 0.08
          HiFi-GAN + d-vector             4.21 ± 0.10    3.86 ± 0.06    3.92 ± 0.07
          HiFi-GAN + x-vector             4.30 ± 0.07    4.15 ± 0.07    4.13 ± 0.09
          Improved HiFi-GAN + d-vector    4.28 ± 0.09    4.06 ± 0.05    4.11 ± 0.04
          Improved HiFi-GAN + x-vector    4.36 ± 0.06    4.28 ± 0.08    4.28 ± 0.06
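
Each table entry pairs a mean opinion score with a confidence-interval half-width. The sketch below shows one common way such a pair can be computed from listener ratings. It assumes a 95% interval under a normal approximation (z = 1.96), which the table does not state; the function name `mos_with_ci` and the ratings are hypothetical and are not data from this study.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) for a list of 1-5 opinion scores.

    Assumption: normal-approximation interval; z = 1.96 gives about 95%.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    # Sample standard deviation (n - 1 in the denominator).
    sd = statistics.stdev(ratings)
    # Half-width of the interval around the mean.
    half_width = z * sd / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings for illustration only.
ratings = [4, 5, 4, 4, 3, 5, 4, 4, 5, 4]
mean, half_width = mos_with_ci(ratings)
print(f"{mean:.2f} ± {half_width:.2f}")
```

With few listeners, a Student-t critical value in place of z would give a slightly wider and more conservative interval.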