Research Article

A Voice Cloning Method Based on the Improved HiFi-GAN Model

Table 7

SMOS of cloned-speech similarity for different models.

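| Metric    | Settings                     | LibriSpeech | VCTK        | THCHS-30    |
|-----------|------------------------------|-------------|-------------|-------------|
| SMOS (CI) | Multispeaker TTS             | 3.56 ± 0.07 | 3.18 ± 0.06 | 3.25 ± 0.08 |
|           | Multispeaker TTS + x-vector  | 3.91 ± 0.06 | 3.44 ± 0.07 | 3.59 ± 0.06 |
|           | WaveGlow + d-vector          | 3.55 ± 0.09 | 3. ± 0.09   | 3.32 ± 0.07 |
|           | WaveGlow + x-vector          | 3.89 ± 0.08 | 3.47 ± 0.09 | 3.64 ± 0.05 |
|           | HiFi-GAN + d-vector          | 3.82 ± 0.05 | 3.38 ± 0.07 | 3.43 ± 0.09 |
|           | HiFi-GAN + x-vector          | 4.15 ± 0.07 | 3.61 ± 0.08 | 3.68 ± 0.08 |
|           | Improved HiFi-GAN + d-vector | 3.99 ± 0.10 | 3.52 ± 0.06 | 3.61 ± 0.05 |
|           | Improved HiFi-GAN + x-vector | 4.23 ± 0.06 | 3.80 ± 0.08 | 3.84 ± 0.07 |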