Research Article

A Voice Cloning Method Based on the Improved HiFi-GAN Model

Table 3

The key training parameters of the feature prediction network.

Parameter                                     Value
Dimensions of the speaker embedding vector    256
Silence duration (s)                          0.4
Utterance duration (s)                        16
Mel spectrum channel number                   80
Initial learning rate                         0.003
Final learning rate                           0.00005
Spectral window length (ms)                   50
Spectral window shift (ms)                    12.5
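
As an illustration of how the values in Table 3 relate to one another, the following minimal sketch collects them in a configuration object and derives a few quantities from them. The class and function names are hypothetical and not taken from the paper. The 16 kHz sampling rate in the usage example is an assumption, since the table does not state one. The exponential shape of the learning-rate decay is likewise an assumption: the table gives only the initial and final values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeaturePredictionConfig:
    """Values copied from Table 3; the field names are hypothetical."""
    speaker_embedding_dim: int = 256
    silence_duration_s: float = 0.4
    utterance_duration_s: float = 16.0
    n_mel_channels: int = 80
    initial_lr: float = 0.003
    final_lr: float = 0.00005
    window_length_ms: float = 50.0
    window_shift_ms: float = 12.5


def ms_to_samples(duration_ms: float, sample_rate_hz: int) -> int:
    """Convert a duration in milliseconds to a sample count."""
    return int(round(duration_ms * sample_rate_hz / 1000.0))


def frames_per_utterance(cfg: FeaturePredictionConfig) -> int:
    """Number of window shifts that fit in one utterance duration."""
    return int(cfg.utterance_duration_s * 1000.0 / cfg.window_shift_ms)


def exponential_lr(cfg: FeaturePredictionConfig, step: int, total_steps: int) -> float:
    """Assumed schedule: exponential interpolation from initial_lr to final_lr."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return cfg.initial_lr * (cfg.final_lr / cfg.initial_lr) ** frac


if __name__ == "__main__":
    cfg = FeaturePredictionConfig()
    sample_rate_hz = 16000  # assumption: not given in Table 3
    print(ms_to_samples(cfg.window_length_ms, sample_rate_hz))
    print(ms_to_samples(cfg.window_shift_ms, sample_rate_hz))
    print(frames_per_utterance(cfg))
    print(exponential_lr(cfg, 0, 1000), exponential_lr(cfg, 1000, 1000))
```

Under the assumed 16 kHz sampling rate, the 50 ms window and 12.5 ms shift correspond to 800 and 200 samples, so consecutive windows overlap by 75%. A 16 s utterance spans 1280 window shifts, independent of the sampling rate.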