Research Article
A Voice Cloning Method Based on the Improved HiFi-GAN Model
Table 3
The key training parameters of the feature prediction network.
| Parameter | Value |
| --- | --- |
| Dimensions of the speaker embedding vector | 256 |
| Silence duration (s) | 0.4 |
| Utterance duration (s) | 16 |
| Mel spectrum channel number | 80 |
| Initial learning rate | 0.003 |
| Final learning rate | 0.00005 |
| Spectral window length (ms) | 50 |
| Spectral window shift (ms) | 12.5 |
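
To make the time-based entries of Table 3 concrete, the following minimal sketch collects the parameters in a configuration object and converts the spectral window length and shift from milliseconds to samples. The class name `FeaturePredictionConfig`, its field names, and the helper methods are hypothetical and chosen only for illustration. The sampling rate of 16 kHz is an assumption, since Table 3 does not list one; with a different rate the sample counts change proportionally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeaturePredictionConfig:
    """Hypothetical container for the Table 3 parameters (names are illustrative)."""

    # Values taken from Table 3.
    speaker_embedding_dim: int = 256
    silence_duration_s: float = 0.4
    utterance_duration_s: float = 16.0
    n_mel_channels: int = 80
    initial_learning_rate: float = 0.003
    final_learning_rate: float = 0.00005
    window_length_ms: float = 50.0
    window_shift_ms: float = 12.5

    # ASSUMPTION: the sampling rate is not given in Table 3.
    sample_rate_hz: int = 16000

    def window_length_samples(self) -> int:
        # Convert the analysis window length from milliseconds to samples.
        return round(self.sample_rate_hz * self.window_length_ms / 1000)

    def window_shift_samples(self) -> int:
        # Convert the window shift (hop size) from milliseconds to samples.
        return round(self.sample_rate_hz * self.window_shift_ms / 1000)

    def shifts_per_utterance(self) -> int:
        # Number of whole window shifts that fit in the listed utterance duration.
        return int(self.utterance_duration_s * 1000 / self.window_shift_ms)


config = FeaturePredictionConfig()
print(config.window_length_samples())  # 800 samples at the assumed 16 kHz
print(config.window_shift_samples())   # 200 samples at the assumed 16 kHz
print(config.shifts_per_utterance())   # 1280, independent of the sampling rate
```

Under the assumed sampling rate, a 50 ms window with a 12.5 ms shift corresponds to 800-sample frames with a 200-sample hop, so consecutive frames overlap by 75%. The shift count for a 16 s utterance depends only on the values in Table 3 and gives a rough figure for the number of mel-spectrogram frames per utterance.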