Research Article
A Voice Cloning Method Based on the Improved HiFi-GAN Model
Table 1
Detailed parameters of TDNN.
| Layer | Layer context | Total context | Input × output |
| --- | --- | --- | --- |
| Frame 1 | [t − 2, t + 2] | 5 | 100 × 512 |
| Frame 2 | {t − 2, t, t + 2} | 9 | 1536 × 512 |
| Frame 3 | {t − 3, t, t + 3} | 15 | 1536 × 512 |
| Frame 4 | {t} | 15 | 512 × 512 |
| Frame 5 | {t} | 15 | 512 × 1500 |
| Stats pooling | [0, T) | T | 1500T × 3000 |
| Segment 1 | {0} | T | 3000 × 512 |
| Segment 2 | {0} | T | 512 × 512 |
| SoftMax | {0} | T | 512 × K |
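
The "Total context" and "Input × output" columns of Table 1 follow from the layer contexts by simple arithmetic, which the sketch below reproduces for the five frame-level layers. It is an illustration only, not the authors' implementation: the function names are hypothetical, and the per-frame feature dimension of 20 is an assumption inferred from the 100-dimensional input of Frame 1 (5 spliced frames × 20); the table does not state it.

```python
# Layer contexts from Table 1, written as frame offsets relative to t.
LAYER_OFFSETS = [
    ("Frame 1", [-2, -1, 0, 1, 2]),  # [t - 2, t + 2]
    ("Frame 2", [-2, 0, 2]),         # {t - 2, t, t + 2}
    ("Frame 3", [-3, 0, 3]),         # {t - 3, t, t + 3}
    ("Frame 4", [0]),                # {t}
    ("Frame 5", [0]),                # {t}
]
# Output widths of the frame-level layers, taken from Table 1.
LAYER_OUTPUT_DIMS = [512, 512, 512, 512, 1500]
# Assumption (not stated in the table): 100 inputs / 5 spliced frames.
FEATURE_DIM = 20


def total_contexts(layers):
    """Accumulate the number of input frames seen by each layer."""
    left = right = 0
    result = []
    for name, offsets in layers:
        left += -min(offsets)   # extra frames reached into the past
        right += max(offsets)   # extra frames reached into the future
        result.append((name, left + right + 1))
    return result


def layer_shapes(layers, output_dims, feature_dim):
    """Input width = number of spliced offsets x previous layer's width."""
    shapes = []
    prev_dim = feature_dim
    for (name, offsets), out_dim in zip(layers, output_dims):
        shapes.append((name, len(offsets) * prev_dim, out_dim))
        prev_dim = out_dim
    return shapes


if __name__ == "__main__":
    for (name, ctx), (_, n_in, n_out) in zip(
        total_contexts(LAYER_OFFSETS),
        layer_shapes(LAYER_OFFSETS, LAYER_OUTPUT_DIMS, FEATURE_DIM),
    ):
        print(f"{name}: total context {ctx}, {n_in} x {n_out}")
```

Under the same reading, the 3000-dimensional output of the stats pooling layer is consistent with concatenating two 1500-dimensional statistics (for example, mean and standard deviation) computed over all T frames.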