Research Article

A Voice Cloning Method Based on the Improved HiFi-GAN Model

Table 1

Detailed parameters of TDNN.

Layer           Layer context         Total context   Input × output

Frame 1         [t − 2, t + 2]        5               100 × 512
Frame 2         {t − 2, t, t + 2}     9               1536 × 512
Frame 3         {t − 3, t, t + 3}     15              1536 × 512
Frame 4         {t}                   15              512 × 512
Frame 5         {t}                   15              512 × 1500
Stats pooling   [0, T)                T               1500T × 3000
Segment 1       {0}                   T               3000 × 512
Segment 2       {0}                   T               512 × 512
SoftMax         {0}                   T               512 × K
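
The total-context and input-width values for the frame-level layers in Table 1 can be derived from the layer contexts alone. The following is a minimal sketch of that arithmetic, not the authors' implementation. It assumes a 20-dimensional input feature per frame (inferred from 100 = 5 × 20; the table does not state it) and assumes that statistics pooling concatenates the mean and standard deviation of the Frame 5 outputs, which would account for 1500 becoming 3000. The names `FRAME_LAYERS`, `FEATURE_DIM`, and `summarize` are hypothetical.

```python
# Frame-level layers from Table 1: (name, context offsets relative to t, output width).
FRAME_LAYERS = [
    ("Frame 1", [-2, -1, 0, 1, 2], 512),
    ("Frame 2", [-2, 0, 2], 512),
    ("Frame 3", [-3, 0, 3], 512),
    ("Frame 4", [0], 512),
    ("Frame 5", [0], 1500),
]

# Assumption: 20 features per frame, inferred from the 100 x 512 entry for Frame 1.
FEATURE_DIM = 20


def summarize(layers, feature_dim):
    """Return (name, total context, input width, output width) for each layer."""
    rows = []
    width = feature_dim   # width of one frame entering the current layer
    total_context = 1     # a single frame before any splicing
    for name, offsets, out_dim in layers:
        # Each layer widens the receptive field by the span of its offsets.
        total_context += max(offsets) - min(offsets)
        # The layer input is the concatenation of the spliced frames.
        in_dim = len(offsets) * width
        rows.append((name, total_context, in_dim, out_dim))
        width = out_dim
    return rows


rows = summarize(FRAME_LAYERS, FEATURE_DIM)
for name, ctx, in_dim, out_dim in rows:
    print(f"{name}: total context {ctx}, {in_dim} x {out_dim}")

# Assumption: pooling concatenates mean and standard deviation over all T frames.
pooled_dim = 2 * rows[-1][3]
print(f"Stats pooling output width: {pooled_dim}")
```

Under these assumptions the computed rows match the frame-level entries of the table, and the pooled width equals the 3000-dimensional input of Segment 1.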