Research Article
ASLNet: An Encoder-Decoder Architecture for Audio Splicing Detection and Localization
Figure 2
The framework of ASLNet. The Encoder-Decoder architecture is based on the VGG16 network and extended by two transposed convolutions with a skip connection. The input feature of a 2-s audio clip has a fixed size of 72 × 64. The numbers above the feature maps indicate the number of channels and the height × width of each feature map.
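To make the shape bookkeeping in the caption concrete, the following sketch traces how a 72 × 64 input would shrink through VGG16-style blocks and grow again through two transposed convolutions. It is only an illustration under stated assumptions: the number of pooled blocks used (`num_blocks=5`), the decoder channel counts (`out_channels`), the 2 × 2 stride-2 transposed-convolution settings, and the choice of which encoder map feeds the skip connection are hypothetical and are not taken from the figure. Only the VGG16 block layout and the standard output-size formulas are known facts.

```python
# VGG16 convolutional blocks: (number of 3x3 convs, output channels).
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def pool_out(size, kernel=2, stride=2):
    """Output length of a max-pool layer without padding (floor mode)."""
    return (size - kernel) // stride + 1


def tconv_out(size, kernel=2, stride=2, padding=0, output_padding=0):
    """Output length of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def encoder_shapes(height=72, width=64, num_blocks=5):
    """Trace (channels, height, width) after each pooled VGG16 block.

    num_blocks is an assumption; the caption does not say how many
    pooling stages ASLNet keeps.
    """
    shapes = []
    for _, channels in VGG16_BLOCKS[:num_blocks]:
        # 3x3 convs with padding 1 keep height and width unchanged,
        # so only the 2x2 max-pool changes the spatial size.
        height, width = pool_out(height), pool_out(width)
        shapes.append((channels, height, width))
    return shapes


def decoder_shapes(encoder, out_channels=(512, 256)):
    """Apply two stride-2 transposed convolutions to the deepest map.

    out_channels are hypothetical values chosen for illustration.
    """
    _, height, width = encoder[-1]
    shapes = []
    for channels in out_channels:
        height, width = tconv_out(height), tconv_out(width)
        shapes.append((channels, height, width))
    return shapes


enc = encoder_shapes()
dec = decoder_shapes(enc)

# A skip connection needs matching spatial sizes; here we assume it joins
# the first decoder output with the second-deepest encoder map.
skip_ok = dec[0][1:] == enc[-2][1:]

for name, shapes in (("encoder", enc), ("decoder", dec)):
    for channels, h, w in shapes:
        print(f"{name}: {channels} channels, {h} x {w}")
print("skip connection sizes match:", skip_ok)
```

Because 72 is halved to 9 after three pooling stages and 9 is odd, the fourth pooling stage drops a row (9 to 4). Any skip connection therefore has to be placed where the upsampled and encoder maps have the same height and width, which is what `skip_ok` checks for the assumed placement.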