Research Article
ASLNet: An Encoder-Decoder Architecture for Audio Splicing Detection and Localization
Table 2: Detection results of ASLNet on four datasets under different thresholds and acoustic features.
| Feature | Threshold | Dataset (spliced at the end) | | | | Dataset (spliced at the middle) | | | |
|---|---|---|---|---|---|---|---|---|---|
| MFCC | 0.5 | ENSet2s | 0.8742 | 0.9295 | 0.9077 | ENSet3s | 0.8949 | 0.9904 | 0.9740 |
| MFCC | 0.5 | CNSet2s | 0.9698 | 0.9903 | 0.9833 | CNSet3s | 0.9938 | 0.9979 | 0.9965 |
| MFCC | 0.6 | ENSet2s | 0.8778 | 0.9269 | 0.9075 | ENSet3s | 0.9022 | 0.9894 | 0.9745 |
| MFCC | 0.6 | CNSet2s | 0.9712 | 0.9894 | 0.9831 | CNSet3s | 0.9943 | 0.9977 | 0.9965 |
| MFCC | 0.7 | ENSet2s | 0.8818 | 0.9219 | 0.9061 | ENSet3s | 0.9108 | 0.9876 | 0.9745 |
| MFCC | 0.7 | CNSet2s | 0.9725 | 0.9884 | 0.9830 | CNSet3s | 0.9950 | 0.9972 | 0.9964 |
| MFCC | 0.8 | ENSet2s | 0.8869 | 0.9206 | 0.9073 | ENSet3s | 0.9132 | 0.9869 | 0.9742 |
| MFCC | 0.8 | CNSet2s | 0.9743 | 0.9870 | 0.9827 | CNSet3s | 0.9955 | 0.9966 | 0.9962 |
| LFCC | 0.5 | ENSet2s | 0.5359 | 0.7588 | 0.6708 | ENSet3s | 0.0012 | 1.0000 | 0.8289 |
| LFCC | 0.5 | CNSet2s | 0.9129 | 0.9575 | 0.9422 | CNSet3s | 0.9730 | 0.9768 | 0.9755 |
| LFCC | 0.6 | ENSet2s | 0.5571 | 0.7479 | 0.6726 | ENSet3s | 0.0037 | 0.9997 | 0.8291 |
| LFCC | 0.6 | CNSet2s | 0.9155 | 0.9563 | 0.9424 | CNSet3s | 0.9750 | 0.9750 | 0.9750 |
| LFCC | 0.7 | ENSet2s | 0.5818 | 0.7357 | 0.6750 | ENSet3s | 0.0183 | 0.9982 | 0.8304 |
| LFCC | 0.7 | CNSet2s | 0.9187 | 0.9554 | 0.9429 | CNSet3s | 0.9758 | 0.9738 | 0.9745 |
| LFCC | 0.8 | ENSet2s | 0.6030 | 0.7061 | 0.6654 | ENSet3s | 0.0452 | 0.9912 | 0.8291 |
| LFCC | 0.8 | CNSet2s | 0.9219 | 0.9543 | 0.9432 | CNSet3s | 0.9778 | 0.9714 | 0.9736 |