Security and Communication Networks

Research Article

ASLNet: An Encoder-Decoder Architecture for Audio Splicing Detection and Localization

Table 2

Detection results of ASLNet on four datasets under different thresholds and acoustic features.


Feature	Threshold	Spliced at the end				Spliced at the middle
		Dataset				Dataset

MFCC	= 0.5	ENSet2s	0.8742	0.9295	0.9077	ENSet3s	0.8949	0.9904	0.9740
	= 0.5	CNSet2s	0.9698	0.9903	0.9833	CNSet3s	0.9938	0.9979	0.9965
	= 0.6	ENSet2s	0.8778	0.9269	0.9075	ENSet3s	0.9022	0.9894	0.9745
	= 0.6	CNSet2s	0.9712	0.9894	0.9831	CNSet3s	0.9943	0.9977	0.9965
	= 0.7	ENSet2s	0.8818	0.9219	0.9061	ENSet3s	0.9108	0.9876	0.9745
	= 0.7	CNSet2s	0.9725	0.9884	0.9830	CNSet3s	0.9950	0.9972	0.9964
	= 0.8	ENSet2s	0.8869	0.9206	0.9073	ENSet3s	0.9132	0.9869	0.9742
	= 0.8	CNSet2s	0.9743	0.9870	0.9827	CNSet3s	0.9955	0.9966	0.9962

LFCC	= 0.5	ENSet2s	0.5359	0.7588	0.6708	ENSet3s	0.0012	1.0000	0.8289
	= 0.5	CNSet2s	0.9129	0.9575	0.9422	CNSet3s	0.9730	0.9768	0.9755
	= 0.6	ENSet2s	0.5571	0.7479	0.6726	ENSet3s	0.0037	0.9997	0.8291
	= 0.6	CNSet2s	0.9155	0.9563	0.9424	CNSet3s	0.9750	0.9750	0.9750
	= 0.7	ENSet2s	0.5818	0.7357	0.6750	ENSet3s	0.0183	0.9982	0.8304
	= 0.7	CNSet2s	0.9187	0.9554	0.9429	CNSet3s	0.9758	0.9738	0.9745
	= 0.8	ENSet2s	0.6030	0.7061	0.6654	ENSet3s	0.0452	0.9912	0.8291
	= 0.8	CNSet2s	0.9219	0.9543	0.9432	CNSet3s	0.9778	0.9714	0.9736