Research Article
Robust Frame Duplication Detection for Degraded Videos
Figure 1
An example demonstrating the substantial difference between a frame and its copy. The difference is caused solely by the lossy encoding process; the video is H.264 encoded. (a, b) A pair of source and target frames. Although they are visually identical, the encoding process has introduced a nonnegligible error between them. (c) The histogram of the pixel-wise absolute difference. The error can be as large as 30, and errors greater than or equal to 10 occur more than 700,000 times (the resolution of the video is ). To further demonstrate the impact of this difference, we build a 500-word vocabulary by k-means clustering 7500 dense SIFT features extracted from 72 randomly selected images, and we sort the vocabulary lexicographically so that neighbouring visual words in the vocabulary are also closer in the feature space. We then extract a dense SIFT feature at each pixel of (a) and (b) and map each feature to the index of its visual word, which yields two visual word index maps (d, e) and the absolute difference map between them (f). The considerable number of bright spots in (f) indicates that the lossy compression substantially changed the local structure and hence the local features.
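Panels: (a) source frame, (b) target frame, (c) histogram of pixel-wise absolute difference, (d, e) visual word index maps, (f) absolute difference map between (d) and (e).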
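The measurements described in the caption of Figure 1 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `abs_diff_histogram`, `sort_vocabulary`, and `word_index_map` are hypothetical, frames are assumed to be 8-bit arrays, and the dense SIFT descriptors and the k-means vocabulary are assumed to be already available as arrays (their extraction is not shown). Nearest-neighbour assignment under Euclidean distance is also an assumption about how features are mapped to visual words.

```python
import numpy as np


def abs_diff_histogram(frame_a, frame_b):
    """Pixel-wise absolute difference of two uint8 frames and its histogram."""
    # Widen to a signed type first so the subtraction cannot wrap around.
    diff = np.abs(frame_a.astype(np.int16) - frame_b.astype(np.int16))
    # hist[v] is the number of pixels whose absolute difference equals v.
    hist = np.bincount(diff.ravel(), minlength=256)
    return diff, hist


def sort_vocabulary(words):
    """Sort visual words (rows of a K x D array) lexicographically.

    np.lexsort treats the LAST key as the primary one, so the columns are
    passed in reverse order to make dimension 0 the most significant.
    """
    order = np.lexsort(words.T[::-1])
    return words[order]


def word_index_map(features, vocabulary):
    """Map every per-pixel feature to the index of its nearest visual word.

    features:   H x W x D array of per-pixel descriptors (assumed given).
    vocabulary: K x D array of visual words (assumed given).
    Returns an H x W integer map of visual word indices.
    """
    h, w, d = features.shape
    flat = features.reshape(-1, d).astype(np.float64)
    vocab = vocabulary.astype(np.float64)
    # Squared Euclidean distance between every feature and every word.
    dist2 = (
        (flat ** 2).sum(axis=1)[:, None]
        - 2.0 * flat @ vocab.T
        + (vocab ** 2).sum(axis=1)[None, :]
    )
    return dist2.argmin(axis=1).reshape(h, w)


def index_difference_map(map_a, map_b):
    """Absolute difference between two visual word index maps."""
    return np.abs(map_a.astype(np.int64) - map_b.astype(np.int64))
```

With these helpers, the count of large errors reported in panel (c) would correspond to `hist[10:].sum()`, and panel (f) to `index_difference_map` applied to the two maps produced by `word_index_map` with the same sorted vocabulary. Sorting the vocabulary first matters because it makes a large index difference a rough indicator of a large distance in feature space.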