Research Article
Effective and Fast Near Duplicate Detection via Signature-Based Compression Metrics
Algorithm 1
SigNCD duplicate detection.
Require: document list; similarity threshold; number of threads; compressor.
Ensure: duplicate set

[...], [...]
function DUPDETECT()
    for all documents in using threads in parallel do
        preprocessing to filter out noisy information
        signature of
        the length of compressed
    end for
    sort all in by in ascending order
    for all in using threads in parallel do
        if in then
            continue
        end if
        [...]
    end for
    return
end function

function ((, , ))
    [...]
    the index of boundary object of matching partition of on
    for all in do
        if in then
            continue
        end if
        [...]
        if then
            [...]
            [...]
        end if
    end for
    return
end function
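The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: `zlib` stands in for the compressor, `signature` is a hypothetical placeholder (lower-cased word tokens), the threshold is interpreted as a maximum normalized compression distance (NCD), and the partition boundary is derived from the approximation that compressing a pair never costs less than compressing its longer member, which gives NCD >= 1 - l_x / l_y for compressed lengths l_x <= l_y. The names `clen`, `ncd`, and `dup_detect` are likewise invented here. The pairwise comparison loop runs sequentially to keep the sketch short.

```python
import bisect
import re
import zlib
from concurrent.futures import ThreadPoolExecutor


def clen(data: bytes) -> int:
    """Compressed length under zlib (assumed stand-in for the compressor)."""
    return len(zlib.compress(data, 9))


def signature(text: str) -> bytes:
    """Hypothetical signature: lower-cased alphanumeric tokens joined by spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower())).encode("utf-8")


def ncd(x: bytes, y: bytes, lx: int, ly: int) -> float:
    """Normalized compression distance from precomputed compressed lengths."""
    return (clen(x + y) - min(lx, ly)) / max(lx, ly)


def dup_detect(docs, threshold=0.3, threads=4):
    """Return indices of documents flagged as near duplicates (threshold < 1)."""
    # Stage 1: signatures and compressed lengths, computed in a thread pool.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        sigs = list(pool.map(signature, docs))
        lens = list(pool.map(clen, sigs))

    # Stage 2: sort document indices by compressed length, ascending.
    order = sorted(range(len(docs)), key=lambda i: lens[i])
    sorted_lens = [lens[i] for i in order]

    dups = set()
    for pos, i in enumerate(order):
        if i in dups:
            continue
        # Assumed boundary: candidates longer than l_i / (1 - threshold)
        # cannot satisfy NCD <= threshold under the approximation above.
        limit = lens[i] / (1.0 - threshold)
        end = bisect.bisect_right(sorted_lens, limit)
        for j in order[pos + 1:end]:
            if j in dups:
                continue
            if ncd(sigs[i], sigs[j], lens[i], lens[j]) <= threshold:
                dups.add(j)
    return dups
```

Sorting by compressed length lets a binary search bound the candidate range for each document, so most pairs with very different lengths are never compressed together. Real compressors only approximately satisfy the inequality behind the bound, so the pruning should be read as a heuristic.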