Research Article
Content Deduplication with Granularity Tweak Based on Base and Deviation for Large Text Dataset
| Pseudo document (PD) | T9 | T10 | T11 | PD-vector space |
|---|---|---|---|---|
| John buys gold for Juliet | {0.00, 0.65, 0.00} | {0.16, 0.00, 0.00} | {0.00, 0.00, 0.47} | {(0.00 + 0.16 + 0.00)/3, (0.65 + 0.00 + 0.00)/3, (0.00 + 0.00 + 0.47)/3} = {0.05, 0.22, 0.16} |
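The last column of the table is the element-wise mean of the three vectors listed under T9, T10, and T11, rounded to two decimals. The sketch below reproduces that arithmetic under a few assumptions. The vectors are represented as plain Python lists, and the helper name `pd_vector` is hypothetical. This excerpt does not say what T9 to T11 denote, so the code treats them only as equal-length numeric vectors.

```python
def pd_vector(vectors, ndigits=2):
    """Hypothetical helper: element-wise mean of equal-length vectors, rounded."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    n = len(vectors)
    # Average each coordinate across all vectors, then round as in the table.
    return [round(sum(v[i] for v in vectors) / n, ndigits) for i in range(dim)]


# Values copied from the table row for "John buys gold for Juliet".
T9 = [0.00, 0.65, 0.00]
T10 = [0.16, 0.00, 0.00]
T11 = [0.00, 0.00, 0.47]

print(pd_vector([T9, T10, T11]))  # [0.05, 0.22, 0.16]
```

Averaging keeps the pseudo document's vector in the same three-dimensional space as its component vectors, so the result can be compared directly with them.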