Research Article

Content Deduplication with Granularity Tweak Based on Base and Deviation for Large Text Dataset

Table 3: Pseudo document.

Pseudo document (PD) | T9 | T10 | T11 | PD-vector space
John buys gold for Juliet | {0.00, 0.65, 0.00} | {0.16, 0.00, 0.00} | {0.00, 0.00, 0.47} | {(0.00 + 0.16 + 0.00)/3, (0.65 + 0.00 + 0.00)/3, (0.00 + 0.00 + 0.47)/3} = {0.05, 0.22, 0.16}
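
The PD-vector in the row above appears to be the component-wise arithmetic mean of the three term vectors T9, T10, and T11. The following is a minimal sketch of that averaging step, assuming all term vectors have the same length; the function name pseudo_document_vector is hypothetical and is not taken from the article.

```python
def pseudo_document_vector(term_vectors):
    """Return the component-wise mean of equal-length term vectors.

    Assumption: the PD-vector is the plain arithmetic mean of the term
    vectors, as the worked row in Table 3 suggests.
    """
    if not term_vectors:
        raise ValueError("need at least one term vector")
    dims = len(term_vectors[0])
    if any(len(v) != dims for v in term_vectors):
        raise ValueError("all term vectors must have the same length")
    n = len(term_vectors)
    # Sum each dimension across the term vectors, then divide by their count.
    return [sum(v[i] for v in term_vectors) / n for i in range(dims)]


# Term vectors copied from the Table 3 row for "John buys gold for Juliet".
t9 = [0.00, 0.65, 0.00]
t10 = [0.16, 0.00, 0.00]
t11 = [0.00, 0.00, 0.47]

pd_vector = pseudo_document_vector([t9, t10, t11])
# Round to two decimals, matching the precision used in the table.
rounded = [round(x, 2) for x in pd_vector]
print(rounded)
```

Rounding is applied only for display, so the unrounded means remain available for any later similarity computation.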