Research Article
Content Deduplication with Granularity Tweak Based on Base and Deviation for Large Text Dataset
Table 1
Documents and bag of words.
Bag of words: {“arrived”: T1, “cats”: T2, “dagger”: T3, “damaged”: T4, “died”: T5, “eat”: T6, “fish”: T7, “glitters”: T8, “gold”: T9, “John”: T10, “Juliet”: T11, “makes”: T12, “money”: T13, “Romeo”: T14, “shipment”: T15, “truck”: T16}.
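
The term numbering in Table 1 appears to follow a case-insensitive alphabetical ordering of the distinct words. The following is a minimal sketch of how such a mapping could be built, assuming that ordering rule (it is inferred from the listed terms, not stated in the source). The function names `build_term_ids` and `encode`, and the sample text passed to `encode`, are hypothetical and used for illustration only.

```python
# Terms copied from Table 1, deliberately given in arbitrary order.
TERMS = [
    "John", "Romeo", "Juliet", "gold", "truck", "arrived", "cats", "dagger",
    "damaged", "died", "eat", "fish", "glitters", "makes", "money", "shipment",
]


def build_term_ids(words):
    """Assign T1..Tn to distinct words sorted case-insensitively.

    Assumption: the alphabetical rule is inferred from Table 1.
    """
    ordered = sorted(set(words), key=str.lower)
    return {word: f"T{i}" for i, word in enumerate(ordered, start=1)}


def encode(text, term_ids):
    """Return the term IDs present in a whitespace-tokenised text, in ID order."""
    lookup = {word.lower(): tid for word, tid in term_ids.items()}
    found = {lookup[tok.lower()] for tok in text.split() if tok.lower() in lookup}
    return sorted(found, key=lambda tid: int(tid[1:]))


term_ids = build_term_ids(TERMS)
# Hypothetical input text, not one of the documents in Table 1.
sample_ids = encode("gold truck arrived", term_ids)
```

Representing each document as a set of term IDs rather than raw strings keeps later comparisons between documents independent of word order and letter case.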