(1) Vectors are built by counting how often each word occurs in a document and across the corpus (2) A word's weight is proportional to its count in the document and inversely proportional to its count in other documents (3) The importance of frequently occurring common words, e.g., “while,” “but,” “the,” and “is,” is reduced (4) Computing similarity is easy (see the TF-IDF sketch below)
(1) Similarity is based merely on word frequency, neglecting semantic similarity (2) The vectors are large (3) Co-occurrence of words within a document is not recorded (4) Vectors are sparse (5) Synonyms are not considered (6) Polysemous words have a single vector; for example, apple is a fruit and Apple is a company, yet both receive the same vector representation
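To make the count-based weighting concrete, the following minimal sketch (scikit-learn assumed available; the corpus is illustrative) builds TF-IDF vectors and compares two documents by cosine similarity; note that the vectors are sparse and the similarity reflects word frequency only.

```python
# Minimal TF-IDF sketch on a toy corpus (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# Term frequency weighted by inverse document frequency:
# common words such as "the" receive low weights.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)      # sparse document-term matrix

print(X.shape)                            # (3, vocabulary size)
print(cosine_similarity(X[0], X[1]))      # frequency-based similarity only
```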
2. Count based: Global Vectors (GloVe) [30], co-occurrence matrix [29]
(1) It is a hybrid method combining a statistical co-occurrence matrix with machine learning (2) It records how sets of words co-occur in a corpus (3) It captures semantic similarity, e.g., between King and Queen (4) Dimensionality reduction lowers the number of dimensions while producing more accurate vectors (see the co-occurrence sketch below)
(1) Costly in terms of memory for recording word co-occurrences
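The sketch below (illustrative corpus and window size) builds a window-based word co-occurrence matrix and reduces it with SVD; GloVe itself fits a weighted least-squares model on the co-occurrence counts rather than a plain SVD, so this only illustrates the count matrix and the dimensionality-reduction step. The full vocabulary-by-vocabulary count matrix also shows why memory cost grows quickly.

```python
# Sketch: window-based co-occurrence counts plus SVD dimensionality reduction.
from collections import defaultdict
import numpy as np

corpus = ["the king rules the realm", "the queen rules the realm"]
window = 2

tokens = [sent.split() for sent in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Count how often word pairs appear within the context window.
# The matrix is vocabulary x vocabulary, hence the memory cost.
counts = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[index[w], index[sent[j]]] += 1

# Truncated SVD gives dense, lower-dimensional word vectors.
U, S, _ = np.linalg.svd(counts)
k = 2
embeddings = U[:, :k] * S[:k]
print(embeddings.shape)   # (vocabulary size, k)
```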
(1) Word analogies and word similarities are captured (2) Measures the likelihood of words (3) “King - man + woman = Queen,” a notable feature of word embeddings (4) Vectors can infer the analogy “king : man as queen : woman” (5) Input words are mapped to target words (6) Probabilistic methods generally perform better than deterministic methods [32] (7) Comparatively little memory is consumed
(1) Training becomes difficult as the vocabulary grows large (2) Polysemous words receive an aggregated vector representation in CBOW, whereas in Skip-gram they keep separate vectors (see the Word2Vec sketch below)
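The following sketch (gensim assumed installed; the corpus is a toy example) trains CBOW and Skip-gram variants of Word2Vec and queries the king - man + woman analogy; a corpus this small will not yield meaningful analogies, so in practice a large corpus or pretrained vectors is required.

```python
# Sketch: CBOW vs. Skip-gram training and an analogy query with gensim.
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "realm"],
    ["the", "queen", "rules", "the", "realm"],
    ["a", "man", "and", "a", "woman"],
]

cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)      # CBOW
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # Skip-gram

# With well-trained vectors, "king - man + woman" is expected to be close to "queen".
print(skipgram.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```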