Research Article
SynoExtractor: A Novel Pipeline for Arabic Synonym Extraction Using Word2Vec Word Embeddings
Table 4
Statistics of the KSUCCA corpus [23].
| Genre | Number of texts | Number of words | Percentage (%) |
|---|---|---|---|
| Religion | 150 | 23645087 | 46.73 |
| Linguistics | 56 | 7093966 | 14.02 |
| Literature | 104 | 7224504 | 14.28 |
| Science | 42 | 6429133 | 12.71 |
| Sociology | 32 | 2709774 | 5.36 |
| Biography | 26 | 3499948 | 6.92 |
| Total | 410 | 50602412 | 100 |