Research Article
SynoExtractor: A Novel Pipeline for Arabic Synonym Extraction Using Word2Vec Word Embeddings
Table 4
Statistics of the KSUCCA corpus [23].
| Genre | Number of texts | Number of words | Percentage (%) |
| --- | --- | --- | --- |
| Religion | 150 | 23645087 | 46.73 |
| Linguistics | 56 | 7093966 | 14.02 |
| Literature | 104 | 7224504 | 14.28 |
| Science | 42 | 6429133 | 12.71 |
| Sociology | 32 | 2709774 | 5.36 |
| Biography | 26 | 3499948 | 6.92 |
| Total | 410 | 50602412 | 100 |