Research Article
SynoExtractor: A Novel Pipeline for Arabic Synonym Extraction Using Word2Vec Word Embeddings
Table 5
Statistics of the Arabic Gigaword Third Edition corpus.
| Source | Files | DOCs | Words |
| --- | --- | --- | --- |
| Agence France Presse | 152 | 147612 | 798436 |
| Assabah | 28 | 6587 | 15410 |
| Al Hayat | 142 | 171502 | 378353 |
| An Nahar | 134 | 193732 | 449340 |
| Ummah Press | 24 | 1201 | 4645 |
| Xinhua News Agency | 67 | 56165 | 348551 |
| Total | 547 | 576799 | 1994735 |