Research Article
Sentence Classification Using N-Grams in Urdu Language Text
| Sr. No. | Dataset |
| --- | --- |
| 1 | Enabling Minority Language Engineering (EMILLE) (only 200,000 tokens) [18] |
| 2 | Becker-Riaz corpus (only 50,000 tokens) [19] |
| 3 | Computing Research Laboratory (CRL) annotated corpus (only 55,000 tokens are publicly available) [20] |
| 4 | International Joint Conference on Natural Language Processing (IJCNLP) workshop corpus (only 58,252 tokens) |
| 5 | Urdu Named Entity Recognition (UNER) [4] |
| 6 | Corpus of 705 sentences [21] |
| 7 | Corpus of BBC Urdu, Daily Jang [22] |
| 8 | Corpus of 19.3 million words [23] |
| 9 | COUNTER, Naïve, NPUU [24, 25] |
| 10 | DSL Urdu news [26] |