Research Article
Sentence Classification Using N-Grams in Urdu Language Text
| Sr. No. | Dataset |
| --- | --- |
| 1 | Enabling Minority Language Engineering (EMILLE) (only 200,000 tokens) [18] |
| 2 | Becker-Riaz corpus (only 50,000 tokens) [19] |
| 3 | Computing Research Laboratory (CRL) annotated corpus (only 55,000 tokens are publicly available) [20] |
| 4 | International Joint Conference on Natural Language Processing (IJCNLP) workshop corpus (only 58,252 tokens) |
| 5 | Urdu Named Entity Recognition (UNER) [4] |
| 6 | Corpus of 705 sentences [21] |
| 7 | Corpus of BBC Urdu, Daily Jang [22] |
| 8 | Corpus of 19.3 million words [23] |
| 9 | COUNTER, Naïve, NPUU [24, 25] |
| 10 | DSL Urdu news [26] |