Research Article

PERLEX: A Bilingual Persian-English Gold Dataset for Relation Extraction

Table 3

PERLEX dataset statistics.

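| Dataset       | Partition | Sentences | Words   | Entities | Entity words | Average sentence length |
|---------------|-----------|-----------|---------|----------|--------------|-------------------------|
| PERLEX        | Train     | 8,000     | 161,001 | 16,000   | 19,863       | 20.13                   |
| PERLEX        | Test      | 2,717     | 54,633  | 5,434    | 6,699        | 20.11                   |
| PERLEX        | Total     | 10,717    | 215,634 | 21,434   | 27,210       | 20.12                   |
| SemEval2010T8 | Train     | 8,000     | 150,376 | 16,000   | 16,728       | 18.80                   |
| SemEval2010T8 | Test      | 2,717     | 51,190  | 5,434    | 5,655        | 18.84                   |
| SemEval2010T8 | Total     | 10,717    | 201,566 | 21,434   | 27,376       | 18.81                   |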
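
The average sentence lengths in Table 3 appear to be the word count divided by the sentence count, rounded to two decimals. The following minimal sketch recomputes them from the reported figures under that assumption; the names `TABLE_3` and `average_sentence_length` are illustrative and do not come from the PERLEX release.

```python
# Sentence and word counts copied from Table 3.
# Assumption: average sentence length = words / sentences (in words per sentence).
TABLE_3 = {
    ("PERLEX", "Train"): (8_000, 161_001),
    ("PERLEX", "Test"): (2_717, 54_633),
    ("PERLEX", "Total"): (10_717, 215_634),
    ("SemEval2010T8", "Train"): (8_000, 150_376),
    ("SemEval2010T8", "Test"): (2_717, 51_190),
    ("SemEval2010T8", "Total"): (10_717, 201_566),
}


def average_sentence_length(sentences: int, words: int) -> float:
    """Return the mean number of words per sentence, rounded to two decimals."""
    return round(words / sentences, 2)


if __name__ == "__main__":
    for (dataset, partition), (sentences, words) in TABLE_3.items():
        avg = average_sentence_length(sentences, words)
        print(f"{dataset:<14} {partition:<6} {avg:.2f}")
```

Under this assumption, the Persian sentences in PERLEX come out roughly 1.3 words longer on average than their English counterparts in SemEval2010T8, in line with the last column of the table.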