Research Article

iSentenizer-μ: Multilingual Sentence Boundary Detection Model

Table 5

Information on the Europarl corpus.

Language      Sentences                     Tokens
              Training data   Test data

Danish        30,343          3,375         917,231
German        29,854          3,319         890,176
English       29,774          3,309         949,716
Spanish       33,869          3,765         1,082,826
Dutch         29,604          3,389         688,018
French        29,887          3,321         1,098,724
Italian       27,589          5,067         929,042
Portuguese    28,967          2,777         947,086
Greek         27,687          3,077         888,321
Finnish       29,504          3,309         687,804
Swedish       26,649          2,962         765,795