Research Article
A Data-Driven Model for Automated Chinese Word Segmentation and POS Tagging
Table 5
Presentation of the four datasets.
| Dataset | Data split | Number of words (K) | Number of sentences (K) |
| --- | --- | --- | --- |
| MSR | Training set | 494 | 18 |
| MSR | Test set | 8.0 | 348 |
| CTB7 | Training set | 78 | 31 |
| CTB7 | Test set | 245 | 10 |
| CTB5 | Training set | 2131 | 78 |
| CTB5 | Test set | 107 | 4 |