Research Article
[Retracted] Analyzing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets
Table 6
The answer length distributions of the proposed multiple-choice cloze datasets.
(a) Short cloze dataset

| Tokens # | Train # | Train PP (%) | Dev # | Dev PP (%) | Test # | Test PP (%) |
| --- | --- | --- | --- | --- | --- | --- |
| 7 | 4428 | 10.93% | 974 | 10.82% | 1016 | 11.29% |
| 8 | 4630 | 11.43% | 1022 | 11.36% | 1037 | 11.52% |
| 9 | 5020 | 12.40% | 1166 | 12.96% | 1077 | 11.97% |
| 10 | 5260 | 12.99% | 1212 | 13.47% | 1154 | 12.82% |
| 11 | 5389 | 13.31% | 1149 | 12.77% | 1253 | 13.92% |
| 12 | 5427 | 13.40% | 1205 | 13.39% | 1200 | 13.33% |
| 13 | 5308 | 13.11% | 1171 | 13.01% | 1136 | 12.62% |
| 14 | 5038 | 12.44% | 1101 | 12.23% | 1127 | 12.52% |
| Total | 40,500 | 100.00% | 9000 | 100.00% | 9000 | 100.00% |
(b) Long cloze dataset

| Tokens # | Train # | Train PP (%) | Dev # | Dev PP (%) | Test # | Test PP (%) |
| --- | --- | --- | --- | --- | --- | --- |
| 17 | 5209 | 12.86% | 1132 | 12.58% | 1174 | 13.04% |
| 18 | 4919 | 12.15% | 1114 | 12.38% | 1072 | 11.91% |
| 19 | 4367 | 10.78% | 999 | 11.10% | 982 | 10.91% |
| 20 | 3983 | 9.83% | 891 | 9.90% | 875 | 9.72% |
| 21 | 3637 | 8.98% | 762 | 8.47% | 770 | 8.56% |
| 22 | 3187 | 7.87% | 714 | 7.93% | 715 | 7.94% |
| 23 | 2980 | 7.36% | 655 | 7.28% | 602 | 6.69% |
| 24 | 2627 | 6.49% | 576 | 6.40% | 590 | 6.56% |
| 25 | 2275 | 5.62% | 546 | 6.07% | 570 | 6.33% |
| 26 | 2162 | 5.34% | 467 | 5.19% | 504 | 5.60% |
| 27 | 1885 | 4.65% | 408 | 4.53% | 427 | 4.74% |
| 28 | 1776 | 4.39% | 385 | 4.28% | 396 | 4.40% |
| 29 | 1493 | 3.69% | 351 | 3.90% | 323 | 3.59% |
| Total | 40,500 | 100.00% | 9000 | 100.00% | 9000 | 100.00% |
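
The PP (%) columns can be checked against the raw counts. The Python sketch below assumes that PP denotes each answer length's share of the split total (an assumption; the table does not expand the abbreviation) and recomputes the Train percentages of the short cloze dataset. The names `train_counts` and `pp` are illustrative only.

```python
# Train counts of the short cloze dataset, keyed by answer length in tokens
# (values copied from Table 6(a)).
train_counts = {
    7: 4428, 8: 4630, 9: 5020, 10: 5260,
    11: 5389, 12: 5427, 13: 5308, 14: 5038,
}

# Total number of training examples in the split.
total = sum(train_counts.values())

# Assumed definition: PP (%) = 100 * count / total, rounded to two decimals.
pp = {length: round(100 * count / total, 2)
      for length, count in train_counts.items()}

for length, share in pp.items():
    print(f"{length:>2} tokens: {share:.2f}%")
print(f"total: {total}")
```

Under this assumption the recomputed shares agree with the Train PP (%) values listed for the short cloze dataset, and the counts sum to the stated total of 40,500.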