Research Article
[Retracted] Analyzing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets
Table 5
The answer length distributions of the proposed span extraction datasets.
(a) Short span dataset

| Answer length (tokens) | Train # | PP (%) | Dev # | PP (%) | Test # | PP (%) |
|---|---|---|---|---|---|---|
| 4 | 16,171 | 51.52 | 3,344 | 49.37 | 4,147 | 49.91 |
| 5 | 8,566 | 27.29 | 1,828 | 26.99 | 2,230 | 26.84 |
| 6 | 6,653 | 21.19 | 1,602 | 23.65 | 1,932 | 23.25 |
| Total | 31,390 | 100.00 | 6,774 | 100.00 | 8,309 | 100.00 |
(b) Long span dataset

| Answer length (tokens) | Train # | PP (%) | Dev # | PP (%) | Test # | PP (%) |
|---|---|---|---|---|---|---|
| 7 | 5,040 | 41.82 | 819 | 34.47 | 2,071 | 35.63 |
| 8 | 4,192 | 34.78 | 980 | 41.25 | 2,390 | 41.12 |
| 9 | 2,821 | 23.40 | 577 | 24.28 | 1,351 | 23.25 |
| Total | 12,053 | 100.00 | 2,376 | 100.00 | 5,812 | 100.00 |
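The PP columns in Table 5 appear to be each answer length's share of its split, i.e. count divided by the split total, times 100 (for example, 16,171 / 31,390 ≈ 51.52%). The sketch below reproduces that arithmetic and shows one way answers could be bucketed into the short-span (4 to 6 tokens) and long-span (7 to 9 tokens) sets. The reading of PP as a per-split percentage, the bucket boundaries (inferred from the table rows), and the helper names `length_distribution` and `split_by_answer_length` are assumptions made for illustration, not the authors' code.

```python
from collections import Counter

# Hypothetical bucket boundaries inferred from Table 5:
# short-span answers have 4-6 tokens, long-span answers have 7-9 tokens.
SHORT_LENGTHS = range(4, 7)
LONG_LENGTHS = range(7, 10)


def length_distribution(counts: dict[int, int]) -> dict:
    """Map each answer length to (count, percentage of the split), plus a Total row.

    Percentages are rounded to two decimals, matching the PP (%) columns.
    """
    total = sum(counts.values())
    rows = {
        length: (n, round(100 * n / total, 2))
        for length, n in sorted(counts.items())
    }
    rows["Total"] = (total, 100.0)
    return rows


def split_by_answer_length(answer_lengths: list[int]) -> tuple[Counter, Counter]:
    """Count answer token lengths into short-span and long-span buckets.

    Lengths outside both buckets are dropped.
    """
    lengths = Counter(answer_lengths)
    short = Counter({k: v for k, v in lengths.items() if k in SHORT_LENGTHS})
    long_ = Counter({k: v for k, v in lengths.items() if k in LONG_LENGTHS})
    return short, long_


if __name__ == "__main__":
    # Training-split counts of the short span dataset, taken from Table 5(a).
    short_train = {4: 16171, 5: 8566, 6: 6653}
    for length, (n, pct) in length_distribution(short_train).items():
        print(length, n, f"{pct:.2f}%")
```

Running the script prints the Train # and PP (%) columns of Table 5(a), which also confirms that the three per-length counts sum to the reported total of 31,390.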