Research Article

[Retracted] Analyzing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets

Table 5

The answer length distributions of the proposed span extraction datasets.
(a) Short span dataset

Tokens #    Train #    PP (%)     Dev #    PP (%)     Test #    PP (%)
4           16,171     51.52%     3344     49.37%     4147      49.91%
5           8566       27.29%     1828     26.99%     2230      26.84%
6           6653       21.19%     1602     23.65%     1932      23.25%
Total       31,390     100.00%    6774     100.00%    8309      100.00%

(b) Long span dataset

Tokens #    Train #    PP (%)     Dev #    PP (%)     Test #    PP (%)
7           5040       41.82%     819      34.47%     2071      35.63%
8           4192       34.78%     980      41.25%     2390      41.12%
9           2821       23.40%     577      24.28%     1351      23.25%
Total       12,053     100.00%    2376     100.00%    5812      100.00%
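
The PP (%) columns appear to be each answer length's share of its split total. The following is a minimal sketch, assuming that reading, which recomputes the shares from the counts in Table 5. The names SHORT, LONG, and proportions are illustrative and do not come from the article.

```python
# Counts transcribed from Table 5; keys are answer lengths in tokens.
SHORT = {
    "train": {4: 16171, 5: 8566, 6: 6653},
    "dev": {4: 3344, 5: 1828, 6: 1602},
    "test": {4: 4147, 5: 2230, 6: 1932},
}
LONG = {
    "train": {7: 5040, 8: 4192, 9: 2821},
    "dev": {7: 819, 8: 980, 9: 577},
    "test": {7: 2071, 8: 2390, 9: 1351},
}


def proportions(counts):
    """Return each length's share of the split total, in percent (2 d.p.).

    Assumption: PP (%) = 100 * count / split total.
    """
    total = sum(counts.values())
    return {length: round(100 * n / total, 2) for length, n in counts.items()}


if __name__ == "__main__":
    for name, dataset in (("short", SHORT), ("long", LONG)):
        for split, counts in dataset.items():
            print(name, split, sum(counts.values()), proportions(counts))
```

Under this assumption the recomputed shares can be compared directly with the PP (%) columns, which gives a quick check that the counts and percentages in each split are consistent with the stated totals.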