Research Article

[Retracted] Analyzing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets

Table 6

The answer length distributions of the proposed multiple-choice cloze datasets.
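(a) Short cloze dataset

| Tokens # | Train # | Train PP (%) | Dev # | Dev PP (%) | Test # | Test PP (%) |
|---|---|---|---|---|---|---|
| 7 | 4428 | 10.93% | 974 | 10.82% | 1016 | 11.29% |
| 8 | 4630 | 11.43% | 1022 | 11.36% | 1037 | 11.52% |
| 9 | 5020 | 12.40% | 1166 | 12.96% | 1077 | 11.97% |
| 10 | 5260 | 12.99% | 1212 | 13.47% | 1154 | 12.82% |
| 11 | 5389 | 13.31% | 1149 | 12.77% | 1253 | 13.92% |
| 12 | 5427 | 13.40% | 1205 | 13.39% | 1200 | 13.33% |
| 13 | 5308 | 13.11% | 1171 | 13.01% | 1136 | 12.62% |
| 14 | 5038 | 12.44% | 1101 | 12.23% | 1127 | 12.52% |
| Total | 40,500 | 100.00% | 9,000 | 100.00% | 9,000 | 100.00% |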

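(b) Long cloze dataset

| Tokens # | Train # | Train PP (%) | Dev # | Dev PP (%) | Test # | Test PP (%) |
|---|---|---|---|---|---|---|
| 17 | 5209 | 12.86% | 1132 | 12.58% | 1174 | 13.04% |
| 18 | 4919 | 12.15% | 1114 | 12.38% | 1072 | 11.91% |
| 19 | 4367 | 10.78% | 999 | 11.10% | 982 | 10.91% |
| 20 | 3983 | 9.83% | 891 | 9.90% | 875 | 9.72% |
| 21 | 3637 | 8.98% | 762 | 8.47% | 770 | 8.56% |
| 22 | 3187 | 7.87% | 714 | 7.93% | 715 | 7.94% |
| 23 | 2980 | 7.36% | 655 | 7.28% | 602 | 6.69% |
| 24 | 2627 | 6.49% | 576 | 6.40% | 590 | 6.56% |
| 25 | 2275 | 5.62% | 546 | 6.07% | 570 | 6.33% |
| 26 | 2162 | 5.34% | 467 | 5.19% | 504 | 5.60% |
| 27 | 1885 | 4.65% | 408 | 4.53% | 427 | 4.74% |
| 28 | 1776 | 4.39% | 385 | 4.28% | 396 | 4.40% |
| 29 | 1493 | 3.69% | 351 | 3.90% | 323 | 3.59% |
| Total | 40,500 | 100.00% | 9,000 | 100.00% | 9,000 | 100.00% |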
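
The PP (%) columns in Table 6 can be cross-checked from the raw counts. The sketch below is illustrative only: it assumes that PP (%) is each row's count divided by the split total, which the table itself does not define, and it uses only the Train counts copied from the table. The names `proportions`, `short_train`, and `long_train` are hypothetical and do not come from the article.

```python
# Train-split counts per answer length (in tokens), copied from Table 6.
short_train = {7: 4428, 8: 4630, 9: 5020, 10: 5260,
               11: 5389, 12: 5427, 13: 5308, 14: 5038}
long_train = {17: 5209, 18: 4919, 19: 4367, 20: 3983, 21: 3637,
              22: 3187, 23: 2980, 24: 2627, 25: 2275, 26: 2162,
              27: 1885, 28: 1776, 29: 1493}


def proportions(counts):
    """Return each count as a percentage of the split total, rounded to 2 decimals.

    Assumption: "PP (%)" means count / split total * 100.
    """
    total = sum(counts.values())
    return {length: round(100 * n / total, 2) for length, n in counts.items()}


short_train_total = sum(short_train.values())
long_train_total = sum(long_train.values())
short_train_pp = proportions(short_train)
long_train_pp = proportions(long_train)

if __name__ == "__main__":
    for name, pp in (("short", short_train_pp), ("long", long_train_pp)):
        for length, value in pp.items():
            print(f"{name} cloze, {length} tokens: {value:.2f}%")
```

Under that assumption, both Train columns sum to the stated total of 40,500, and the recomputed percentages can be compared row by row against the table.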