Wireless Communications and Mobile Computing

Research Article

[Retracted] Analyzing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets

Table 6

The answer length distributions of the proposed multiple-choice cloze datasets.

(a) Short cloze dataset

Tokens #	Train #	PP (%)	Dev #	PP (%)	Test #	PP (%)
7	4428	10.93%	974	10.82%	1016	11.29%
8	4630	11.43%	1022	11.36%	1037	11.52%
9	5020	12.40%	1166	12.96%	1077	11.97%
10	5260	12.99%	1212	13.47%	1154	12.82%
11	5389	13.31%	1149	12.77%	1253	13.92%
12	5427	13.40%	1205	13.39%	1200	13.33%
13	5308	13.11%	1171	13.01%	1136	12.62%
14	5038	12.44%	1101	12.23%	1127	12.52%
Total	40500	100.00%	9000	100.00%	9000	100.00%

(b) Long cloze dataset

Tokens #	Train #	PP (%)	Dev #	PP (%)	Test #	PP (%)
17	5209	12.86%	1132	12.58%	1174	13.04%
18	4919	12.15%	1114	12.38%	1072	11.91%
19	4367	10.78%	999	11.10%	982	10.91%
20	3983	9.83%	891	9.90%	875	9.72%
21	3637	8.98%	762	8.47%	770	8.56%
22	3187	7.87%	714	7.93%	715	7.94%
23	2980	7.36%	655	7.28%	602	6.69%
24	2627	6.49%	576	6.40%	590	6.56%
25	2275	5.62%	546	6.07%	570	6.33%
26	2162	5.34%	467	5.19%	504	5.60%
27	1885	4.65%	408	4.53%	427	4.74%
28	1776	4.39%	385	4.28%	396	4.40%
29	1493	3.69%	351	3.90%	323	3.59%
Total	40500	100.00%	9000	100.00%	9000	100.00%
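
The PP columns can be reproduced from the raw counts. The sketch below assumes that PP denotes each answer length's share of its split, expressed as a percentage rounded to two decimals; the source does not define the abbreviation, so this reading is an assumption. The names `short_train`, `long_train`, and `proportions` are illustrative and do not come from the article. Only the Train columns are shown.

```python
# Train-split counts per answer length (in tokens), copied from Table 6.
short_train = {7: 4428, 8: 4630, 9: 5020, 10: 5260,
               11: 5389, 12: 5427, 13: 5308, 14: 5038}
long_train = {17: 5209, 18: 4919, 19: 4367, 20: 3983, 21: 3637,
              22: 3187, 23: 2980, 24: 2627, 25: 2275, 26: 2162,
              27: 1885, 28: 1776, 29: 1493}


def proportions(counts):
    """Return {length: share of the split in percent}, rounded to two decimals.

    Assumption: this is how the PP (%) column was derived.
    """
    total = sum(counts.values())
    return {length: round(100 * n / total, 2) for length, n in counts.items()}


short_pp = proportions(short_train)
long_pp = proportions(long_train)

# Print each panel in the same order as the table rows.
for name, counts, pp in (("short", short_train, short_pp),
                         ("long", long_train, long_pp)):
    print(name, "total:", sum(counts.values()))
    for length in counts:
        print(f"{length}\t{counts[length]}\t{pp[length]:.2f}%")
```

Under this reading, both Train columns sum to 40500 and the computed shares match the tabulated values, for example 10.93% for length 7 and 3.69% for length 29.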