Research Article

Machine Reading Comprehension-Enabled Public Service Information System: A Large-Scale Dataset and Neural Network Models

Table 5

Evaluation results of the pretrained models and human performance on the development and test sets.

Model                        Development         Test
                             EM      F1          EM      F1

Human performance            92.45   96.13       91.85   96.33

Pretrained on general-domain corpora (e.g., Chinese Wikipedia) and released by Google and others
BERT (base)                  76.30   85.34       76.40   85.04
BERT-wwm (base)              76.95   86.42       76.75   85.93
ALBERT (base)                75.85   85.39       76.20   85.26
ALBERT (large)               79.05   88.76       77.70   87.79
RoBERTa-wwm (base)           78.75   86.66       78.30   85.79
RoBERTa-wwm (large)          79.35   88.38       78.60   87.43

Continually pretrained by us on the Chinese public service corpus
Our BERT (base)              81.80   89.44       79.30   87.70
Our BERT-wwm (base)          81.60   89.25       79.65   88.20
Our ALBERT (base)            81.00   89.46       78.40   87.45
Our ALBERT (large)           80.90   89.90       79.60   88.10
Our RoBERTa-wwm (base)       82.65   90.05       79.85   88.30
Our RoBERTa-wwm (large)      83.10   90.79       80.45   88.51
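
For reference, the EM and F1 scores above are presumably the standard span-extraction metrics used for SQuAD-style Chinese MRC: exact match of the predicted answer string, and character-level F1 between the predicted and reference answers. The sketch below is a minimal illustration of how such scores are typically computed; the function names and the (very light) answer normalization are assumptions for illustration, not the paper's actual evaluation script.

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the predicted answer matches the reference exactly (after trimming), else 0.0."""
    return float(prediction.strip() == reference.strip())

def char_f1(prediction: str, reference: str) -> float:
    """Character-level F1: the usual token-level SQuAD F1 with each Chinese character as a token."""
    pred_chars = list(prediction.strip())
    ref_chars = list(reference.strip())
    common = Counter(pred_chars) & Counter(ref_chars)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_chars)
    recall = num_same / len(ref_chars)
    return 2 * precision * recall / (precision + recall)

def evaluate(predictions: dict, references: dict) -> dict:
    """Average EM / F1 over a {question_id: answer} mapping, reported as percentages."""
    em = sum(exact_match(predictions[qid], ans) for qid, ans in references.items())
    f1 = sum(char_f1(predictions[qid], ans) for qid, ans in references.items())
    n = len(references)
    return {"EM": 100.0 * em / n, "F1": 100.0 * f1 / n}

if __name__ == "__main__":
    # Hypothetical answers for two questions: one exact match, one partial overlap.
    refs = {"q1": "公共服务中心", "q2": "每周一至周五"}
    preds = {"q1": "公共服务中心", "q2": "周一至周五"}
    print(evaluate(preds, refs))  # EM = 50.0, F1 ≈ 95.5
```

Under this definition, character-level F1 can never be lower than EM, since an exact match contributes an F1 of 1.0 for that example; this is consistent with the column ordering shown in the table above.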