Research Article
Training Method and Device for a Chemical-Industry Chinese Language Model Based on Knowledge Distillation
Table 2
Distillation performance with BERT base.
| Model | Layers | Hidden | Acc (%) | F1 (%) |
| --- | --- | --- | --- | --- |
| BERT (teacher) | 6 | 768 | 94.13 | 92.52 |
| Distilled BiLSTM | 3 | 300 | 91.45 | 90.21 |
| BERT-PKD | 3 | 768 | 92.87 | 90.66 |
| DistilBERT [36] | 3 | 768 | 91.77 | 89.63 |
| BERT-of-Theseus | 3 | 768 | 93.43 | 91.14 |
| BERT-EMD [37] | 3 | 768 | 93.77 | 91.34 |
| BiLSTM-KD | 3 | 200 | 93.13 | 91.07 |
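The students in Table 2 (BiLSTM-KD and the BERT-based baselines) are trained against the teacher's softened predictions in addition to the hard labels. Below is a minimal PyTorch sketch of the standard soft-label distillation objective (Hinton et al., 2015) that such methods build on; the temperature `T` and mixing weight `alpha` are illustrative assumptions, not settings reported in this paper.

```python
# Minimal sketch of the soft-label knowledge-distillation loss.
# T (temperature) and alpha (soft/hard mixing weight) are assumed
# example values, not the paper's reported hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Weighted sum of a soft-target KL term and hard-label cross-entropy."""
    # Soften both distributions with temperature T; the T**2 factor keeps
    # soft-target gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Standard cross-entropy against the ground-truth class indices.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In practice the teacher's logits are precomputed (or computed with `torch.no_grad()`) so that only the student receives gradients during distillation.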