Research Article
TCMNER and PubMed: A Novel Chinese Character-Level-Based Model and a Dataset for TCM Named Entity Recognition
Table 6
The ablation study for all comparison models.
| F1-score (%) | Clinical manifestation: Publications (test) | Clinical manifestation: Medical records | Syndrome: Publications (test) | Syndrome: Medical records | Disease: Publications (test) | Disease: Medical records | Treatment law: Publications (test) | Treatment law: Medical records | Herb: Publications (test) | Herb: Medical records | Total: Publications (test) | Total: Medical records |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BERT-BiLSTM-CRF | 75.1 | 55.7 | 76.9 | 63.2 | 73.6 | 59.6 | 78.9 | 68.1 | 76 | 79.2 | 74.2 | 71.9 |
| BERT-BiLSTM-CRF-c | 79.6 | 60.3 | 80.3 | 70.4 | 78.2 | 63.4 | 81.5 | 73.3 | 81.2 | 82.6 | 79.5 | 76.4 |
| BERT | 94 | 76.2 | 91.2 | 73.7 | 86.7 | 76.6 | 94.3 | 87.1 | 93.4 | 93.4 | 89.3 | 86.4 |
| BERT-c | 95.2 | 79.4 | 94.3 | 76.8 | 88.4 | 76.6 | 96.5 | 90.2 | 94.4 | 95.2 | 90.8 | 88.6 |
| BERT-LSTM | 89.3 | 74.3 | 90.3 | 71.2 | 88 | 74.2 | 90.6 | 85.4 | 88.4 | 92.5 | 86.2 | 85.1 |
| BERT-LSTM-c | 91.3 | 76.2 | 92.3 | 73.5 | 89.1 | 74.2 | 92.7 | 87.3 | 90.1 | 93.6 | 88.6 | 87.3 |
| BERT-BiLSTM | 94 | 77.3 | 93.5 | 78.4 | 93 | 75.7 | 94.1 | 83.2 | 93.3 | 92.9 | 90.3 | 86.8 |
| BERT-BiLSTM-c | 94.9 | 78.3 | 94.1 | 79.6 | 93.5 | 75.7 | 94.9 | 84 | 94.3 | 93.9 | 92.3 | 87.1 |
| RoBERTa | 96.5 | 81.5 | 94.2 | 80.3 | 96.9 | 80.7 | 96.2 | 95.3 | 96.3 | 95 | 92.9 | 89.5 |
| RoBERTa-c | 98.3 | 83.9 | 97.1 | 85.5 | 98.6 | 81.8 | 98.8 | 94.2 | 98.7 | 97.9 | 94.6 | 91.3 |
| RoBERTa-LSTM | 94.2 | 78.1 | 92.8 | 77.5 | 94.5 | 78.6 | 94.2 | 88.3 | 93.5 | 94.5 | 90.3 | 87.8 |
| RoBERTa-LSTM-c | 95.1 | 79.5 | 94.2 | 79.6 | 95.7 | 78.6 | 96 | 89.6 | 94.8 | 96.1 | 91.7 | 89.6 |
| RoBERTa-BiLSTM | 94.5 | 78.7 | 95.1 | 79.3 | 95.4 | 79.6 | 94.7 | 88.9 | 94.3 | 94.7 | 90.7 | 88.2 |
| RoBERTa-BiLSTM-c | 94.9 | 79.1 | 95.9 | 80.4 | 96.1 | 79.6 | 95.8 | 89.7 | 95.3 | 95.2 | 92.3 | 88.5 |
Models with the "-c" suffix include the word-character integrated self-attention module.