Research Article

TCMNER and PubMed: A Novel Chinese Character-Level-Based Model and a Dataset for TCM Named Entity Recognition

Table 6

The ablation study for all comparison models.

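F1-score (%). Pub. = Publications (test); MR = Medical records.

| Model | Clinical manifestation (Pub.) | Clinical manifestation (MR) | Syndrome (Pub.) | Syndrome (MR) | Disease (Pub.) | Disease (MR) | Treatment law (Pub.) | Treatment law (MR) | Herb (Pub.) | Herb (MR) | Total (Pub.) | Total (MR) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BERT-BiLSTM-CRF | 75.1 | 55.7 | 76.9 | 63.2 | 73.6 | 59.6 | 78.9 | 68.1 | 76 | 79.2 | 74.2 | 71.9 |
| BERT-BiLSTM-CRF-c | 79.6 | 60.3 | 80.3 | 70.4 | 78.2 | 63.4 | 81.5 | 73.3 | 81.2 | 82.6 | 79.5 | 76.4 |
| BERT | 94 | 76.2 | 91.2 | 73.7 | 86.7 | 76.6 | 94.3 | 87.1 | 93.4 | 93.4 | 89.3 | 86.4 |
| BERT-c | 95.2 | 79.4 | 94.3 | 76.8 | 88.4 | 76.6 | 96.5 | 90.2 | 94.4 | 95.2 | 90.8 | 88.6 |
| BERT-LSTM | 89.3 | 74.3 | 90.3 | 71.2 | 88 | 74.2 | 90.6 | 85.4 | 88.4 | 92.5 | 86.2 | 85.1 |
| BERT-LSTM-c | 91.3 | 76.2 | 92.3 | 73.5 | 89.1 | 74.2 | 92.7 | 87.3 | 90.1 | 93.6 | 88.6 | 87.3 |
| BERT-BiLSTM | 94 | 77.3 | 93.5 | 78.4 | 93 | 75.7 | 94.1 | 83.2 | 93.3 | 92.9 | 90.3 | 86.8 |
| BERT-BiLSTM-c | 94.9 | 78.3 | 94.1 | 79.6 | 93.5 | 75.7 | 94.9 | 84 | 94.3 | 93.9 | 92.3 | 87.1 |
| RoBERTa | 96.5 | 81.5 | 94.2 | 80.3 | 96.9 | 80.7 | 96.2 | 95.3 | 96.3 | 95 | 92.9 | 89.5 |
| RoBERTa-c | 98.3 | 83.9 | 97.1 | 85.5 | 98.6 | 81.8 | 98.8 | 94.2 | 98.7 | 97.9 | 94.6 | 91.3 |
| RoBERTa-LSTM | 94.2 | 78.1 | 92.8 | 77.5 | 94.5 | 78.6 | 94.2 | 88.3 | 93.5 | 94.5 | 90.3 | 87.8 |
| RoBERTa-LSTM-c | 95.1 | 79.5 | 94.2 | 79.6 | 95.7 | 78.6 | 96 | 89.6 | 94.8 | 96.1 | 91.7 | 89.6 |
| RoBERTa-BiLSTM | 94.5 | 78.7 | 95.1 | 79.3 | 95.4 | 79.6 | 94.7 | 88.9 | 94.3 | 94.7 | 90.7 | 88.2 |
| RoBERTa-BiLSTM-c | 94.9 | 79.1 | 95.9 | 80.4 | 96.1 | 79.6 | 95.8 | 89.7 | 95.3 | 95.2 | 92.3 | 88.5 |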

The “-c” suffix indicates that the model includes the word-character integrated self-attention module.
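As a reading aid, the contribution of the module can be estimated by subtracting each base model's Total F1-score from that of its “-c” counterpart. The sketch below is not part of the original study: the names `TOTAL_F1` and `attention_gains` are hypothetical, and the values are copied from the Total columns of Table 6 (publications test set first, medical records second).

```python
# Total F1-scores (%) copied from Table 6.
# Each tuple is (publications test set, medical records).
# TOTAL_F1 and attention_gains are illustrative names, not from the paper.
TOTAL_F1 = {
    "BERT-BiLSTM-CRF": (74.2, 71.9),
    "BERT-BiLSTM-CRF-c": (79.5, 76.4),
    "BERT": (89.3, 86.4),
    "BERT-c": (90.8, 88.6),
    "BERT-LSTM": (86.2, 85.1),
    "BERT-LSTM-c": (88.6, 87.3),
    "BERT-BiLSTM": (90.3, 86.8),
    "BERT-BiLSTM-c": (92.3, 87.1),
    "RoBERTa": (92.9, 89.5),
    "RoBERTa-c": (94.6, 91.3),
    "RoBERTa-LSTM": (90.3, 87.8),
    "RoBERTa-LSTM-c": (91.7, 89.6),
    "RoBERTa-BiLSTM": (90.7, 88.2),
    "RoBERTa-BiLSTM-c": (92.3, 88.5),
}


def attention_gains(scores):
    """Return {base_model: (gain_publications, gain_records)} in F1 points."""
    gains = {}
    for name, (pub, rec) in scores.items():
        if name.endswith("-c"):
            continue  # skip the variants; they are looked up from the base name
        variant = scores.get(name + "-c")
        if variant is None:
            continue  # no "-c" counterpart listed for this model
        # Round to one decimal to match the precision reported in the table.
        gains[name] = (round(variant[0] - pub, 1), round(variant[1] - rec, 1))
    return gains


gains = attention_gains(TOTAL_F1)
for model, (g_pub, g_rec) in gains.items():
    print(f"{model:16s} {g_pub:+.1f} (publications)  {g_rec:+.1f} (medical records)")
```

Under these numbers every “-c” variant scores higher than its base model on both Total columns, with the largest difference for BERT-BiLSTM-CRF and the smallest on medical records for the two BiLSTM models.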