Abstract

Extracting entities from ancient poems and constructing a knowledge graph from them help uncover the connections between poems and are of great significance for passing on traditional Chinese culture. This paper proposes an Albert-BiLSTM-MHA-CRF model for entity extraction in ancient poems. Building on the BiLSTM-CRF model, the Albert pretraining model and a multihead self-attention (MHA) mechanism are introduced to extract character vectors, improving the generalization ability of the embedding vectors and the model's ability to capture the latent semantics and relative importance between characters. Experiments are carried out on an ancient poetry corpus, and the model is compared with the Bert-BiLSTM-CRF, BiLSTM-CRF, and CRF models. The results show that the entity extraction effect on ancient poetry is significantly improved, with an F1 value of 97.17%. Compared with using Bert as the pretraining model, using Albert reduces training time by 19.56%.

1. Introduction

Poetry, as an important part of ancient Chinese traditional culture, has a long history. Studying and carrying forward the traditional culture of ancient China through ancient Chinese poetry is of great significance to the construction of a beautiful China. The main emotion of an ancient poem is often related to its imagery: poets often use these imaginative artistic devices to express their emotions and inner feelings. These images are often the entities of a poem, such as "兰" (orchid), "芭蕉" (banana), and "梧桐" (sycamore). The author crawled 1983 poems from ancient poetry websites, covering patriotic poems, poems about spring, summer, autumn, and winter, and other categories. Extracting the entities from them, analyzing the most important entities in each category, and displaying them in the form of a knowledge graph [1] is of great significance for the study of ancient poetry. In the past, research on ancient poems was mainly manual study of ancient books without computer assistance, while knowledge graphs and related technologies provide new ideas for the study of ancient poems [2–4].

At present, in the field of Chinese named entity recognition (CNER), there are some studies on ancient classics, such as research on pre-Qin classics, but most work still targets named entity recognition in modern Chinese, for example in the medical [5] and military [6] fields. The difference between named entity recognition in ancient poetry and in other fields mainly lies in the difference between ancient Chinese and modern Chinese. As a form of ancient Chinese, ancient poetry has both similarities to and differences from modern Chinese. The similarities lie in the sentence components: both have the same six components (subject, predicate, object, attribute, adverbial, and complement), and their relative positions are basically the same. The differences lie in grammar and sentence patterns. The language of ancient poetry is often terse, the subject-verb-object structure is often compressed, and word usage varies greatly. For example, the line "为赋新词强说愁" is expressed in modern Chinese as "to describe a sad artistic conception in order to write a new poem." It can be seen that modern Chinese contains many more modifiers, which makes it easier for a model to extract entities. In this paper, on the premise of establishing its own corpus, a pretraining model from NLP is used to construct a knowledge graph of ancient poems, which deepens computer-aided research on ancient poetry.

The main results of this paper are as follows: (1) an entity recognition model for a variety of ancient poems is constructed based on the lite version of the BERT (Albert) pretraining model. (2) The combination of Albert, bidirectional long short-term memory, multihead self-attention, and conditional random field (Albert-BiLSTM-MHA-CRF) is selected as the best model for entity extraction from ancient poems, improving the effect of entity recognition. (3) A knowledge graph of ancient poetry is created from the dynasty, author, title, entity, and other information.

2. Related Work

Named entity recognition, as a basic task of natural language processing (NLP), was proposed at MUC-6 [7] and is a basic technology for question answering systems [8, 9], information extraction [10], and knowledge graphs. The task is also related to relation extraction [11–13] and event extraction [14–16]. Its purpose is to extract person names, place names, times, and other entities in specific fields. Entities in ancient poetry, such as imagination and time entities, are of great significance for the study of traditional ancient poetry and are also conducive to constructing knowledge graphs related to ancient poetry. Because ancient poetry differs from modern Chinese and is a style of ancient Chinese similar to classical Chinese, there is currently little research on it, and most existing work focuses on named entity recognition in fields related to modern Chinese.

Named entity recognition generally has three main research methods: rule-based methods, statistical machine learning-based methods, and deep learning-based methods. In rule-based named entity extraction, building rules, knowledge bases, and dictionaries manually is time-consuming, and the effect is poor. Among statistical machine learning models, the hidden Markov model (HMM) and the CRF model are widely used. For example, Y. Zhang et al. [17] used CRF to extract place names in Tang poems, reaching an F1 value of 82.33%. However, statistical machine learning needs manually designed feature templates, and its effect depends on those hand-crafted features. Compared with statistical machine learning, deep learning can automatically extract features from the data. In addition, deep learning has achieved state-of-the-art (SOTA) results on a series of downstream NLP tasks. Classical deep learning models, such as the recurrent neural network (RNN) and long short-term memory (LSTM), have been used for named entity recognition [18–20]. For example, Limsopatham and Collier [18] used the BiLSTM model for named entity recognition in Twitter messages. After 2018, with the advent of bidirectional encoder representations from transformers (Bert), researchers such as Taher et al. introduced a series of pretraining models into various NLP tasks, and named entity recognition based on pretraining models has been widely applied [21–23]. For example, Zhang et al. [24] proposed the Bert-BiLSTM-CRF model, which introduces the Bert pretraining model on top of BiLSTM-CRF and outperformed other models in extracting traditional Chinese medicine entities. Lv et al. [25] used Albert to improve Chinese named entity recognition, obtaining a better effect with less training time. Therefore, this paper proposes ancient poetry entity recognition based on a pretraining model and further improves the extraction effect by using the latest NLP pretraining model Albert.

3. Methodology

In order to explore the effect of the Albert model on named entity recognition in classical poetry texts, the research in this paper proceeds as follows: firstly, corpus collection: collect a corpus of ancient poetry through multisource channels such as websites and books; secondly, corpus construction: preprocess the collected classical poetry corpus, analyze and label the named entities, and construct the experimental corpus; and thirdly, named entity recognition experiments: the CRF [26], BiLSTM-CRF [27], Bert-BiLSTM-CRF [28], and Albert-BiLSTM-MHA-CRF models are used to carry out named entity recognition experiments on ancient poetry texts, testing and comparing the performance of the models in precision, recall rate, and F1 value. The named entity recognition model most suitable for poem texts is obtained by analyzing the experimental results, and this model is then applied to the test set for verification.

3.1. Construction of Ancient Poetry Dataset

At present, there are few studies on ancient poetry and no relevant dataset. Therefore, this paper crawls 1983 ancient poems from relevant ancient poetry websites and builds a dataset of ancient poems. The specific construction steps are as follows:

Firstly, 80% of the crawled ancient poems are used as the training set, 10% of the training set is used as the validation set, and the remaining 20% of the corpus is used as the testing set; YEDDA [29] is used for labeling, as shown in Figure 1. The four types of entities involved in the ancient poetry dataset are labeled. The labeling method uses "[@" and "]" to mark the left and right boundaries of an entity, and "Time," "Scene," "Person," and "Location" represent the entity classes, for example, "[@雨#scene]" and "[@洞庭#location]" in Figure 1, where "雨" is a scene entity and "洞庭" is a location entity. During labeling, ambiguous entities are documented, and the final labeling decision is determined through multiperson discussion. According to the above rules, the annotated dataset is processed into sequence-labeling form in Python: each character and its corresponding label are on one line, and a blank line marks the end of each sentence. Finally, the ancient poetry dataset constructed in this paper is obtained (a minimal conversion sketch is given after these labeling steps).

Second is the establishment of entity categories: researchers of poetry have always studied it in terms of time, place names, person names, and imagination. Therefore, four basic entity types are identified in this study, as shown in Table 1.

Thirdly is the labeling system: the labeling system used in the experiment is the BIO labeling system, where "B" represents the initial position of an entity, "I" represents the other positions of the entity, and "O" means the position is not part of an entity. Annotation examples are shown in Table 2.
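To make the preprocessing step concrete, the following is a minimal Python sketch that converts the bracket annotation shown in Figure 1 into per-character BIO lines. The regular expression, the `to_bio` helper, and the example line are illustrative assumptions, not the exact script used to build the corpus.

```python
import re

# Matches the bracket annotation shown in Figure 1, e.g. "[@洞庭#Location]".
# (Hypothetical pattern; the real export format may differ slightly.)
ANNOT = re.compile(r"\[@(?P<text>[^#\]]+)#(?P<label>[^\]]+)\]")

def to_bio(line: str):
    """Convert one annotated poem line into (character, BIO-tag) pairs."""
    pairs, pos = [], 0
    for m in ANNOT.finditer(line):
        for ch in line[pos:m.start()]:        # characters outside any entity
            pairs.append((ch, "O"))
        text, label = m.group("text"), m.group("label").capitalize()
        for i, ch in enumerate(text):         # B- for the first character, I- after
            pairs.append((ch, ("B-" if i == 0 else "I-") + label))
        pos = m.end()
    for ch in line[pos:]:
        pairs.append((ch, "O"))
    return pairs

if __name__ == "__main__":
    for ch, tag in to_bio("[@洞庭#Location]之东江水西,帘旌不动夕阳迟。"):
        print(ch, tag)   # one character and its label per line
    print()              # blank line marks the end of a sentence
```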

3.2. Albert-BiLSTM-MHA-CRF Model

This model is constructed from the Albert, BiLSTM, MHA, and CRF models. First, the sum of the word embedding, position embedding, and segment embedding of the input characters is taken as the input vector of Albert. The vector output from Albert is fed into the BiLSTM model to encode and learn the features of the text. Then, the mined features, namely the hidden state $h_t$ at time $t$, are used to decode and predict the rational relationships between tags and output the optimal tag sequence. The model structure is shown in Figure 2.

The first layer of the model is the Albert layer. When the text is input to the Albert layer, the input characters are vectorized according to the vocab file in Albert, so that the poems are converted into data that can be processed by the computer. The poems represented by vectors are then output by Albert and recorded as a sequence $X = (x_1, x_2, \ldots, x_n)$. Compared with word vectors such as Word2vec and global vectors for word representation (Glove), the character vectors generated by Albert vary with the context, effectively solving the problem of polysemy.

The second layer of the model is the BiLSTM layer. The character vectors generated by Albert are used as input to the BiLSTM layer to obtain the forward hidden state $\overrightarrow{h_t}$ and the backward hidden state $\overleftarrow{h_t}$. The two are concatenated position by position and denoted as $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$.

The third layer of the model is the multihead self-attention (MHA) mechanism. The sequence $H = (h_1, h_2, \ldots, h_n)$ generated by the BiLSTM layer is input to the MHA layer. Through three different mapping operations, $H$ is transformed into the query matrix $Q$, the key matrix $K$, and the value matrix $V$, each of dimension $d_k$. Then, $h$ parallel self-attention operations are performed on the sequence to obtain $head_1, \ldots, head_h$, and the semantic information of all heads is integrated and defined as MultiHead. MultiHead is then mapped to dimension $k$ ($k$ is the number of entity classes), and the mapped sequence is $P = (p_1, p_2, \ldots, p_n)$, where $p_i$ contains the scores of character $x_i$ for each entity type. The fourth layer of the model is the CRF layer, which considers the order of output labels according to the transition features, so it serves as the final layer.

3.3. CRF Model

Although the long short-term memory network and the multihead self-attention mechanism can learn contextual features and output the label with the highest probability, they do not take the dependencies between labels into account, which may place implausible labels together (for example, an "I-" tag directly following an "O" tag). Such combinations are unlikely in reality, and the conditional random field model can consider the order of the tags. Therefore, the CRF layer is selected as the final output layer. The commonly used first-order linear-chain CRF is shown in Figure 3.

For a character sequence $X = (x_1, x_2, \ldots, x_n)$, the predicted label sequence $y = (y_1, y_2, \ldots, y_n)$ can be obtained by using the linear-chain conditional random field. Its prediction score is

$$score(X, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=1}^{n} A_{y_{i-1}, y_i},$$

where $i$ is the position, $P_{i, y_i}$ is the probability that the output at position $i$ is $y_i$, and $A_{y_{i-1}, y_i}$ is the probability of transferring from $y_{i-1}$ to $y_i$. The best predicted tag sequence can be obtained by using the Viterbi algorithm:

$$y^{*} = \arg\max_{y} \; score(X, y).$$

The Viterbi algorithm obtains the maximum-scoring state path through dynamic programming.
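For illustration, the following NumPy sketch decodes the best tag sequence from an emission score matrix (the $P_{i,y}$ above) and a transition matrix (the $A_{y_{i-1},y_i}$ above); the random scores and the function name are placeholders, not the paper's implementation.

```python
import numpy as np

def viterbi_decode(emissions: np.ndarray, transitions: np.ndarray):
    """Return the highest-scoring tag sequence and its score.

    emissions:   (n, k) array, emissions[i, y]  = P_{i, y}.
    transitions: (k, k) array, transitions[a, b] = A_{a, b}, score of moving a -> b.
    """
    n, k = emissions.shape
    score = emissions[0].copy()             # best score of each tag at position 0
    backptr = np.zeros((n, k), dtype=int)   # best predecessor for each position/tag

    for i in range(1, n):
        # candidate[a, b] = best path ending in a, then transition a -> b, then emit b
        candidate = score[:, None] + transitions + emissions[i][None, :]
        backptr[i] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)

    best_last = int(score.argmax())
    path = [best_last]
    for i in range(n - 1, 0, -1):           # follow back-pointers to recover the path
        path.append(int(backptr[i, path[-1]]))
    return path[::-1], float(score.max())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tags, best = viterbi_decode(rng.normal(size=(7, 5)), rng.normal(size=(5, 5)))
    print(tags, best)
```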

3.4. MHA Model

Although the BiLSTM model can obtain the current contextual information, as the length of a sentence increases, the BiLSTM model loses some important information. The MHA model, however, can fully capture long-distance features and obtain global information, and acquiring multiple features at the character, word, and sentence level can improve the effect of entity recognition. The matrix $H$ output from the BiLSTM is mapped by three matrices to obtain the $Q$, $K$, and $V$ of self-attention. Attention between $Q$, $K$, and $V$ is calculated as shown in

$$Attention(Q, K, V) = softmax\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V,$$

where $d_k$ is the dimension of each head and $d_k = d_{model}/h$. By performing $h$ self-attention operations in parallel, each application of the attention function yields a $head_i$; finally, the $head_i$ are spliced together to obtain the multihead output, with the specific calculation shown in

$$head_i = Attention(QW_i^{Q}, KW_i^{K}, VW_i^{V}),$$
$$MultiHead(Q, K, V) = Concat(head_1, \ldots, head_h)\, W^{O}.$$

Here, $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the linear transformation matrices of $Q$, $K$, and $V$, respectively, and $W^{O}$ is also a parameter matrix to be learned.
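The following NumPy sketch illustrates the scaled dot-product attention and the multihead splicing defined by the formulas above; the randomly initialized projection matrices stand in for the learned $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, and $W^{O}$, so this is a shape-level illustration rather than the trained layer.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def multi_head(H, h=8, seed=0):
    """Split the BiLSTM output H (n, d_model) into h heads, attend, and splice."""
    n, d_model = H.shape
    d_k = d_model // h
    rng = np.random.default_rng(seed)
    heads = []
    for _ in range(h):
        # Per-head projections W_i^Q, W_i^K, W_i^V (random placeholders here).
        Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d_model, d_k)) for _ in range(3))
        heads.append(attention(H @ Wq, H @ Wk, H @ Wv))
    W_o = rng.normal(scale=0.1, size=(d_model, d_model))   # output projection W^O
    return np.concatenate(heads, axis=-1) @ W_o

if __name__ == "__main__":
    H = np.random.default_rng(1).normal(size=(10, 400))    # 10 characters, 2 x 200-dim BiLSTM states
    print(multi_head(H).shape)                              # (10, 400)
```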

3.5. LSTM Model

The long short-term memory network is one of the most popular recurrent neural networks. Compared with the general recurrent neural network, it has three additional gates: the input gate, the output gate, and the forget gate. The input gate and output gate control the input and output of the unit, and the forget gate controls whether the previous cell state is carried over to the current cell state. Its calculation formula is

$$f_t = \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right).$$

The input gate determines whether the current input is saved to the cell state. The calculation formulas are

$$i_t = \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right),$$
$$\tilde{c}_t = \tanh\!\left(W_c \cdot [h_{t-1}, x_t] + b_c\right),$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t.$$

The output gate and the cell state determine the output of the LSTM. The calculation formulas are

$$o_t = \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right),$$
$$h_t = o_t \odot \tanh(c_t).$$

The LSTM automatically extracts features from the character vectors output by the pretraining model, and the CRF layer then uses the tags predicted from the context to obtain the optimal sequence.
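As a shape-level illustration of the gate equations above (not the actual network used in the experiments), one LSTM step can be written in NumPy as follows; the weight layout is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following the gate equations in Section 3.5.

    W maps the concatenation [h_{t-1}, x_t] to the four gate pre-activations.
    """
    z = W @ np.concatenate([h_prev, x_t]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # forget, input, output gates
    c_t = f * c_prev + i * np.tanh(g)              # new cell state
    h_t = o * np.tanh(c_t)                         # new hidden state
    return h_t, c_t

if __name__ == "__main__":
    d_in, d_hid = 128, 200
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(4 * d_hid, d_hid + d_in))
    b = np.zeros(4 * d_hid)
    h, c = np.zeros(d_hid), np.zeros(d_hid)
    h, c = lstm_step(rng.normal(size=d_in), h, c, W, b)
    print(h.shape, c.shape)   # (200,) (200,)
```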

3.6. Albert Pretraining Model

The Albert model is derived from the encoder of the transformer and can be regarded as a lightweight Bert with fewer parameters; it has been optimized in two aspects:

First is the pretraining tasks: the two pretraining tasks of Bert are MLM (masked language model) and NSP (next sentence prediction), both of which have certain defects. Similar in spirit to the CBOW and Skip-gram methods in Word2vec, MLM masks tokens in a sentence: it selects 15% of the tokens, replaces them with masks, and then predicts them. In order to prevent overfitting, models generally apply dropout, but information dropped in this way is lost, and MLM is already hard to fit, so Albert does not use dropout. Removing dropout also saves memory, and experiments verified that after removing dropout, the effect on downstream tasks was enhanced and SOTA results were achieved. Albert further improved MLM by masking and predicting n-gram fragments, which contain more semantic information, rather than a random 15% of single tokens. The NSP task, on the other hand, is too simple: positive samples are two adjacent sentences, while negative samples are sentences randomly selected from the training set, so the trained vectors carry less semantic information. NSP mixes topic prediction with coherence prediction; topic prediction is easy to learn, but coherence prediction is not learned well. Therefore, Albert replaces the NSP task with SOP (sentence order prediction), whose positive samples are two sentences in their normal order and whose negative samples are the same two sentences in reversed order. The SOP task can capture more contextual semantic information than NSP and learns the coherence between sentences.

Albert reduces the number of parameters in two ways. The first is factorization of the embedding parameterization: in Bert, the word embedding size E equals the hidden layer size H, and the vocabulary V is large, so the V × H embedding matrix is also large. Albert decomposes it into two smaller matrices of sizes V × E and E × H, where E is much less than H, which greatly reduces the embedding parameters. The second is cross-layer parameter sharing: Albert shares all parameters across all layers, instead of sharing only the parameters of the feed-forward layer or the attention layer, which further reduces the number of required parameters. As can be seen from the above improvements, Albert is not only a lightweight Bert but is also optimized.
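As an illustration of the factorized embedding, with assumed sizes V = 30,000, H = 768, and E = 128 (illustrative numbers, not the configuration reported in this paper), the embedding parameters drop roughly 5.9-fold:

```python
# Illustrative parameter counts for embedding factorization; the sizes are
# assumptions, not the configuration used in this paper.
V, H, E = 30000, 768, 128          # vocabulary size, hidden size, embedding size

bert_style = V * H                 # one V x H matrix: 23,040,000 parameters
albert_style = V * E + E * H       # V x E plus E x H: 3,938,304 parameters

print(bert_style, albert_style, round(bert_style / albert_style, 1))  # ~5.9x fewer
```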

In addition, general named entity recognition models cannot make effective use of the relationship between sentences, and Albert has an advantage in this respect. Compared with Word2vec, the character vectors generated by Albert are dynamic, whereas the word vectors generated by Word2vec are fixed and cannot effectively solve the problem of polysemy. Another advantage of Albert is that it can be used for transfer learning: the features extracted by the Albert pretraining model can be applied directly to a new task, or fine-tuned and then applied. The structure of Albert is shown in Figure 4.

$E_1, \ldots, E_N$ are the input word vectors, Trm is the transformer encoder, and $T_1, \ldots, T_N$ are the corresponding vectors output by the encoder.

4. Experiments and Analysis

4.1. Collection and Processing of the Original Corpus

At present, there is no tagged corpus of ancient poems. The author crawled 1983 poems from ancient Chinese poetry websites, obtained the titles, contents, and authors of the poems with crawlers, and stored them in files.

The author found duplicates among the crawled poems, so it was necessary to search for and delete the duplicated poems. The distribution of the number of each entity category is shown in Table 3.
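A minimal sketch of the deduplication step, assuming the crawled poems are stored as JSON records with hypothetical "title" and "content" fields:

```python
import json

def deduplicate(poems):
    """Keep the first occurrence of each (title, content) pair; field names are hypothetical."""
    seen, unique = set(), []
    for poem in poems:
        key = (poem["title"].strip(), poem["content"].strip())
        if key not in seen:
            seen.add(key)
            unique.append(poem)
    return unique

if __name__ == "__main__":
    with open("poems.json", encoding="utf-8") as f:   # hypothetical crawl output
        poems = json.load(f)
    print(len(poems), "->", len(deduplicate(poems)))
```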

Entity statistics are as follows: the ancient poetry dataset contains 1983 poems, of which 1925 contain at least one entity and 58 contain none. The poems contain a total of 11,428 entities of 4 types: time entities of 143 kinds, totaling 2041; imagination entities of 295 kinds, totaling 8045; place name entities of 269 kinds, totaling 972; and person name entities of 190 kinds, totaling 370.

The four types of high-frequency entities are shown in Table 4. Analysis shows that poets liked to travel in spring to scenic spots such as "江南" (Jiangnan), "洞庭湖" (Dongting Lake), "长安" (Chang'an City), and "西湖" (West Lake) and to write poems about the wind, flowers, snow, and moon, which is also in line with the actual situation.

4.2. Universal Experimental Dataset

In order to further verify the generality of the proposed model, the author conducts experiments on public datasets in the news, social media, and finance domains: (1) MSRA, a news-domain dataset, including three entity types: place name (LOC), organization name (ORG), and person name (PER); (2) Weibo, a social media dataset, including place name (LOC), organization name (ORG), administrative region name (GPE), and person name (PER); (3) Resume, a finance-domain dataset, including place name (LOC), organization name (ORG), person name (PER), and other entity types.

The annotation methods of the three datasets are all BIO annotation. The detailed information of the datasets is shown in Table 5.

4.3. Experimental Environment and Parameter Setting

Firstly, the model was trained and tested under Python 3.6 and TensorFlow 1.14. The experimental hardware was a 1080Ti GPU with 11 GB of video memory. The Albert-base model was used in the experiment, with 64 attention heads and a hidden layer size of 768. The hidden state of the LSTM network was set to 200 dimensions in each direction. The maximum sequence length was set to 64, and Adam was selected as the optimizer to reduce the loss at each step. The learning rate was set to 0.001, and the dropout rate was set to 0.5 to prevent overfitting. The batch size of the training, validation, and test sets was 64, and the maximum number of iterations was 500. The best model was saved each time.
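For reference, the hyperparameters listed above can be collected into a single configuration dictionary; the key names are illustrative, not those of the actual training script:

```python
# Hyperparameters from Section 4.3, collected for reference (the training script
# itself is not reproduced here).
config = {
    "pretrained_model": "albert_base",
    "num_attention_heads": 64,
    "hidden_size": 768,
    "lstm_hidden_dim": 200,   # per direction
    "max_seq_length": 64,
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "dropout": 0.5,
    "batch_size": 64,
    "max_epochs": 500,
}
```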

Second is the experimental comparison based on the CRF model. As a classical statistical machine learning method, the conditional random field is used as the baseline model in this paper.

4.4. Evaluation Indexes and Experimental Results

First is the evaluation indexes: this paper adopts the classical precision (P), recall rate (R), and their harmonic mean, namely, the F1 value.
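Written out, these indexes take their standard definitions, where TP, FP, and FN denote correctly recognized entities, spuriously recognized entities, and missed entities, respectively:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}.$$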

Second is the experimental results: the CRF, BiLSTM-CRF, and Bert-BiLSTM-CRF models were compared with the Albert-BiLSTM-MHA-CRF model in this experiment. The precision, recall rate, and F1 value of named entity recognition for each model are shown in Table 6.

It can be seen from Table 6 that CRF can already identify a considerable number of entities. Compared with the CRF model, the F1 value of BiLSTM-CRF increases by 0.63%, indicating that adding BiLSTM improves the ability to extract features and the effect of entity recognition. Next is the Bert-BiLSTM-CRF model: compared with the BiLSTM-CRF and CRF models, its precision, recall rate, and F1 value are all greatly improved, with the F1 value increased by 4.53% and 5.16%, respectively. This shows that Bert uses the bidirectional transformer to extract features based on deep contextual semantics and can learn character-level, word-level, and sentence-level features through the MLM and NSP tasks during pretraining. Overall, the optimal model is the Albert-BiLSTM-MHA-CRF model.

The F1 value of the Albert-BiLSTM-MHA-CRF model was 1.27% higher than that of the Bert-BiLSTM-CRF model. Under the same hyperparameter settings, with the number of epochs set to 500, the running time with Albert was 6 hours 43 minutes, while the running time with Bert was 8 hours 21 minutes; Albert's training time is 98 minutes less than Bert's. This is because cross-layer parameter sharing and factorization of the embedding parameterization make Albert's parameter count much smaller than Bert's, so Albert trains faster. The results on the F1 value can be explained from both the Albert side and the MHA side. On the one hand, Bert has more parameters than Albert and, to a certain extent, can train a better model. However, Albert improves on Bert, and the amount of data here is relatively small. Albert's improvement over Bert lies in the two pretraining tasks: the MLM masks are n-gram segments, which contain more semantic information, and the simpler NSP task is replaced by the SOP task, so Albert performs better than Bert. On the other hand, although the BiLSTM model can obtain the current contextual information, as the length of a sentence increases, BiLSTM loses some important information. The MHA model can fully capture long-distance features and obtain global information. Obtaining multiple features at the character, word, and sentence level improves the effect of entity extraction, assigns more weight to the important content in the text, and reduces the attention paid to unimportant features, so long-distance important features are captured more easily. In summary, the Albert-BiLSTM-MHA-CRF model performs better on the ancient poetry dataset.

In addition, the F1 values of entity recognition for the method proposed in this paper and other deep learning-based methods on public datasets in three different domains are shown in Table 7. These models are Lattice LSTM [30], an LSTM model that fully considers word and character information; CAN-NER [31], a character-based convolutional neural network (CNN) with a local attention layer and a gated recurrent unit (GRU) with a global self-attention layer; CWPC_BiAtt [32], an attention-based BiLSTM model combining character and word position information; and ACNN [33], a model combining multilevel CNN and an attention mechanism. As can be seen from Table 7, Albert-BiLSTM-MHA-CRF improves the effect of entity recognition on the Weibo and Resume datasets. Compared with Bert-BiLSTM-CRF, the F1 value of Albert-BiLSTM-MHA-CRF increases by 5.51% and 0.41% on these two datasets, respectively, with the most obvious improvement on the Weibo dataset. The results on the MSRA dataset are worse. This is because the MSRA dataset is large and the Bert-BiLSTM-CRF model already models it well, so there is little need for the multihead self-attention mechanism. On the Weibo and Resume datasets, the model proposed in this paper can make full use of the extracted word and sentence features to improve the accuracy of entity recognition. Compared with the other models, the proposed model achieves a better recognition effect on datasets in multiple fields, is more stable across different datasets, and has a certain robustness. Therefore, the model proposed in this paper is shown to be effective.

This paper also analyzes the performance of Bert-BiLSTM-CRF and Albert-BiLSTM-MHA-CRF on different datasets and different data volumes. According to Tables 6 and 7, Albert-BiLSTM-MHA-CRF performs well on the Weibo and Resume datasets but slightly worse on MSRA. The Weibo, Resume, and MSRA datasets contain 1.94 k, 4.74 k, and 50.8 k sentences, respectively, and Weibo has 4 entity types, Resume has 8, and MSRA has 3. It can be concluded that the Weibo and Resume datasets are small with many entity categories, while the MSRA dataset is large with relatively few entity categories. Therefore, this paper further experiments on the MSRA training set; the results are shown in Table 8. As the amount of MSRA data increases, the entity extraction effect with Bert as the pretraining model becomes better and better relative to Albert. Combining Tables 7 and 8, a basic conclusion can be drawn: on these three datasets, when the number of entity types is greater than or equal to 3 and the number of sentences is less than 5.5 k, the Albert-BiLSTM-MHA-CRF model is better than the Bert-BiLSTM-CRF model.

At the same time, the experimental results of each entity type under the optimal Albert-BiLSTM-MHA-CRF model are analyzed, as shown in Table 9. The F1 value of the location entity is 11.42% higher than that of Y. Zhang et al. [17]. The best-recognized entity is the time entity, whose F1 value reaches 97.82%, while the person name entity has the worst effect, with an F1 value of 63.16%; its recall rate is significantly lower than its precision. The low recall rate indicates that person name entities in the corpus are more often predicted as non-person-name entities, while the high precision means there are few cases in which non-person-name entities are predicted as person names. Since the corpus contains 190 kinds of person name entities but only 370 person name entity mentions in total, the number of mentions per kind is small, whereas the other three entity types have far fewer kinds, so the model learns more features for them. This increases the number of cases in which person name entities are predicted as non-person-name entities during training, resulting in a lower recall rate for person name entities.

4.5. Knowledge Graph Display for Named Entity Recognition of Ancient Poems

The knowledge graph shown in Figure 5 is constructed by establishing connections among the four types of identified entities and the name of the poem, the author of the poem, and the dynasty of the poet. The nodes in the knowledge graph of ancient poetry include poem titles, the four types of entities, poets, and the dynasties of the poets. The three kinds of edges are "has_poetry," "entity_is," and "dynasty_is," and the edges represent the connections between nodes. For example, "纳兰性德" (Nalan Xingde) is connected to the entity "花" (flower) through his poem "如梦令" (Like a Dream), and other poets also have the entity "花" (flower) in their poems. It can be seen that connections between poems are established through shared entities, and the imagery, times, and place names that a poet favors can also be found through statistics of the entities in his poems. The more frequently a character or entity appears in a poet's poems, the closer its relationship with the poet.
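A minimal sketch of how such a graph could be assembled, using the networkx library and the node and edge names from this section; the helper function and the dynasty value are illustrative, and the storage backend actually used for Figure 5 may differ:

```python
import networkx as nx

# Nodes and edge labels follow Section 4.5; "add_poem" is an illustrative helper.
G = nx.MultiDiGraph()

def add_poem(graph, poet, dynasty, title, entities):
    graph.add_edge(poet, title, label="has_poetry")
    graph.add_edge(poet, dynasty, label="dynasty_is")
    for entity, entity_type in entities:
        graph.add_edge(title, entity, label="entity_is", entity_type=entity_type)

add_poem(G, "纳兰性德", "清", "如梦令", [("花", "Scene")])

# Poems that share an entity become connected through that entity node.
print(list(G.in_edges("花")))
```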

4.6. Reasons for Identification Errors of the Four Types of Named Entities

For the best model, Albert-BiLSTM-MHA-CRF, the F1 value is 97.17%, but there are still misidentified entities. After analysis, there are five main reasons:

Firstly, the definition of a single entity type can be too complicated. For example, the time entity can represent not only a specific date in a certain year but also expressions referring to a period in history, such as "越明年" (by the next year), as well as lunar "甲子" (Jiazi) cycle timings such as "庚戌" (Gengxu), which increases the difficulty of identifying time entities.

Secondly, single-character entities are difficult to recognize. Compared with modern Chinese, ancient poetry has more single-character entities. For example, the imagination entities "蕙" (Hui) and "芷" (angelica) are plants in ancient Chinese and are far from the typical imagination entity, so such single-character entities are difficult to identify.

Thirdly, the four types of entities can overlap, such as "秋风" (autumn wind) and "秋" (autumn): "秋风" (autumn wind) is an imagination entity, while "秋" (autumn) is a time entity, so it is easy to mistake one entity for another.

Fourthly, the data are unbalanced. For example, time entities such as "春" (spring), "夏" (summer), "秋" (autumn), and "冬" (winter) or a specific date are relatively numerous, whereas time entities representing festivals, such as "七夕" (Qixi Festival) and "寒食" (Cold Food Festival), are relatively few, which makes it difficult to identify festival time entities.

Fifthly, the model falls into a saddle point. As the network structure becomes more complex and the number of training parameters increases, the model may fall into a local optimum, that is, a saddle point, resulting in recognition errors for the four types of entities.

4.7. Improvement of Named Entity Recognition Errors

In view of the above entity recognition errors, the author believes that improvements can be made in the following aspects:

Firstly, build a dictionary related to ancient poetry. New words in ancient poetry can be found with a new-word discovery algorithm based on mutual information and left-right entropy. For characters and words that are easily confused with entity names in ancient poetry, such as "秋风" (autumn wind) and "秋" (autumn), adding them to the dictionary is conducive to improving the effect of entity recognition (a rough sketch of such a scoring approach is given after these suggestions).

Secondly, continue to subdivide complex entity types. For example, the time entity can be further subdivided into Jiazi times, festivals, etc., so that the model can more easily extract features of the same type.

Thirdly, expand the scale of the corpus. Expanding the corpus increases the frequency of each entity type, allows the model to be trained and to extract relevant features more fully, and improves the training effect.

Fourthly, use the gradient activation function (GAF) proposed by Liu et al. [34]; this function can alleviate the saddle point problem and help the model obtain a globally optimal solution, improving the effect of entity recognition.
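As a rough sketch of the first suggestion, the following scores candidate character bigrams by pointwise mutual information plus the smaller of their left and right neighbor entropies; the tokenization, the scoring combination, and the threshold-free ranking are simplifying assumptions rather than the exact algorithm referred to above:

```python
import math
from collections import Counter

def candidate_scores(text, n=2):
    """Score character n-grams by pointwise mutual information plus boundary entropy."""
    chars = Counter(text)
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    left, right = {}, {}
    for i in range(len(text) - n + 1):
        g = text[i:i + n]
        if i > 0:
            left.setdefault(g, Counter())[text[i - 1]] += 1
        if i + n < len(text):
            right.setdefault(g, Counter())[text[i + n]] += 1

    total = len(text)

    def entropy(counter):
        s = sum(counter.values())
        return -sum(c / s * math.log(c / s) for c in counter.values()) if s else 0.0

    scores = {}
    for g, freq in grams.items():
        p_parts = 1.0
        for ch in g:
            p_parts *= chars[ch] / total           # independence assumption for PMI
        pmi = math.log((freq / (total - n + 1)) / p_parts)
        flexibility = min(entropy(left.get(g, Counter())),
                          entropy(right.get(g, Counter())))
        scores[g] = pmi + flexibility              # higher means a more word-like bigram
    return scores

if __name__ == "__main__":
    corpus = "秋风秋雨愁煞人,秋风吹落梧桐叶。" * 3
    best = sorted(candidate_scores(corpus).items(), key=lambda kv: -kv[1])[:5]
    print(best)
```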

4.8. Application in the Field of Password Guessing

Passwords in text form are still a commonly used authentication mechanism in various computer systems [35]. Passwords are essentially short texts with rich semantics, which often contain the user's personal information, such as name, birthday, and mobile phone number. A previous approach is to use a PCFG (probabilistic context-free grammar) to obtain all possible password combinations of a user and their corresponding probabilities. For example, D. Wang et al. [36] proposed TarGuess, which builds on PCFG by adding the user's personal information and passwords leaked on other, similar websites. With the continuous development of deep learning, password guessing based on neural networks has gradually emerged. Melicher et al. [37] found that neural networks are better at guessing passwords than PCFG at higher guess numbers and for more complex or longer passwords. However, PCFG still has advantages. For example, Veras et al. [38] applied LSTM to password guessing and found that, compared with the LSTM model, PCFG is still a competitive model, remains important for password security, and is more interpretable.
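To make the PCFG idea concrete, the following toy sketch learns segment structures (e.g., L2 D4 S1) and segment values from a handful of leaked passwords and enumerates guesses by probability; it is a didactic simplification, not TarGuess or any of the cited systems:

```python
import itertools
import re
from collections import Counter

SEG = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+")

def segment(pw):
    """Split a password into (tag, value) segments, e.g. 'li1990!' -> L2 D4 S1."""
    tags = []
    for p in SEG.findall(pw):
        kind = "L" if p[0].isalpha() else "D" if p[0].isdigit() else "S"
        tags.append((kind + str(len(p)), p))
    return tags

def train(passwords):
    """Count structure frequencies and per-slot value frequencies."""
    structures, values = Counter(), {}
    for pw in passwords:
        segs = segment(pw)
        structures[tuple(t for t, _ in segs)] += 1
        for t, v in segs:
            values.setdefault(t, Counter())[v] += 1
    return structures, values

def guesses(structures, values, limit=10):
    """Enumerate guesses ranked by structure probability times value probabilities."""
    total = sum(structures.values())
    scored = []
    for struct, cnt in structures.items():
        pools = [values[t].most_common() for t in struct]
        for combo in itertools.product(*pools):
            p = cnt / total
            for (v, c), t in zip(combo, struct):
                p *= c / sum(values[t].values())
            scored.append(("".join(v for v, _ in combo), p))
    return sorted(scored, key=lambda x: -x[1])[:limit]

if __name__ == "__main__":
    leaked = ["wang1990", "li1990!", "zhang123", "wang123!"]   # toy "leaked" data
    s, v = train(leaked)
    for guess, p in guesses(s, v):
        print(guess, round(p, 4))
```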

The model proposed in this paper can also be applied to password guessing. As text, passwords are mainly composed of digits, letters, and symbols, and both passwords and ancient poems are short texts. On a constructed dataset of leaked passwords, Albert-BiLSTM-MHA-CRF can more accurately identify entities such as person names, birth dates, and mobile phone numbers according to context, rather than relying only on the user's personal information in the dataset. It can also be used to build a knowledge graph in the password domain and, through the entity relationships between passwords, help PCFG guess passwords in fewer attempts.

5. Conclusion

With the continuous development of named entity recognition technology and deep learning models, recognition accuracy for modern Chinese has been greatly improved. However, little research on entity recognition is related to ancient Chinese, such as entity extraction from ancient poetry. In this paper, entity extraction from ancient Chinese is carried out on the basis of the Albert pretraining model. In future research, we will expand the scale of the corpus of ancient poems and explore newer models to further improve the effect of entity recognition for ancient poems.

Data Availability

All data included in this study are available upon request by contact with the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Authors’ Contributions

C.W., F.Z., and J.W. conceptualized the study; methodology was carried out by C.W. and F.Z.; C.W. was responsible for the software; C.W. and F.Z. were responsible for the formal analysis; investigation was carried out by C.W. and J.W.; C.W. was responsible for the data curation; C.W. wrote the original draft; C.W. and J.W. wrote, reviewed, and edited the manuscript; C.W. visualized the study; F.Z. supervised the study; F.Z. was responsible for the project administration; F.Z. was responsible for the funding acquisition. All authors have read and agreed to the published version of the manuscript.