Abstract
In recent years, entity relation extraction has become a critical technique for analyzing complex structured text data. However, little advanced research exists in the food health and safety domain to help people analyze the complex concepts linking food and human health and the relationships between them. This paper proposes FHER, an entity relation extraction method for few-shot learning in the food health and safety domain, built on three techniques that effectively improve the performance of entity relation extraction. The three techniques are applied to the self-built data sets FH and MHD. The experimental results show that the method can effectively extract domain-related entities and their relations in a small sample size environment.
1. Introduction
Food is inextricably linked to human health, particularly the nutritional components that can significantly improve health. For instance, increased spermidine intake protects against cancer, metabolic disease, heart disease, and neurodegeneration [1]. A higher intake of whole grains and dietary fiber is associated with a decreased risk of death from liver cancer and disease [2]. In addition, contaminants in food or an excessive amount of artificial additives have a detrimental effect on human health and even cause various diseases. For instance, interference with the intestinal microbiota’s metabolites caused by food contaminants (polycyclic aromatic hydrocarbons, polychlorobiphenyls, brominated flame retardants, dioxins, pesticides, and heterocyclic amines) may promote the establishment of an inflammatory state in the intestine [3]. Food emulsifiers and thickeners influence the intestinal microbiota, mucosal barriers, and inflammatory pathways, and there are numerous possible pathogenic mechanisms [4].
With the proliferation of available human health data, people use predictive or classification models to extract useful information, which helps them enjoy better health protection. Typical of this available human health data is the electronic health record (EHR), the largest source of medical text data. One of the critical points in analyzing these unstructured textual data is extracting vital medical concepts, so many named entity recognition and entity relation extraction methods have emerged. Wan et al. proposed an ELMo-ET-CRF model for extracting medical named entities from Chinese electronic medical records (CEMR), using dynamic context-dependent ELMo character embeddings to incorporate more lexical, syntactic, and semantic information and alleviate the long context-dependency problem [5]. Luo et al. proposed a novel tagging scheme considering overlapping relations to solve the overlapping problem in biomedical texts and then built the Att-BiLSTM-CRF model to extract entities and their relations according to the rules [6]. Fei et al. proposed a new cross-graph neural model for joint extraction of overlapping entity relations in biomedical texts. They treat the entity relation extraction task as relational triple prediction and construct entity graphs by enumerating possible candidate entity spans [7].
Extracting the relationships between complex medical concepts makes it possible to store them explicitly for use in subsequent tasks. Similarly, in the food health and safety domain, food regulators sample food products on the market, examine their ingredients, and form structured data. However, in addition to structured data, there is a larger body of unstructured textual data beyond food sampling and inspection records, including announcements of food safety events, news reports, and knowledge about food and human health. Cenikj et al. proposed a method to detect the relationships between food and disease entities in text. They explored the feasibility of transfer learning using a pretrained BERT-based model and achieved good results with few-shot learning [8]. Popovski et al. proposed FoodIE, a rule engine for extracting food concepts; it is a rule-based named entity recognition method whose rules describe the computational linguistic and semantic information of food entities [9].
However, the food health and safety field does not have as many available resources as the biomedical field, so few-shot learning becomes an effective method to improve information extraction. Qu et al. proposed a Bayesian meta-learning method for relation extraction in a few-shot learning environment, applying graph neural networks to global relation graphs to improve accuracy [10]. Sainz et al. reformulated relation extraction as an entailment task, with good results on zero- and few-shot relation extraction tasks [11].
The essential technique for extracting valid information in the food health and safety domain is entity relation extraction. This task requires the model to correctly identify the entities and relations in a sentence and combine them correctly. The complexity of the task and the small sample size together make entity relation extraction with few-shot learning difficult. For supervised deep learning models, a small sample size can lead to underfitting; in other words, there is not enough information to train a valid model. In addition, the model tends to converge to a local optimum, degrading its practical results.
To solve the underfitting problem, we propose two methods to enrich the semantic features of the input text. The first is to disassemble Chinese characters into radicals (https://en.wikipedia.org/wiki/Radical_(Chinese_characters)) and construct semantic input at the radical level. As pictographic components, radicals are part of the structure of Chinese characters. The radicals have specific meanings; in other words, they represent part of the semantic information of the characters they constitute. Therefore, constructing semantic input at the level of radicals can greatly enrich the semantic information of the input text [12, 13]. The second approach to enriching semantic information is to fuse the character and word vectors of the input text. As units with independent meanings, characters are often given new meanings after forming words, and word information can significantly complement the interpretation of the characters within a word [14]. For instance, the Chinese characters “瘦 (lean),” “肉 (meat),” and “精 (essence)” form the word “瘦肉精,” which stands for a drug that is banned from being added to animal feed. In their study, Chen and Hu also verified that Chinese words contain rich semantic information [15]. Tran et al. combined the advantages of character- and word-level translation and proposed a new method for translating Chinese into Vietnamese [16].
To solve the problem of the model falling into a local optimum, we propose a text noise removal model to help the model converge quickly to the global optimum. As shown in Figure 1, consider the given input text: “Enrofloxacin belongs to the third generation of quinolones, a class of synthetic broad-spectrum antibacterial drugs. According to the “National Food Safety Standards Maximum Residue Limits for Veterinary Drugs in Food,” the maximum residue limit of enrofloxacin in the skin and meat of fish is 100 μg/kg.” There are four groups of entity relation combinations in this sentence.
Figure 1: An example input sentence containing four entity relation combinations.
We need to extract the two entity relation groups shown below the sentence in Figure 1 (index values 3 and 4 in Table 1). The two entity relation groups above the sentence (index values 1 and 2 in Table 1) are only weakly related to food health and safety. In this paper, we argue that we need to accurately extract the entity relation groups that are food-health-related; the entity relation groups that are weakly related to food health and safety become noise that affects the extraction effect.
To effectively extract entities and relations from texts in the food health and safety domain in a few-shot learning environment, this paper proposes FHER (food health entity relation extraction), a method for entity relation extraction in the food health and safety domain.
Our contributions can be summarized as follows:
(1) To reduce the influence of noise in the text, we propose a text denoising method for domain entity relation extraction that converts the text denoising task into a sequence prediction task, which effectively narrows the candidate range for entity relation extraction.
(2) To address the lack of semantic information caused by the small sample size in few-shot learning, we propose constructing the input at the radical level to enrich the semantic information.
(3) We propose a fused character and word method to mitigate the ambiguity caused by errors in Chinese word segmentation boundaries.
2. Related Works
There are two main approaches in current research [17]. Traditionally, a pipeline approach is used: entity mentions are first extracted by a named entity recognizer, and then the relationship between each pair of extracted entity mentions is predicted [18, 19]. This approach inevitably brings the problem of error propagation. To alleviate error propagation, Yu and Lam proposed a joint extraction approach that performs the two subtasks simultaneously [20]. The joint extraction approach is gradually becoming mainstream in entity relation extraction tasks. Geng et al. proposed an end-to-end joint entity and relation extraction method based on a combined attention mechanism of convolutional and recurrent neural networks [21]. Wan et al. proposed a region-based hypergraph network (RHGN) for joint entity and relation extraction, introducing the concept of regional hypernodes to enhance contextual connections [22]. Wei et al. proposed a cascading pointer labelling approach, which solves the problem of overlapping entity relations [23]. Qiao et al. proposed BERT-BiLSTM-LSTM, a joint entity relation extraction model for agriculture, and verified that the BERT model transfers well to agriculture [24].
Due to the scarcity of textual data in the food health and safety domain, few-shot learning can effectively improve entity relation extraction. Most current research on few-shot learning has focused on relation extraction tasks. For relation extraction in a few-shot learning environment, Qu et al. proposed a Bayesian meta-learning method that applies graph neural networks to global relation graphs to improve accuracy [10]. Sainz et al. reformulated relation extraction as an entailment task, with good results on zero- and few-shot relation extraction tasks [11].
For the entity relation extraction task in food health and safety, we propose a joint entity relation extraction method FHER with noise removal and feature enhancement, which mainly addresses how to perform effective entity relation extraction in a few-shot learning environment with high noise and low data volume. Our model adopts a cascading pointer annotation approach [23] to mitigate the entity overlap problem in the entity relation extraction task.
3. Methodology
3.1. Overview
The FHER method divides the whole entity relation extraction task into five parts: the first is a text noise removal model, the second is the construction of a radical level input, the third is the fusion of character and word features, the fourth is the prediction of possible subjects in a sentence, and the last is the prediction of relations and corresponding objects.
Figure 2 shows the overall structure of the model. The input sentence $S$ passes through the BERT encoder layer to obtain the vector $\mathbf{h}^c$. The prediction sequence $\hat{y}$ is obtained after passing through the text noise removal model, and the position vector $\mathbf{p}$ can be obtained from $\hat{y}$ through a binary function. Then, the radical-level input $S^r$ of sentence $S$ is constructed, and the vector $\mathbf{h}^r$ is obtained after passing it through the BERT encoder layer. The vectors $\mathbf{h}^c$ and $\mathbf{h}^r$ and the position vector $\mathbf{p}$ are fed into the fusion function to obtain the vector $\mathbf{h}^f$ after fusing the characters and words. Using the position vector $\mathbf{p}$ and the fusion vector $\mathbf{h}^f$, we can obtain the possible object positions corresponding to each relation and thus the combinations of entity relations present in the sentence.
Figure 2: The overall structure of the FHER model.
We define the entity relation extraction task in the food health and safety domain as follows: the input is unstructured text, and the output is the extracted human health-related concepts and their relationships, which further form food- and health-related knowledge triples. Formally, for a given sentence $S$, all possible triples $(s, r, o)$ are extracted, with $s$ representing the subject in sentence $S$, $o$ representing the object in sentence $S$, and $r$ representing the relationship between the subject and the object. For sentence $S$, the probability of all possible triples that it contains is as follows:
$$p((s, r, o) \mid S) = p(s \mid S)\, p((r, o) \mid s, S), \quad r \in R, \tag{1}$$
where, according to the chain rule, $p(s \mid S)$ denotes the probability that the subject $s$ appears in a triple of sentence $S$, $T_S$ denotes the set of triples containing the subject, $p((r, o) \mid s, S)$ denotes the probability of the combination of object $o$ and relation $r$ appearing in a triple containing the subject, and $R$ is the set of relations. For a given training set $D = \{S_1, S_2, \ldots, S_{|D|}\}$, the probability that all sentences contain their triples is as follows:
$$p(D) = \prod_{j=1}^{|D|}\; \prod_{(s, r, o) \in T_{S_j}} p((s, r, o) \mid S_j). \tag{2}$$
The goal of the entity relation extraction task in the food health and safety domain is to find all possible triples in the data set $D$.
3.2. BERT Encoder
The basis of the model is the BERT encoder layer, which implements the underlying language model. The BERT model consists of a multilayer bidirectional Transformer encoder that learns the valid information in the context very well [25]. We use a BERT model pretrained on a Chinese corpus from the food domain to encode the context of the input sentence $S$. Formally, a sentence of length $M$ can be represented as $S = \{w_1, w_2, \ldots, w_M\}$. We input these tokens into the pretrained BERT encoder layer to obtain a vector representation of the input sentence as follows:
$$\mathbf{h}^c = \mathrm{BERT}(\{w_1, w_2, \ldots, w_M\}). \tag{3}$$
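As an illustration, the encoding step can be sketched as follows. This is a minimal example assuming the HuggingFace transformers library and the generic bert-base-chinese checkpoint, not the food-domain pretrained model used in this work:

```python
# Minimal sketch of character-level BERT encoding (equation (3)); assumes the
# HuggingFace transformers library and a generic Chinese BERT checkpoint, not
# the food-domain pretrained model used in the paper.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

sentence = "橄榄油含不饱和脂肪酸"  # "Olive oil contains unsaturated fatty acids"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    h_c = model(**inputs).last_hidden_state  # (1, M + 2, 768); includes [CLS]/[SEP]
```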
3.3. Text Noise Removal Model
By reading a large amount of food health and safety information, we summarized the possible relationships between concepts related to food health and safety into 12 kinds. Table 2 lists the details of these relations. Since the text also contains combinations of entity relations with low relevance to the domain, these irrelevant combinations act as noise that interferes with identifying entity relation combinations in the food health and safety domain. To reduce the influence of this noise, we propose a text denoising method for domain entity relation extraction that converts the text denoising task into a sequence prediction task, which effectively narrows the candidate range for entity relation extraction. Figure 3 shows the structure of the text noise removal model.
Figure 3: The structure of the text noise removal model.
We transformed the text noise removal task into a sequence prediction task. To capture sentence context information, we use a BiLSTM layer for further feature extraction. Then, considering the dependencies between labels, we use a CRF layer, an undirected probabilistic graphical model, to handle the exact sequence labelling problem. Each character is assigned a BIO tag (B marks the beginning of an entity, I marks the inside of an entity, and O marks characters outside any entity). At the same time, we modified this tagging method to also predict the possible entity types (see Figure 4).
Figure 4: The modified BIO tagging method with entity types.
With the BERT encoder, we obtain the vector representation $\mathbf{h}^c$ of the sentence, which is then fed to the BiLSTM layer to obtain the intermediate vector representation $\mathbf{m}$ as follows:
$$\mathbf{m} = \mathrm{BiLSTM}(\mathbf{h}^c). \tag{4}$$
For the input sequence $X$ and the corresponding predicted label sequence $y$, the model is trained to output the prediction sequence $\hat{y}$ with the highest score under the CRF as follows:
$$\hat{y} = \operatorname*{arg\,max}_{y \in Y_X}\; p(y \mid X), \tag{5}$$
where $Y_X$ denotes the set of all possible label sequences for $X$.
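A minimal sketch of this BiLSTM-CRF tagger follows, assuming BERT outputs of dimension 768 and the third-party pytorch-crf package; the layer sizes are illustrative, not the paper's settings:

```python
# Sketch of the noise-removal tagger: BiLSTM over BERT features, CRF on top.
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class NoiseRemovalTagger(nn.Module):
    def __init__(self, num_tags: int, bert_dim: int = 768, hidden: int = 256):
        super().__init__()
        # BiLSTM captures left and right context of each character (equation (4)).
        self.bilstm = nn.LSTM(bert_dim, hidden // 2, batch_first=True,
                              bidirectional=True)
        self.emission = nn.Linear(hidden, num_tags)  # per-character tag scores
        self.crf = CRF(num_tags, batch_first=True)   # models tag transitions

    def loss(self, bert_out, tags, mask):
        m, _ = self.bilstm(bert_out)
        # Negative log-likelihood of the gold BIO sequence under the CRF.
        return -self.crf(self.emission(m), tags, mask=mask, reduction="mean")

    def predict(self, bert_out, mask):
        m, _ = self.bilstm(bert_out)
        # Viterbi decoding returns the best tag sequence (equation (5)).
        return self.crf.decode(self.emission(m), mask=mask)
```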
3.4. Construction of the Radical Feature
As the smallest semantic units, radicals themselves carry particular semantic meanings. Chinese characters are usually composed of smaller primary radicals, which are the most basic units that constitute the meaning of Chinese characters. In essence, this radical semantic information helps make characters with similar radical sequences (writing order) close to each other in the vector space, so it can be used to enrich the semantic information of the word vector and enhance the model's effect. As shown in Figure 5, the sentence “Olive oil contains unsaturated fatty acids” can be represented at the radical level as “木木水口月一食口月月酉.” The radical of the characters in “脂肪 (fat)” is “月.” When “月” is used as a radical, it can be interpreted as relating to the moon or to meat. The radical of the character “酸” is “酉,” and its meaning is related to alcohol and fermentation. The meaning of the decomposed radicals can enhance the semantic information of the character itself.
Figure 5: Radical-level representation of the sentence “Olive oil contains unsaturated fatty acids.”
Formally, for the input sentence $S$, the corresponding radical input $S^r$ is generated based on radical decomposition: for each character in $S$, its radical is added to $S^r$ if the character has one, or the character itself is added to $S^r$ if it is an indivisible single-component character. Finally, we obtain the radical-level input as follows:
$$S^r = \{r_1, r_2, \ldots, r_M\}. \tag{6}$$
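For illustration, the construction of $S^r$ reduces to a per-character lookup. The following toy sketch uses a hypothetical miniature radical table; a real implementation would use a complete radical dictionary:

```python
# Toy sketch of radical-level input construction (equation (6)); RADICALS is a
# hypothetical miniature lookup table, not a complete radical dictionary.
RADICALS = {"脂": "月", "肪": "月", "酸": "酉", "油": "氵"}

def to_radical_input(sentence: str) -> str:
    # Use the radical when the character has one; keep the character itself
    # when it is an indivisible single-component character.
    return "".join(RADICALS.get(ch, ch) for ch in sentence)

print(to_radical_input("油脂"))  # -> "氵月"
```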
Similarly, the vector representation $\mathbf{h}^r$ corresponding to the input $S^r$ can be obtained through the BERT encoder layer as follows:
$$\mathbf{h}^r = \mathrm{BERT}(S^r). \tag{7}$$
The sequence prediction result $\hat{y}$ obtained from the text noise removal model is converted into a position vector $\mathbf{p}$ by a binary function $f_b$. $f_b$ marks the food health-related tag positions as 1 and the irrelevant tag positions as 0.
The input vectors $\mathbf{h}^c$ and $\mathbf{h}^r$ and the position vector $\mathbf{p}$ are input to the fusion function $\Phi$. The function multiplies the position vector with each input vector and concatenates the results. Finally, the vector $\mathbf{h}^{cr}$ is calculated as follows:
$$\mathbf{h}^{cr} = \Phi\left(\mathbf{h}^c, \mathbf{h}^r, \mathbf{p}\right) = \left[\,\mathbf{p} \odot \mathbf{h}^c\,;\; \mathbf{p} \odot \mathbf{h}^r\,\right], \tag{8}$$
where $\odot$ denotes element-wise multiplication and $[\cdot\,;\cdot]$ denotes concatenation.
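A minimal sketch of the fusion function, assuming $\mathbf{h}^c$ and $\mathbf{h}^r$ are BERT outputs of shape (batch, seq_len, 768) and $\mathbf{p}$ is the 0/1 position vector of shape (batch, seq_len):

```python
# Sketch of the fusion step in equation (8): mask both views with the position
# vector, then concatenate along the feature dimension.
import torch

def fuse(h_c: torch.Tensor, h_r: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
    mask = p.unsqueeze(-1)  # (batch, seq_len, 1), broadcasts over features
    return torch.cat([h_c * mask, h_r * mask], dim=-1)  # (batch, seq_len, 1536)
```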
3.5. Character and Word Fusion
As in Section 3.2, we use the BERT encoder layer to get the vector representation of the input sentence $S$. The difference is that the BERT pretraining model used in the previous sections segments the text into Chinese characters before training, while the BERT pretraining model used here first segments the text into Chinese words before training. Denoting the word-segmented input by $S^w$, whose length is $N$, the word-level vector representation is obtained as follows:
$$\mathbf{h}^w = \mathrm{BERT}(S^w). \tag{9}$$
Because Chinese and English construct words differently, Chinese combines different characters into words, and the resulting words take on new meanings. Therefore, in the food health and safety domain, and especially in few-shot learning, we need to use as much prior knowledge as possible to improve recognition, so we fuse the character-based vector representation $\mathbf{h}^{cr}$ and the word-based vector representation $\mathbf{h}^w$.
We multiply the position vector $\mathbf{p}$ obtained from the text noise removal model with $\mathbf{h}^w$ to obtain the vector $\tilde{\mathbf{h}}^w$:
$$\tilde{\mathbf{h}}^w = \mathbf{p} \odot \mathbf{h}^w. \tag{10}$$
Then we concatenate $\tilde{\mathbf{h}}^w$ to $\mathbf{h}^{cr}$ according to the corresponding positions to obtain the fused vector $\mathbf{h}^f$ (see equation (11)), where the function $\mathrm{expand}(\cdot)$ repeats each word vector so that $\tilde{\mathbf{h}}^w$ matches the character-level length:
$$\mathbf{h}^f = \left[\,\mathbf{h}^{cr}\,;\; \mathrm{expand}(\tilde{\mathbf{h}}^w)\,\right]. \tag{11}$$
For example, if the vector for the word “食物” is A, and the vectors for the characters “食” and “物” are B and C, respectively, then vector A is concatenated with vector B, and vector A is concatenated with vector C.
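A toy sketch of the expand-and-concatenate step, assuming character vectors of shape (num_chars, d_c), word vectors of shape (num_words, d_w), and a list giving how many characters each word spans:

```python
# Sketch of equation (11): repeat each word vector once per character it
# covers ("expand"), then concatenate with the character-level vectors, so the
# vector A for "食物" is paired with both B ("食") and C ("物").
import torch

def fuse_char_word(h_char, h_word, word_lens):
    repeats = torch.tensor(word_lens)                # e.g. [2] for "食物"
    expanded = torch.repeat_interleave(h_word, repeats, dim=0)
    return torch.cat([h_char, expanded], dim=-1)     # (num_chars, d_c + d_w)
```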
After combining characters and words, we can predict the locations in the sentence where a subject may appear. The entity relation extraction task itself suffers from the problem of overlapping entity relations. According to the view proposed by Wei et al. [23], for a given subject $s$, any relation related to $s$ corresponds to an object $o$ in the sentence, while all other relations necessarily have no corresponding object in the sentence; that is, their set of corresponding objects is the empty set. So we first predict the possible positions of the subject in the sentence.
For a given input, we predict the locations of potential subjects. Two binary classifiers form a subject tagger, which marks the start and end positions of potential subjects. The subject tagger is expressed as follows:
$$p_i^{start\_s} = \sigma\left(\mathbf{W}_{start}\,\mathbf{h}_i^f + \mathbf{b}_{start}\right), \tag{12}$$
$$p_i^{end\_s} = \sigma\left(\mathbf{W}_{end}\,\mathbf{h}_i^f + \mathbf{b}_{end}\right), \tag{13}$$
where $p_i^{start\_s}$ denotes the probability that position $i$ in the input vector is the start position of a subject and $p_i^{end\_s}$ denotes the probability that position $i$ is the end position of a subject. If this probability is greater than a preset threshold, the position is a candidate for the start or end of a subject and is marked as 1; the rest are marked as 0. $\mathbf{W}_{start}$ and $\mathbf{W}_{end}$ are learnable weight parameters, and $\mathbf{b}_{start}$ and $\mathbf{b}_{end}$ are bias parameters.
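A minimal sketch of the subject tagger of equations (12) and (13), implemented as two sigmoid binary classifiers over the fused representation; the threshold value is illustrative:

```python
# Sketch of the subject tagger: start/end binary classifiers with thresholding.
import torch
import torch.nn as nn

class SubjectTagger(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.start = nn.Linear(dim, 1)  # W_start, b_start in equation (12)
        self.end = nn.Linear(dim, 1)    # W_end, b_end in equation (13)

    def forward(self, h_f: torch.Tensor, threshold: float = 0.5):
        p_start = torch.sigmoid(self.start(h_f)).squeeze(-1)  # (batch, seq_len)
        p_end = torch.sigmoid(self.end(h_f)).squeeze(-1)
        # Positions above the threshold become candidate boundaries (marked 1).
        return (p_start > threshold).long(), (p_end > threshold).long()
```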
3.6. Relation Extraction
We obtain all possible sets of entity relations in the input sentence by identifying, for each candidate subject, the objects of all relations. After the subject tagger, we obtain the candidate subject position information, which is incorporated into the fused vector as follows:
$$\mathbf{h}_i^{sub} = \mathbf{h}_i^f + \mathbf{v}^{sub}, \tag{14}$$
where $\mathbf{v}^{sub}$ denotes the vector representation of the candidate subject.
Similarly, the object tagger, consisting of two binary classifiers, predicts potential object locations. The object tagger is expressed as follows:
$$p_i^{start\_o} = \sigma\left(\mathbf{W}_{start}^{r}\,\mathbf{h}_i^{sub} + \mathbf{b}_{start}^{r}\right), \tag{15}$$
$$p_i^{end\_o} = \sigma\left(\mathbf{W}_{end}^{r}\,\mathbf{h}_i^{sub} + \mathbf{b}_{end}^{r}\right). \tag{16}$$
The process of predicting the location of the object is similar to that of predicting the location of the subject. $p_i^{start\_o}$ in equation (15) denotes the probability that position $i$ in the input vector is the start position of an object, and $p_i^{end\_o}$ in equation (16) denotes the probability that position $i$ is the end position of an object. $\mathbf{W}_{start}^{r}$ and $\mathbf{W}_{end}^{r}$ are learnable weight parameters, and $\mathbf{b}_{start}^{r}$ and $\mathbf{b}_{end}^{r}$ are bias parameters. Every relation $r$ in the relation set $R$ has its own object tagger to mark the start and end positions of candidate objects.
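A minimal sketch of the relation-specific object taggers of equations (15) and (16); a single linear layer with one output per relation is equivalent to maintaining a separate binary classifier pair for every relation in $R$:

```python
# Sketch of the object taggers: per-relation start/end classifiers applied to
# the fused vector shifted by the candidate subject representation v_sub.
import torch
import torch.nn as nn

class ObjectTaggers(nn.Module):
    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.start = nn.Linear(dim, num_relations)  # W_start^r, b_start^r
        self.end = nn.Linear(dim, num_relations)    # W_end^r, b_end^r

    def forward(self, h_f, v_sub, threshold: float = 0.5):
        h = h_f + v_sub.unsqueeze(1)   # equation (14): add subject information
        p_start = torch.sigmoid(self.start(h))  # (batch, seq_len, num_relations)
        p_end = torch.sigmoid(self.end(h))
        return (p_start > threshold).long(), (p_end > threshold).long()
```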
4. Result Analysis and Discussion
4.1. Data Set and Evaluation Metrics
This paper uses an independently constructed food health-related data set FH (Food Health Dataset). Twelve types of food-health-related relationships are defined, and an additional “OTHER” relationship type is defined to represent relationships in the sample data that are less relevant to food health. Table 2 lists the 12 relationship types, the Chinese and English names of the included relationships, and their abbreviations. In addition, the paper constructs a small medical-health-related data set MHD (Medical Health Dataset) to measure the method's generalization performance. The data set MHD consists of 5 relationship types with 200 sentences. The training set contains 180 sentences, and the test set contains 20 sentences.
The total number of valid annotated sentences in the FH data set is 1,420. This paper divides the FH data set into a training set and a test set. Figure 6 shows the details of the data set.
Figure 6: Details of the FH data set.
To improve the performance of the model, we introduced more data samples from the public domain into the data set FH to form the incremental data set FH++, which expands from 1,420 sentences to 11,327 sentences, in order to get better results in the text noise removal model. For the entity relation extraction task, we remove the “OTHER” relationships from the FH data set to form the FFH (filtered food health) data set (http://39.96.33.199:9009/FFH_dataset.zip) used for the entity relation extraction experiments.
For the evaluation metrics used in the experiments, we used three commonly used metrics, precision (P), recall (R), and F1 value, to measure the experimental results from different aspects. Their calculation criteria are as follows:
$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \times P \times R}{P + R},$$
where $TP$ indicates the number of relationships present in a given sentence $S$ that the model correctly predicts, $FP$ indicates the number of relations that do not exist in a given sentence $S$ but that the model incorrectly predicts, and $FN$ denotes the number of relations contained in a given sentence $S$ that the model failed to predict.
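For concreteness, the metrics can be computed from the gold and predicted triple sets per sentence, as in the following sketch; representing triples as set elements is an assumption of this example:

```python
# Sketch of precision/recall/F1 over (subject, relation, object) triples.
def prf1(gold: set, pred: set):
    tp = len(gold & pred)   # relations present and correctly predicted
    fp = len(pred - gold)   # predicted relations that do not exist
    fn = len(gold - pred)   # relations contained but missed by the model
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```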
For the text noise removal model, we use the metric P to measure whether the model predicts the location sequence correctly, the metric R to measure the model's ability to predict entity locations completely, and the metric F1 to measure model performance in aggregate. We focus more on metric P than on metrics R and F1, as it is more important to find the noise accurately.
For the entity relation extraction task, we use the metric P to measure how well the model extracts the correct combinations of entity relationships in a sentence, the metric R to measure the model's coverage when facing sentences containing multiple relationships, and the metric F1 to measure the model's performance in aggregate.
4.2. Implementation Details
We implemented the models in the deep learning framework PyTorch, using the Chinese pretrained BERT model (BERT-wwm) published by Cui et al. [26] with its default hyperparameter settings. The models in this research were run on a PC configured with an Intel(R) Xeon(R) CPU at 3.50 GHz, 64 GB RAM, and an RTX 3090 graphics card. Table 3 records the other parameters used in the experiments.
The parameters used for the text noise removal model, the construction of radical-level features, the fusion of character and word features, and the entity relation extraction experiments are listed in Table 4.
4.3. Result of Text Noise Removal
The text noise removal model is trained iteratively on the data set FH++. Figure 7 records the loss values and the final precision/recall curves during the training process; it can be observed that the loss values gradually decrease and the model gradually converges as the number of training iterations increases. The precision/recall curves indicate that the model performs well. Table 5 records the precision, recall, and F1 values for the text noise removal experiments on FH and the incremental data set FH++. The model obtains better convergence and accuracy when trained on the incremental data set. The ± values in the experimental results reported in this paper are 95% confidence intervals calculated over multiple runs.
Figure 7: (a) Loss values during training; (b) precision/recall curve.
Table 6 records the precision, recall, and F1 values obtained on the FH++ data set when we tried different structures for removing the noise. In Table 6, both the CNN and CNN-BiLSTM models use the softmax function as the activation function, transforming the denoising task into a classification task. The results show that treating denoising as a sequence labelling task with the BiLSTM + CRF model achieves a better denoising effect.
4.4. Result of Entity Relation Extraction
For the entity relation extraction task, Table 7 records the performance of the model on the data set FFH, including the precision, recall, and F1 values for sentences containing one to five relationships. The experimental results show that the extraction effectiveness of the model decreases as the number of relations in the input sentences increases, which is related to the imbalance of the data itself, as most of the samples in the data set have fewer than three relations. However, the overall recognition effect is still good.
The FHER method achieves 91.32% precision, 82.21% recall, and an 86.52% F1 value on the FFH data set, achieving good extraction results in few-shot learning.
We conducted ablation experiments on the proposed denoising model to verify the effectiveness of denoising for few-shot learning. In Table 8, N represents the number of relations in the sentence, and the values in the table are precision values for the entity relation extraction task. The experimental results show a significant decrease in recognition precision for sentences containing multiple relations when the denoising model is removed, because the model then identifies entity relation groups that we do not want. Therefore, denoising the input samples is an effective method in few-shot learning.
Similarly, we conducted ablation experiments on the radical feature enhancement and the fusion of character and word features. Table 9 records the experimental results. According to the results, richer semantic information can effectively improve entity relation extraction in few-shot learning. The NIC model represents the model without features constructed at the radical level, and the NCW model represents the model without character and word fusion, in which the whole model uses character-level vectors only.
Finally, we test the generalization performance of the method on the data set MHD. The experimental results show that constructing the input at the radical level and fusing characters and words can achieve good results even with a smaller data set. Table 10 records the results of the FHER method on the data set MHD.
4.5. Discussion
We now discuss the degree of semantic information retained after the text is transformed into radicals. There are as many as 3,500 commonly used Chinese characters (http://www.gov.cn/gzdt/att/att/site1/20130819/tygfhzb.pdf). These 3,500 characters can be combined into a vast number of meaningful words, making the vector space generated by representation learning of Chinese through language models more complex. In contrast, the radicals, which are the main constituents of Chinese characters, can be grouped into 201 radicals covering 20,902 Chinese characters (http://www.moe.gov.cn/downloadvideo/yuxinsi/15hanzibushou.zip). Therefore, radicals reduce the complexity of the Chinese vector space.
For the entity group “olive oil” and “unsaturated fatty acid” and their radical representations, we used the Chinese pretrained language model BERT-wwm to calculate their similarity based on vector distance (see Figure 8). We also collected subject-object combinations from the data set FFH whose summed subject and object text lengths ranged from 4 to 20 and calculated the text similarity between each pair of subjects and objects at both the character level and the radical level. Figure 9 shows the results of the text-similarity calculation. The experiment shows that decomposing Chinese characters into radical representations reduces the complexity of the text to a certain extent.
Figure 8: Character-level and radical-level similarity of the entity group “olive oil” and “unsaturated fatty acid.”
Figure 9: Results of the text-similarity calculation on subject-object pairs from FFH.
5. Conclusions
This paper proposes a method for entity relation extraction in food health and safety. Three techniques are proposed to improve entity relation extraction under few-shot learning. First, we propose removing text noise according to domain specialization. The experimental results show that removing entity relations unrelated to the domain can significantly improve the effect of entity relation extraction. Then, we construct text features at the radical level by disassembling individual Chinese characters according to their structure. The experimental results show that the radical-level input has low complexity and contains rich semantic features. Finally, we fuse the character vectors and word vectors of the input text based on Chinese word construction. The experimental results show that fusing characters and words can effectively enrich the semantic information in the input vectors. The above methods achieve a good extraction effect in few-shot learning. In the FHER method, the performance of the entity relation extraction task is subject to a threshold, which leads to unstable model performance. In the future, we will further explore the degree to which noise in the text influences deep learning models, as well as the semantic relationships and differences among the three levels of Chinese radicals, characters, and words. Moreover, we will attempt to transfer the proposed method to other application fields such as agricultural image recognition, greenhouse environmental time-series prediction, and food safety risk assessment [27–31].
Data Availability
The FFH data sets used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This study was supported by Beijing Natural Science Foundation (No. 4202014), Natural Science Foundation of China (62006008 and 61873027), Humanity and Social Science Youth Foundation of Ministry of Education of China (No. 20YJCZH229), and Open Project Program of National Engineering Laboratory for Agri-Product Quality Traceability (AQT-2020-YB6).