Abstract
Acoustic emission (AE) signals produced by different types of rock carry different characteristic information. Determining the brittle mineral content of rock from its AE characteristics helps in understanding the mechanical behavior of rock during field monitoring. This article constructs a deep learning model to identify AE signals released from rock fractures with different brittle mineral contents. To cope with the interference present in AE signal data, the model incorporates a multiscale one-dimensional convolutional neural network (1DCNN) embedded with an efficient channel attention (ECA) module, and multiscale convolutional kernels are used to extract features at different levels of precision. In the latter half of the model, a bidirectional long short-term memory (BLSTM) network extracts time series-related features, local spatially uncorrelated features, and weak periodic pattern features from the AE signal data. To address the drop in recognition accuracy for minority-class samples, this study replaces the ReLU activation function with SELU. The results show that the multiscale 1DCNN-BLSTM model embedded with the ECA module has good antinoise performance, and its recognition accuracy can reach over 90%. This work provides a new idea for exploring the mechanism of rock mass instability.
1. Introduction
AE signals released during the fracture of rock masses with different brittle mineral contents carry a great deal of damage information, so studying them in depth is of great significance for preventing natural disasters caused by rock mass instability [1–3]. Studies show that the main fracture time of a rock mass decreases as the brittle mineral content increases. When the brittle mineral content is low, a single fracture surface dominates, and the AE signals are concentrated around this regular fracture surface. As the brittle mineral content increases, various fracture modes and multiple fracture planes appear, and fracture behavior and AE phenomena occur in many areas. The brittleness of a rock mass therefore has an important influence on its fracture form [4–6]. When the content of brittle minerals reaches 50%, the internal cracks of the rock become reticular after fracturing, and the rock has reached the fracturing state [4]. Therefore, four typical AE signals, released by the fracturing of rock with brittle mineral contents of 0%, 10%, 30%, and 50%, are recognized in this study. The dataset of AE signals released by rock fracture with the four brittle mineral contents, collected in the laboratory, is provided in the supplementary file at the end of the study; it is used to train the neural network model and verify its performance. Accurate recognition of AE signals released by rock fracture under different brittle mineral contents not only helps to explore the instability mechanism of rock mass but can also provide prediction and early warning for disasters such as rock burst, rock slab peeling, and collapse.
In research on the AE signals of rock mass fracture, scholars usually use acoustic emission technology to extract the damage information contained in the energy released by rock mass instability, in order to learn about changes in the internal structure of the rock mass and the generation of microcracks. Figure 1 shows the morphology of the experimental rock samples after fracture for the four brittle mineral contents. One detection method is parameter-based. Aggelis et al. [7] used a parameter feature discrimination method based on acoustic emission technology to study the proportion of tensile and shear fracture signals in AE signals under different material failures. Liu et al. [8] used Brazilian disk and uniaxial compression tests to reveal how the AE b value of rock fracture changes under the two tests. The second detection method uses the moment tensor. For example, Xu et al. [9] used AE signal moment tensor technology to reveal the fracture and damage mechanism of brittle granite, and Mhamidi et al. [10] used moment tensor inversion of AE signals to characterize bending and shear cracks in reinforced concrete beams.

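Figure 1: Morphology of the experimental rock samples after fracture for the four brittle mineral contents, panels (a)–(d).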
More and more scholars have applied deep learning algorithms, which can extract deep features from data, in various fields. Among these algorithms, the convolutional neural network (CNN), with its excellent feature extraction performance, is the most widely used. As one-dimensional time series, the AE signals released during rock mass fracture have both local spatial and temporal characteristics. This study therefore adds bidirectional long short-term memory (BLSTM), which is widely used for one-dimensional time series problems, to the model. The powerful learning ability of the BLSTM network makes up for the inability of the CNN to consider both preceding and following information [11, 12]. To better identify the AE signals released by rock fractures with different brittle mineral contents, this article designs a deep learning model in which a multiscale one-dimensional convolutional neural network is connected to a bidirectional long short-term memory network. The model extracts AE signal features at different scales end-to-end to achieve more efficient and accurate identification. In view of the noise and other irrelevant features in the extracted AE signal, the ECA-Net (efficient channel attention network) module is embedded in the multiscale convolution network to realize a weighted attention mechanism for one-dimensional data features [13, 14]. The main contributions are as follows:
(1) The first half of the multiscale 1DCNN-BLSTM model constructed in this article is a multiscale convolutional neural network embedded with the ECA mechanism. ECA can effectively capture cross-channel interaction information; multiscale convolution extracts the time and frequency features hidden in the signal at different scales, and these features are then fused into a feature information matrix. The model uses three layers of convolutional kernels with different scales to extract feature information from the input signals, thereby extracting signal features at different time scales.
(2) Because the ReLU activation function in a traditional CNN sets all negative values to 0, the recognition accuracy of minority-class samples may decline significantly. Therefore, this study replaces the ReLU activation function with the SELU activation function. A linear activation is used at the input layer so that all minority-class inputs can enter the training model, and SELU is used in the middle layers, where it provides a small slope for negative values and so suits minority-class samples better when the data are imbalanced.
(3) The AE signals released during rock mass fracture are one-dimensional time series, and their characteristics are manifested in two aspects: local space and time. To better capture the time-varying trend of the AE signals, this study adds a bidirectional long short-term memory network (BLSTM), which evolved from the recurrent neural network, in the latter half of the model. The powerful learning ability of BLSTM compensates for the shortcomings of the CNN in considering bidirectional signal-related information.
2. Related Works
2.1. Multiscale 1DCNN
In signal processing, the problem is that the time resolution and frequency resolution cannot be well balanced when selecting a fixed-length window function. When the network extracts signal data features, if the scale and span of convolution kernel are too small, the time resolution of signal is better; however, it cannot learn the low-frequency features in the signal well. On the contrary, the larger-scale convolution kernel can learn the information in a longer time range, but it cannot reflect the high-frequency characteristics.
To balance time resolution and frequency resolution when extracting signal features, this study proposes a multiscale 1DCNN algorithm model, which forms the first half of the overall model; the algorithm framework is shown in Figure 2. The network contains four multiscale convolution layers, each with three parallel convolution kernels, and each convolution layer consists of a convolution operation followed by maximum pooling. The multiscale convolution layers extract the time and frequency features hidden in the AE signal at different scales and then fuse this feature information into a feature information matrix, which is processed by the fully connected layer and the softmax layer to obtain the corresponding recognition type. In this study, a number of one-dimensional convolution kernels are used to convolve the input AE signals at different scales to extract the characteristics of the signals at different time scales. The multiscale one-dimensional convolution is defined in terms of the following quantities: L is the signal length, u is the current convolution kernel scale, three convolution kernel scales U and three convolution kernel spans V are used, X represents the input AE signal, τ is the activation function, and the convolution output for scale U and span V is indexed by its ith element.
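For orientation, a strided one-dimensional convolution for a single branch is commonly written as below. This is a generic textbook form, not necessarily the article's exact definition: the weights $w_j^{(u)}$, the bias $b^{(u)}$, and the unpadded output-length bound are assumptions introduced here for illustration, while $u$, $v$, $X$, $\tau$, and $L$ follow the quantities named above.

```latex
y_i^{(u,v)} = \tau\!\left( \sum_{j=1}^{u} w_j^{(u)} \, x_{(i-1)v + j} + b^{(u)} \right),
\qquad i = 1, \dots, \left\lfloor \frac{L - u}{v} \right\rfloor + 1,
\qquad X = (x_1, \dots, x_L)
```

A larger scale $u$ lets each output element see a longer stretch of the signal, and a larger span $v$ shortens the output, which is why the parallel branches produce features of different lengths.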

Because the one-dimensional features obtained by multiscale convolution differ in length, a corresponding pooling operation is applied to each parallel convolution layer to enable feature fusion. The pooling operation takes the input at a given pool scale and span and returns, for each index i, the ith pooled result at that pool scale and span.
After the first layer performs preliminary feature extraction in each branch, the three parallel convolution layers in each subsequent multiscale convolution stage apply the pooling operation with pool scale R and span C. R and C are tied to the convolution scale through a relation that uses the Hadamard (element-wise) product and whose remaining terms are constants. To further extract high-dimensional features at different scales, the feature signals extracted at the different scales are fused after the last multiscale convolution layer, so that information from different scales can blend. The specific network structure diagram is shown in Figure 2.
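The branch-wise convolution, pooling, and fusion described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the article's implementation: the kernel scales, spans, and pool settings in `branches` are hypothetical values, the averaging kernel stands in for learned weights, and the choice to keep the product of convolution span and pool span equal in every branch is only one plausible way to bring the branch outputs to similar lengths.

```python
import numpy as np

def conv1d(x, kernel, stride):
    """Unpadded strided 1-D convolution (cross-correlation form)."""
    u = len(kernel)
    n_out = (len(x) - u) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + u], kernel) for i in range(n_out)])

def max_pool1d(x, size, stride):
    """1-D max pooling with the given pool scale and span."""
    n_out = (len(x) - size) // stride + 1
    return np.array([x[i * stride:i * stride + size].max() for i in range(n_out)])

def multiscale_features(x, branches):
    """branches: list of (kernel_scale, conv_span, pool_scale, pool_span)."""
    outputs = []
    for u, v, r, c in branches:
        kernel = np.ones(u) / u          # placeholder kernel, stands in for learned weights
        outputs.append(max_pool1d(conv1d(x, kernel, v), r, c))
    min_len = min(len(o) for o in outputs)
    # Crop to a common length and stack into one feature matrix (branches x length).
    return np.stack([o[:min_len] for o in outputs])

rng = np.random.default_rng(0)
signal = rng.standard_normal(256)        # one 256-point segment of a toy signal

# Hypothetical settings: conv span * pool span = 8 in every branch.
branches = [(4, 1, 8, 8), (8, 2, 4, 4), (16, 4, 2, 2)]
features = multiscale_features(signal, branches)
print(features.shape)
```

With these settings the three branches yield 31, 31, and 30 pooled values, so cropping to the shortest branch gives a small fused feature matrix.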
2.2. Principle Structure of BLSTM Neural Network
The LSTM network evolved from the recurrent neural network, and its components are the input layer, hidden layer, and output layer [15–17]. The data used in this study are the AE signals released by rock mass rupture, and the effective information in the time series of the AE signal data can be fully extracted by LSTM. The LSTM network unit has three gates: the input gate, the forget gate, and the output gate. The input gate and the output gate control, respectively, how data enter the memory cell and how the memory cell influences the current output value, and the forget gate decides whether to discard some information. Together, the gates mainly serve to store and update information. First, the forget gate determines whether to retain the unit information of the previous moment according to the hidden state of the previous moment $h_{t-1}$ and the new input data $x_t$. The formula is as follows:
$$f_t = \sigma\left(W_f \left[h_{t-1}, x_t\right] + b_f\right)$$
After that, the input gate gets the valid information obtained from $x_t$, and the formula is as follows:
$$i_t = \sigma\left(W_i \left[h_{t-1}, x_t\right] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C \left[h_{t-1}, x_t\right] + b_C\right)$$
The new state is composed of the previous unit state plus the current unit state information, and the formula is as follows:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Finally, the output is calculated from the output gate and the cell state with the following formula:
$$o_t = \sigma\left(W_o \left[h_{t-1}, x_t\right] + b_o\right), \qquad h_t = o_t \odot \tanh\left(C_t\right)$$
In the above formulas, $x_t$ is the input of the cell unit at time $t$, $h_t$ is its hidden state, which also serves as the output of the unit, and $C_t$ is the updated cell state. $f_t$, $i_t$, and $o_t$ are the activations of the forget, input, and output gates, $W$ and $b$ are the corresponding weights and biases, $\sigma$ is the sigmoid function, and $\odot$ is the element-wise product.
However, the hidden layer of the LSTM neural network can only calculate the data in one direction, and BLSTM solves this problem well. The BLSTM network can perform forward and reverse calculations on the signal data, providing contextual information on the signal data for the created network structure, and learning more effective information than traditional LSTM [18–20].
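To make the forward and reverse calculations concrete, the following NumPy sketch runs a standard LSTM cell over a sequence in both directions and concatenates the two hidden-state sequences. It is an illustration only: the stacked weight layout, the random weights, and the sizes (`T`, `d`, `hidden`) are assumptions chosen for the example, and a real model would use a trained library layer instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(xs, W, b, hidden):
    """Run one LSTM direction over xs of shape (T, d); return hidden states (T, hidden)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    states = []
    for x in xs:
        # Stacked pre-activations for forget gate, input gate, candidate state, output gate.
        z = W @ np.concatenate([h, x]) + b
        f = sigmoid(z[:hidden])
        i = sigmoid(z[hidden:2 * hidden])
        g = np.tanh(z[2 * hidden:3 * hidden])
        o = sigmoid(z[3 * hidden:])
        c = f * c + i * g            # new cell state
        h = o * np.tanh(c)           # new hidden state (unit output)
        states.append(h)
    return np.array(states)

def blstm(xs, params_fwd, params_bwd, hidden):
    """Concatenate a forward pass with a time-reversed backward pass."""
    fwd = lstm_pass(xs, *params_fwd, hidden)
    bwd = lstm_pass(xs[::-1], *params_bwd, hidden)[::-1]   # re-align to original time order
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(0)
T, d, hidden = 10, 3, 4                                     # hypothetical sizes
make_params = lambda: (0.1 * rng.standard_normal((4 * hidden, hidden + d)), np.zeros(4 * hidden))
sequence = rng.standard_normal((T, d))
states = blstm(sequence, make_params(), make_params(), hidden)
print(states.shape)
```

Each time step ends up with a vector of size `2 * hidden`: the first half summarizes the past of the signal and the second half summarizes its future.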
2.3. Efficient Channel Attention Module (ECA-Net)
Channel attention mechanisms have been increasingly applied to convolutional neural networks in recent years. However, some complex attention mechanisms inevitably increase the computational cost of the network. Wang et al. [21] put forward the lightweight attention module ECA-Net in 2019. As an ultra-lightweight attention module, the ECA (efficient channel attention) module effectively balances model performance and computational cost. Avoiding dimensionality reduction is very important for learning channel attention, and appropriate cross-channel interaction can significantly reduce the complexity of the model while maintaining performance. A local cross-channel interaction strategy without dimensionality reduction can be realized efficiently by one-dimensional convolution. The ECA module models the relationships between channels, enhancing important features and suppressing useless ones. Figure 3 shows the efficient channel attention module.

As shown in Figure 3, ECA-Net uses appropriate cross-channel interaction, in which the kernel size $k$ is a key parameter that determines the coverage of the interaction. Because a one-dimensional convolution is used to capture local cross-channel interactions, the appropriate $k$ may differ across convolution blocks with different channel numbers and across CNN architectures. $k$ is related to the channel dimension $C$: the larger the channel dimension, the stronger the long-range interaction, and the smaller the channel dimension, the stronger the short-range interaction. There may therefore be a mapping $\phi$ between $k$ and $C$:
$$C = \phi(k)$$
where the relationship between $k$ and $C$ is nonlinear. The ECA mechanism uses an exponential function to approximate the mapping $\phi$. Because a linear function can characterize only a limited relationship, the mapping between $k$ and $C$ is expressed as follows:
$$C = \phi(k) = 2^{(\gamma k - b)}$$
Given the channel dimension $C$, the kernel size $k$ is adaptively determined:
$$k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}}$$
where $|A|_{\mathrm{odd}}$ represents the odd number nearest to $A$, $\gamma = 2$, and $b = 1$. The main advantage of the ECA mechanism is that it takes into account both the computational complexity and the test accuracy of the model.
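The adaptive kernel size and the attention weighting can be sketched as follows. This is a minimal NumPy illustration, not the article's code: the function names are hypothetical, the rounding convention in `eca_kernel_size` (truncate, then move even values up to the next odd number) is one common reading of "nearest odd number" and is an assumption here, and the fixed averaging weights stand in for a learned one-dimensional convolution.

```python
import math
import numpy as np

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive kernel size from the channel dimension (rounding convention assumed)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca_forward(features, conv_weights):
    """features: (C, L). conv_weights: odd-length 1-D kernel shared by all channels."""
    k = len(conv_weights)
    pooled = features.mean(axis=1)                 # global average pooling, shape (C,)
    padded = np.pad(pooled, k // 2)                # zero padding keeps one score per channel
    scores = np.array([np.dot(padded[i:i + k], conv_weights) for i in range(len(pooled))])
    attention = 1.0 / (1.0 + np.exp(-scores))      # sigmoid gives per-channel weights in (0, 1)
    return features * attention[:, None], attention

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((64, 32))        # hypothetical 64 channels, length 32
k = eca_kernel_size(64)
weighted, attention = eca_forward(feature_map, np.ones(k) / k)
print(k, weighted.shape)
```

No channel dimensionality reduction takes place: each channel weight depends only on the pooled values of its `k` neighbouring channels, which is what keeps the module lightweight.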
3. Multiscale 1DCNN-BLSTM Network Model Design and Parameter Settings
In this article, the structure of the deep learning algorithm model used to identify acoustic emission signals from rocks with different brittle mineral contents is shown in Figure 4.

The purpose of building this model is to accurately identify the content of brittle minerals contained within rocks. The acoustic emission signals released during the fracture of rocks with different brittle mineral contents are also different. By identifying the acoustic emission signals released during the fracture of these rocks, the content of brittle minerals inside the rocks can be determined. When the content of brittle minerals reaches 50%, the internal cracks of the rock become a network after fracture, and the rock has reached a fractured state. Therefore, this article identifies four typical acoustic emission signals released by rock fractures with brittle content of 0%, 10%, 30%, and 50%, respectively. This study conducted fracturing tests on rocks with four different mineral contents in a laboratory environment. The acoustic emission signals released by them were collected through sensors and input into the model. The characteristic information of these acoustic emission signals on different scales and time dimensions was extracted, and they were accurately identified through the softmax layer to determine the content of brittle minerals inside the rocks.
The model includes three layers of multiscale convolution, and a number of one-dimensional convolution kernels are used to convolve the input AE signals at different scales to extract the characteristics of the signals at different time scales. To deal with AE signals mixed with strong noise, the model embeds an ECA module behind the convolution layer. The module realizes channel attention through a local cross-channel interaction strategy without dimensionality reduction, which avoids the adverse effect of dimension reduction on channel attention learning. Appropriate cross-channel interaction can significantly reduce the complexity of the model while maintaining its performance. Because the AE signals of rock fracture are correlated in the time dimension, the BLSTM performs a secondary feature extraction on the spatial features extracted by the multiscale 1DCNN and acquires the bidirectional (forward and backward) time sequence information of the AE signals.
To improve the recognition accuracy for minority-class samples, this study replaces the ReLU activation function commonly used in deep learning with the SELU activation function. In the input layer, the SELU activation function is used so that all minority-class input samples can enter the training model.
The ReLU activation function is essentially irreversible, because it directly removes the part of the input that is less than 0, whereas the original vibration signal collected by the sensor has an average value of 0. When dealing with imbalanced data, ReLU eliminates a large amount of data below 0, which greatly reduces the recognition accuracy for minority-class samples. Therefore, the activation function in the input layer of the model is set to SELU. When the input is greater than 0, the SELU curve has the same linear shape as ReLU, so it inherits the fast convergence of ReLU. When the input is less than 0, SELU returns a small nonzero value with a nonzero gradient instead of discarding the input, which mitigates the problem of data imbalance.
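The difference between the two activations on a zero-mean signal can be checked with a short sketch. It uses the standard published SELU constants; the sinusoidal test signal is a hypothetical stand-in for an AE segment, chosen only because half of its samples are negative.

```python
import math

# Standard SELU constants (alpha and scale).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def relu(x):
    return max(0.0, x)

def selu(x):
    # Linear for positive inputs, scaled exponential for the rest.
    return SELU_SCALE * x if x > 0 else SELU_SCALE * SELU_ALPHA * math.expm1(x)

# Toy zero-mean signal; the half-sample phase shift avoids exact zeros.
signal = [math.sin(2 * math.pi * (n + 0.5) / 16) for n in range(64)]

relu_out = [relu(v) for v in signal]
selu_out = [selu(v) for v in signal]

# Count negative samples that ReLU erases and that SELU keeps as nonzero values.
relu_zeroed = sum(1 for s, y in zip(signal, relu_out) if s < 0 and y == 0.0)
selu_kept = sum(1 for s, y in zip(signal, selu_out) if s < 0 and y != 0.0)
print(relu_zeroed, selu_kept)
```

Every negative sample is mapped to exactly 0 by ReLU, while SELU maps it to a distinct negative value, so the information in the negative half of the waveform remains available to later layers.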
In the MCNN (multiscale 1DCNN), to better suppress high-frequency noise in the input, the size of the convolution kernel in the first layer is 16 × 1, and the values for the second and third layers are set to 64 and 128, respectively, which is convenient for deepening the network. After each convolution layer, a 2 × 1 maximum pooling operation is performed to reduce the number of network parameters, reduce the computation of the model, and avoid overfitting. In addition, so that the feature information of the input sample is not discarded and the output dimension of each convolution layer stays consistent with its input dimension, a padding operation is added to each convolution layer. The specific parameters of the model are shown in Table 1. GAP in the parameter table refers to global average pooling.
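The effect of padding and 2 × 1 pooling on the feature length can be traced with simple arithmetic. The sketch below assumes a convolution span of 1 with "same" padding, a pooling span equal to the pool size of 2, three convolution layers, and a 256-point input segment; these settings are assumptions for illustration, and the authoritative values are those in Table 1.

```python
def same_conv_len(n, stride=1):
    """Output length of a 'same'-padded convolution: ceil(n / stride)."""
    return -(-n // stride)

def pool_len(n, size=2, stride=2):
    """Output length of unpadded max pooling."""
    return (n - size) // stride + 1

length = 256                 # assumed input segment length
trace = [length]
for _ in range(3):           # assumed three convolution + pooling layers
    length = pool_len(same_conv_len(length))
    trace.append(length)

print(trace)
```

Under these assumptions the padded convolutions leave the length unchanged and each pooling step halves it, so only the pooling layers shrink the representation.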
4. Model Performance Testing Experimental Preparation
4.1. Experimental Design
The test rock samples were selected from the research block of the Liaohe Oilfield, and similar core samples with brittle mineral contents of 0%, 10%, 30%, and 50% (with reference to the mineral composition of natural core samples) were prepared according to the prefabricated mineral composition. The rock samples are mainly composed of engineering sand, quartz, clay minerals, cement, and pouring materials, mixed in the proportions required by the experiment. The proportioned slurry is poured into the designed core mold and stirred evenly. The slurry surface is observed until water and air bubbles no longer appear on it, and the mold is then placed in an incubator and left to stand at 30°C for about 7 days. After the core sample has completely solidified, it is taken out for cutting and grinding. Throughout the process, the samples are prepared according to the requirements of the International Society for Rock Mechanics (ISRM). The rock samples have good integrity, measure 100 mm × 100 mm × 100 mm, and are divided into four groups, as shown in Figure 5.

To collect the AE signals of rocks with different brittleness, uniaxial compression tests were carried out on the rock samples, and the AE signals released during the loading of each specimen were recorded. In this study, the uniaxial fracturing experiment was carried out on a true triaxial press. During the experiment, loading was displacement controlled, and the displacement speed was set to 0.8 mm/min. The arrangement of the probes on the loaded specimen is shown in Figure 6. The failure of the rock sample expands from the local to the whole. As the specimen continues to be stressed, a crack appears in the rock sample, the crack gradually expands and deepens, and multiple cracks interconnect until the specimen is completely broken. The PCI-2 acoustic emission monitoring system produced by Physical Acoustics Corporation (USA) was used to monitor the AE signal of rock fracture in real time, with the sampling frequency set to 192 kHz. Figure 7 shows the microcomputer-controlled loading system and the AE monitoring system used throughout the experiment.


4.2. Acoustic Emission Signal Data Acquisition and Preprocessing
To construct a training sample dataset for the network model, this study conducted fracturing experiments on rocks with different brittle mineral contents using a true triaxial fracturing machine in a laboratory environment. Figure 8 shows the AE signals released in the fracturing tests of the rock, after segmentation and interception. When the brittle mineral content of the rock reaches 50%, the internal cracks of the rock become a network after fracture, and the rock has reached a fractured state [4]. Therefore, the AE signals of the four rock samples with brittle mineral contents of 0%, 10%, 30%, and 50% were each divided into 3571 data samples for identification testing. The information on the four rock samples is shown in Table 2. For datasets on the order of ten thousand samples, as is typical in traditional machine learning, the usual allocation ratio between the training and testing sets is 7 : 3. The collected dataset contains 3571 × 4 signals in total, which are randomly divided into a 70% training set (2500 × 4 samples) and a 30% testing set (1071 × 4 samples); the testing set is used to evaluate the training results of the model. The preprocessed time series signal data are input into the network structure model constructed above for feature extraction and parameter training.
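The sample counts above can be reproduced with a short split routine. This sketch assumes the 70/30 split is applied within each class, which is consistent with the stated 2500 × 4 and 1071 × 4 counts but is not spelled out in the text; the function name and the seed are hypothetical, and the labels P-0 to P-3 follow the labeling used for the four classes in this study.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Shuffle indices within each class, then split them train/test."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return train, test

# Four classes with 3571 segments each.
labels = [name for name in ("P-0", "P-1", "P-2", "P-3") for _ in range(3571)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps the four brittle mineral contents equally represented in both sets, which matters when per-class accuracy is later compared.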

Figure 8: Segmented AE signals released in the fracturing tests of the rock samples, panels (a)–(d).
In existing research on one-dimensional data classification, preprocessing usually applies a filtering operation to obtain clean one-dimensional signal data. To preserve as much of the feature information of the original AE signal as possible and so enhance the generalization ability of the model, this study divides the original AE signal directly into fixed-length signal segments, without filtering, during signal preprocessing. For the 560 sets of data, the original AE signal is normalized to [0, 1] by the mapminmax function, and each fixed-length AE signal segment is annotated with its label: P-0, P-1, P-2, or P-3. To keep the data input into the model consistent, every intercepted AE signal segment is unified to a length of 256 sampling points.
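The segmentation and normalization steps can be sketched in NumPy as follows. This is an illustrative equivalent, not the study's preprocessing script: `mapminmax` is a MATLAB function, so the min-max scaling is written out by hand here, normalizing each segment separately is an assumption, and the random input merely stands in for a recorded AE signal.

```python
import numpy as np

def segment_signal(signal, segment_length=256):
    """Cut a 1-D signal into non-overlapping fixed-length segments, dropping the remainder."""
    signal = np.asarray(signal, dtype=float)
    n_segments = len(signal) // segment_length
    return signal[:n_segments * segment_length].reshape(n_segments, segment_length)

def minmax_normalize(segments):
    """Scale each segment (row) linearly to the range [0, 1]."""
    lo = segments.min(axis=1, keepdims=True)
    hi = segments.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)     # guard against constant segments
    return (segments - lo) / span

rng = np.random.default_rng(0)
raw = rng.standard_normal(1000)                # stand-in for an unfiltered AE recording
segments = minmax_normalize(segment_signal(raw))
print(segments.shape)
```

No filtering is applied before segmentation, matching the decision to keep the original signal content intact.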
4.3. Model Performance Evaluation Indicators
To improve the generalization ability of the network model and avoid overfitting during training, this study shuffles the training samples before batching, making their order more random. The experiments were run on a Windows 10 64-bit operating system. The deep learning framework is Torch, the programming language is Python 3.7, training is executed on the CPU, and the batch size is set to 128.
To evaluate the effectiveness of the network structure model in this article, accuracy (ACC), sensitivity (Sen), precision (P), the F1 harmonic mean, and the Matthews correlation coefficient (MCC) were used to evaluate the performance of the model. The definitions of the evaluation parameters are as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Sen} = \frac{TP}{TP + FN}$$
$$P = \frac{TP}{TP + FP}$$
$$F1 = \frac{2 \cdot P \cdot \mathrm{Sen}}{P + \mathrm{Sen}}$$
$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
TP (true positive) represents the number of positive samples correctly identified as positive samples, TN (true negative) represents the number of negative samples correctly identified as negative samples, FP (false positive) refers to the number of negative samples incorrectly identified as positive samples, and FN (false negative) represents the number of positive samples incorrectly identified as negative samples. The Matthews correlation coefficient ranges from −1 to 1, and the closer its value is to 1, the better the classification and recognition performance of the model.
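As a worked example of these definitions, the sketch below computes the five indicators from a set of confusion counts using the standard formulas. The counts are hypothetical numbers chosen for illustration, not results from this study; for the four-class problem, the same function would be applied to each class in a one-versus-rest fashion.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Standard classification indicators from confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)                       # sensitivity (recall)
    pre = tp / (tp + fp)                       # precision
    f1 = 2 * pre * sen / (pre + sen)           # harmonic mean of precision and sensitivity
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"acc": acc, "sen": sen, "pre": pre, "f1": f1, "mcc": mcc}

# Hypothetical counts for one class treated as "positive".
metrics = binary_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the MCC uses all four counts symmetrically, which makes it a more informative single number when the classes are imbalanced.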
5. Experimental Results and Analysis
5.1. Network Performance Evaluation
As described above, the preprocessed AE signal data of rock fracture with different brittle mineral contents are input into the model for training, and the trained model is used to predict the test set. t-SNE dimensionality reduction is used to visualize the feature representations of the data after pooling at the input layer, middle layer, and output layer [22]. t-SNE maps high-dimensional data to a low-dimensional space for visualization, and the distance between points after dimensionality reduction corresponds to the difference between the data. The clustering effect of the test samples in each layer of the model is shown in Figure 9.

The visualization shows that the distribution of the data gradually becomes linearly separable as it passes through each pooling layer, and the clustering of the output after the global average pooling layer is clear. This indicates that the model has good feature extraction ability and classification performance on the AE datasets.
To verify the effectiveness of the ECA module, it is compared experimentally with MCNN-BLSTM models using the CBAM, SE, and CA modules. Each of the four models is trained 6 times and the best result is taken; the training results are shown in Figure 10. The figure shows that the training accuracy is highest when the attention mechanism is CBAM or ECA. However, with CBAM the model training fluctuates greatly, whereas with the ECA module the training is stable and the recognition accuracy remains high. With the SE and CA modules, the model iterates relatively slowly and the accuracy is relatively low. In summary, when the attention mechanism is set to the ECA module, the recognition accuracy of the model is high and the model is robust throughout the training iterations.

In this study, the ReLU function, which is commonly used in traditional convolution networks, is replaced with the SELU function to avoid a large amount of data being suppressed before feature extraction. The activation function applies a nonlinear transformation to its input, and the result is used as the input of the following hidden layer. The formula is as follows:
$$\mathrm{SELU}(x) = \mu \begin{cases} x, & x > 0 \\ \alpha \left(e^{x} - 1\right), & x \le 0 \end{cases}$$
where α and μ are the parameters of the activation function.
Based on ECA-MCNN-BLSTM model, different activation functions are selected to adjust the model. These activation functions include ReLU, ELU, LeakyReLU, and SELU, and the test results are shown in Table 3. It can be seen from Table 3 that the model using SELU activation function has higher recognition accuracy and higher sensitivity.
5.2. Analysis of Results of ECA-MCNN-BLSTM Model under Different Noises
To display the noisy signals more intuitively, Figure 11 shows the waveforms of noisy AE signals under different signal-to-noise ratios. Taking the AE signal released by the fracture of shale with a brittle mineral content of 30% as an example, the noisy AE signals and their corresponding spectra are displayed for each signal-to-noise ratio.

Figure 11: Waveforms and spectra of noisy AE signals under different signal-to-noise ratios, panels (a)–(h).
To test the antinoise performance of the proposed model, the same dataset was input into the model of this study and into the 1DCNN-BLSTM, ECA-1DCNN, and MCNN-BLSTM models. An antinoise model should maintain high recognition accuracy at least when the signal-to-noise ratio (SNR) is 0 dB. Therefore, this study uses the t-distributed stochastic neighbor embedding algorithm (t-SNE) to analyze the clustering of the test set samples at the fully connected layer of each model when SNR = 0 dB. The results are shown in Figure 12. The t-SNE initialization is set to PCA mode, the maximum number of iterations is 10000, and the learning rate is 500. t-SNE focuses on preserving the local structure of the data, so the manifold structure of the data is retained when the data are reduced to a two-dimensional space. The distance between classes in the plot does not represent the classification distance of the real data; the plot is a clustering diagram only. Figure 12 shows that the proposed model and 1DCNN-BLSTM have the best classification performance at an SNR of 0 dB, while the clusters of the other models overlap, resulting in poor classification performance.

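Figure 12: t-SNE clustering of the test set samples at the fully connected layer of the compared models at a signal-to-noise ratio of 0 dB, panels (a)–(d).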
To further verify the noise resistance of the model in this study, Gaussian white noise was added to the training dataset at four signal-to-noise ratios, namely 0 dB, −3 dB, −6 dB, and −9 dB, and the four types of AE signals were input into the different network models to analyze the impact of the noise conditions on the performance of the four networks. The specific experimental results are shown in Table 4. Because the precision and recall of the model diverge significantly after Gaussian white noise is added, this section uses their harmonic mean as the evaluation indicator.
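Adding white noise at a prescribed signal-to-noise ratio can be sketched as below. This is a minimal illustration, not the study's procedure: the function names are hypothetical, the sinusoid stands in for an AE segment, and rescaling the noise so that the measured SNR matches the target exactly is a design choice made here for reproducibility.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Return signal plus Gaussian white noise scaled to the requested SNR in dB."""
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise = rng.standard_normal(signal.shape)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    # Rescale so the realized noise power equals the target exactly.
    noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
    return signal + noise

def measured_snr_db(clean, noisy):
    """SNR in dB computed from a clean signal and its noisy version."""
    noise = noisy - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * np.arange(256) / 32)     # toy 256-point segment
noisy_by_snr = {snr: add_white_noise(clean, snr, rng) for snr in (0, -3, -6, -9)}
for snr, noisy in noisy_by_snr.items():
    print(snr, round(float(measured_snr_db(clean, noisy)), 3))
```

At 0 dB the noise carries as much power as the signal, and at −9 dB roughly eight times as much, which is why these levels are a demanding test of noise resistance.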
Table 4 compares the accuracy and harmonic mean values of the models on AE signals with different signal-to-noise ratios. The table shows that the proposed model and the ECA-1DCNN model retain high recognition accuracy under strong noise. The recognition accuracy of the 1DCNN-BLSTM model drops sharply when the SNR is −9 dB, and its harmonic mean value falls below 60%, indicating that 1DCNN-BLSTM is sensitive to data noise. Therefore, in strong noise environments, multiscale one-dimensional convolutional networks and ECA modules can effectively alleviate the sensitivity of the model to data noise. The model in this article has higher noise resistance and feature learning ability.
6. Conclusion
To address the limitations of traditional rock fracture AE signal recognition, this study proposes a deep learning-based recognition method. It uses the AE signals released by rock fractures as physical signals to identify different types of rock fractures and constructs a model in which a multiscale 1DCNN is connected to a BLSTM network. To deal with noise interference in the original AE signal, an ECA mechanism is embedded in this series model to suppress the impact of noise and other irrelevant features on the recognition results.
By comparing and analyzing with other network models, it can be seen that this model not only has high recognition accuracy but also has good noise resistance. This confirms the superiority of the network model in dealing with the multiclassification problem of AE signal data in this study. Compared with the traditional rock fracture AE signal recognition method, the method of constructing deep network model in this study has the advantages of low cost, simple process, and high recognition accuracy, which provides a new method for the current rock fracture AE signal recognition problem.
Data Availability
The acoustic emission signal data used to support the findings of this study are included within the supplementary information file, and I declare that the data are true and reliable.
Disclosure
The author has read and understood the journal’s policies and believes that neither the manuscript nor the study violates any of these.
Conflicts of Interest
The author declares that there are no conflicts of interest.
Supplementary Materials
The supplementary file provided is a dataset of acoustic emission signals released from rock fractures with four brittle mineral contents, collected in the laboratory, which is used to train the neural network model in the study and verify its performance.