Abstract
The confused use of Polygonati Rhizoma (PR) and Polygonati Odorati Rhizoma (POR) poses an unpredictable threat to consumer health. Sensitive, nondestructive, rapid, and multicomponent detection techniques are therefore sought after. In this study, a low-cost, short-wavelength (898–1668 nm), handheld near-infrared (NIR) spectrometer combined with multivariate spectral evaluation methods was used to establish calibration models for identifying PR and POR. NIR spectra were preprocessed with the standard normal variate (SNV) transformation before chemometric analysis. Principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were then tested for calibration model development. The PCA results showed that spectral differences existed between the two herbs; however, PCA could not separate them with the required accuracy. The PLS-DA calibration model, by contrast, separated the two herbs according to their spectral information with a prediction accuracy above 98.3%. Thus, a rapid, green, and low-cost method based on a handheld NIR instrument has been established to identify PR and POR, supporting practical on-site inspection and helping to ensure the safety of clinical medication.
1. Introduction
Chinese herbs are the practical tools of traditional Chinese medicine (TCM), and their authenticity is a key concern for governments, traders, and consumers. Only herbs of stable quality can ensure the maximum clinical efficacy of TCM. However, the many links and complex operations in the supply chain of Chinese herbs, coupled with a low level of information technology, make the sources and processing of Chinese herbs opaque [1]. Moreover, some links in the supply chain are poorly controlled, resulting in mixed herbs, commercial adulteration, and nonstandard harvesting and processing. These anomalies greatly affect the safety, effectiveness, and stability of the clinical use of Chinese herbs. In fact, the substandard items found in sampling inspections of Chinese herbs mainly involve counterfeit substitution and adulteration, which have been found to be the most frequent forms of commercial fraud [2]. Nonprofessionals find it difficult to judge the authenticity of Chinese herbs in real time from their appearance and must rely on lengthy laboratory testing, which is a major reason why commercial fraud involving Chinese herbs persists.
The plants of the genus Polygonatum, which belongs to the Liliaceae family, are widely distributed throughout the world, especially in China, Korea, and Mongolia [3, 4]. Two Chinese herbs used in TCM, “Huangjing” and “Yuzhu,” are derived from this genus [5]. Notably, both frequently appear on the dinner table as edible products and have been used in China for a thousand years [6, 7]. Polygonati rhizoma (PR), known as “Huangjing” in China, is the dried rhizome of Polygonatum kingianum Coll. et Hemsl. (P. kingianum), Polygonatum sibiricum Red. (P. sibiricum), and Polygonatum cyrtonema Hua (P. cyrtonema) [8–10]. PR was first recorded in “Shennong’s Herbal Classic,” which describes its role as “nourishing Qi and Yin, strengthening the spleen, and moistening the lungs” [11]. Polygonati odorati rhizoma (POR), known as “Yuzhu” in China, is the dried rhizome of Polygonatum odoratum (Mill.) Druce (P. odoratum), with the effects of “nourishing Yin, moistening dryness, engendering liquid, and allaying thirst” [10, 12, 13]. PR and POR are similar in morphology and efficacy and have been confused since ancient times [14, 15]. However, they also have different functions. For example, PR can be used to treat fatigue, weakness, indigestion, inappetence, sexual dysfunction, backache, knee pain, and premature greying of hair, whereas POR has the effect of clearing heat [4, 5]. Therefore, PR and POR cannot be used interchangeably in the clinic, and accidental substitution will inevitably affect clinical efficacy and may even lead to poisoning. However, distinguishing PR from POR is a great challenge, whether as whole plants, roots, decoction pieces, or powders.
At present, researchers around the world have applied many methods to distinguish PR from POR, including ultraviolet-visible spectrophotometry (UV-Vis) [16], Fourier transform infrared spectroscopy (FTIR) [17], liquid chromatography-mass spectrometry (LC-MS) [18], gas chromatography-mass spectrometry (GC-MS) [19], morphological and molecular markers [20], DNA barcoding [9], plastid genome sequencing [21], and transcriptome and molecular marker technologies [22]. However, these methods require highly trained inspectors, which is impractical for production-line employees or ordinary consumers. In addition, they require complex and expensive laboratory instruments, are time-consuming, and destroy the tested samples, which can then no longer be used. Therefore, a quick, simple, and widely applicable method to distinguish PR from POR is needed.
In the last decade, handheld near-infrared (NIR) spectroscopy tools have been increasingly applied to the authentication and traceability of goods, in particular food [23]. Furthermore, NIR spectroscopy coupled with chemometrics has proved to be a reliable tool for determining the origin, cultivation mode, and growth years of Chinese herbs and for detecting adulterated products [24–28]. The technique has several advantages: it is easy for nonspecialists to operate, allows fast on-site testing, and requires neither crushing or destroying the sample nor pretreating it with reagents [29]. Because PR and POR are morphologically very similar, they cannot be identified quickly by appearance, and most existing identification methods require large laboratory instruments. Thus, this work aimed to develop a fast and green method for identifying two Polygonatum herbs, PR and POR, using a handheld NIR instrument suited to various scenarios outside the laboratory.
2. Materials and Methods
2.1. Sample Collection and Preparation
In this work, 321 batches of samples, authenticated by Professor Bangxing Han (West Anhui University, Anhui, China), were obtained from four medicinal material bases (Enshi, Shaoyang, Chizhou, and Jinzhai). The origins and numbers of PR and POR samples are summarized in Table 1. After collection, the roots were cleaned with distilled water and dried. The samples were further dried to constant weight in an electrothermal incubator at 45°C and stored at 4°C until use. All samples were collected in 2021.
2.2. NIR Spectra Acquisition
In this work, the samples were pulverized and placed in a quartz beaker for testing. The NIR spectra of all pulverized samples were acquired on a MicroNIR™ 1700 spectrometer (Viavi Solutions, Milpitas, CA, USA) equipped with an indium gallium arsenide (InGaAs) diode array detector. A white reference measurement was obtained using a NIR reflectance standard (Spectralon™) with 99% diffuse reflectance. Spectra were recorded from 898 to 1668 nm at a constant interval of 6.27 nm, and each spectrum was the mean of 50 scans. To capture representative spectral information, each sample was measured three times, being rotated by 120° between measurements, and the final spectrum was the average of the three collected spectra.
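Averaging the three rotated measurements is an element-wise mean over a shared wavelength grid. The sketch below assumes each spectrum is a plain list of readings on the same grid; the function name and the readings are hypothetical, and the 50-scan averaging is assumed to happen inside the instrument, so only the three-rotation average is shown.

```python
from statistics import fmean


def average_spectra(replicates):
    """Element-wise mean of replicate spectra recorded on one wavelength grid.

    `replicates` is a list of equal-length lists, e.g. the three spectra taken
    after rotating the sample by 120 degrees between measurements.
    """
    if not replicates:
        raise ValueError("need at least one replicate spectrum")
    n_points = len(replicates[0])
    if any(len(r) != n_points for r in replicates):
        raise ValueError("all replicates must share the same wavelength grid")
    # zip(*replicates) yields the readings taken at each wavelength in turn
    return [fmean(values) for values in zip(*replicates)]


# Hypothetical readings at three wavelengths, one list per rotation
rotations = [
    [0.50, 0.62, 0.71],
    [0.52, 0.60, 0.73],
    [0.48, 0.64, 0.69],
]
mean_spectrum = average_spectra(rotations)
print(mean_spectrum)
```

Checking the grid length up front guards against silently averaging spectra exported with different wavelength settings.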
2.3. Data Analysis
2.3.1. Selection of the Calibration and Test Sample Sets
The calibration and test sets were randomly selected from each of the two Polygonatum herbs at a ratio of 2 : 1. Within the PR samples and within the POR samples, samples were marked 1, 2, and 3 in turn. Samples marked 1 and 3 formed the calibration set used to build the models, and samples marked 2 formed the test set used to validate the prediction accuracy of the established models.
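One way to reconcile random selection with the 1, 2, 3 marking is to shuffle each herb's samples and then mark them cyclically. The sketch below assumes exactly that; the function name, the shuffle, the seed, and the sample IDs are all hypothetical.

```python
import random


def mark_and_split(sample_ids, seed=0):
    """Shuffle one class's samples, mark them 1, 2, 3 in turn, and split 2:1.

    Samples marked 1 or 3 go to the calibration set and samples marked 2 to
    the test set. The shuffle and seed are assumptions of this sketch.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    calibration, test = [], []
    for position, sample in enumerate(ids):
        mark = position % 3 + 1  # cycles 1, 2, 3, 1, 2, 3, ...
        (test if mark == 2 else calibration).append(sample)
    return calibration, test


# Split each herb separately so that both classes keep the 2:1 ratio
pr_ids = [f"PR-{i:03d}" for i in range(1, 13)]   # hypothetical sample IDs
por_ids = [f"POR-{i:03d}" for i in range(1, 10)]
pr_cal, pr_test = mark_and_split(pr_ids)
por_cal, por_test = mark_and_split(por_ids)
print(len(pr_cal), len(pr_test), len(por_cal), len(por_test))  # 8 4 6 3
```

Splitting within each class (stratifying) keeps the class proportions of the calibration and test sets equal, which matters when the two herbs are not equally represented.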
2.3.2. Spectral Pretreatment [30]
Before classification, the mean spectra were preprocessed to remove baseline drift, light scattering, and noise. Standard normal variate (SNV) transformation, the Savitzky–Golay (SG) filter (15 smoothing points, second-order polynomial, and first derivative), mean centering, and their combinations were applied in this work.
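The two main pretreatments can be sketched in a few lines. SNV centers each spectrum on its own mean and divides by its own standard deviation. For a second-order polynomial fitted over a symmetric window of 2m + 1 points, the least-squares first-derivative weights reduce to i / Σi² for i = −m…m. This is a minimal sketch, assuming evenly spaced wavelengths; dropping the m edge points at each end is a simplification of this sketch, and the software used in the study may handle the edges differently.

```python
from statistics import fmean, stdev


def snv(spectrum):
    """Standard normal variate: centre a spectrum on its own mean, scale by its own SD."""
    mu = fmean(spectrum)
    sd = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(x - mu) / sd for x in spectrum]


def sg_first_derivative(spectrum, window=15, spacing=1.0):
    """Savitzky-Golay first derivative for a 2nd-order polynomial fit.

    Over a symmetric window the derivative weights are i / sum(i^2), i = -m..m.
    Only interior points are returned: m points are lost at each end.
    """
    if window % 2 == 0 or window < 3:
        raise ValueError("window must be odd and >= 3")
    m = window // 2
    norm = sum(i * i for i in range(-m, m + 1)) * spacing
    return [
        sum(i * spectrum[j + i] for i in range(-m, m + 1)) / norm
        for j in range(m, len(spectrum) - m)
    ]


# Hypothetical smooth 40-point spectrum; SNV followed by the SG derivative
raw = [0.40 + 0.002 * k + 0.0001 * k * k for k in range(40)]
pretreated = sg_first_derivative(snv(raw), window=15, spacing=6.27)
print(len(pretreated))  # 40 - 2 * 7 = 26 interior points
```

Because a quadratic is fitted exactly by a second-order polynomial, applying the derivative to k² returns 2k at every interior point, which is a convenient sanity check.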
2.3.3. Principal Component Analysis (PCA) [31]
The aim of PCA is to reduce the dimensionality of the data while retaining as much information as possible, where “information” refers to variance. The idea is to create uncorrelated artificial variables, called principal components (PCs), that are linear combinations of the original (possibly correlated) variables (e.g., absorbances, chemical components, and so on). In this work, PCA was used for pattern recognition by examining whether samples of different categories form separate clusters in the PC score plots.
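The idea can be sketched by finding the leading eigenvectors of the covariance matrix with power iteration and deflation. This is a pure-Python illustration under the assumption of a small samples × variables matrix; it is not how The Unscrambler computes PCA, and production code would use an SVD routine. The data are hypothetical.

```python
import math
import random


def pca(X, n_components=2, n_iter=500, seed=0):
    """PCA of a samples x variables matrix by power iteration with deflation.

    Returns (scores, loadings, explained_variance_ratio).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # mean-centre
    # Covariance matrix C = Xc^T Xc / (n - 1)
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)]
         for a in range(p)]
    total_var = sum(C[j][j] for j in range(p))
    rng = random.Random(seed)
    loadings, eigvals = [], []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining variance is exhausted
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v gives the variance along this direction
        lam = sum(v[a] * C[a][b] * v[b] for a in range(p) for b in range(p))
        loadings.append(v)
        eigvals.append(lam)
        # Deflate C so the next pass converges to the next-largest component
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    scores = [[sum(r[j] * v[j] for j in range(p)) for v in loadings] for r in Xc]
    ratios = [lam / total_var for lam in eigvals]
    return scores, loadings, ratios


# Hypothetical data: 10 "spectra" at 3 wavelengths varying mainly along one direction
X = [[t, 2 * t + 0.01 * (-1) ** t, -t] for t in range(10)]
scores, loadings, ratios = pca(X, n_components=2)
print(f"PC1 + PC2 cumulative contribution: {sum(ratios):.1%}")
```

The sum of the explained-variance ratios of the retained PCs corresponds to the cumulative contribution rate reported for PC1 and PC2, and the scores are what a PC1–PC2 score plot displays.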
2.3.4. Partial Least Squares Discriminant Analysis (PLS-DA)
PLS-DA is a discriminant analysis method based on the partial least-squares (PLS) algorithm and is often used for classification and discrimination problems. PLS-DA resembles PCA in that both project the data onto a small number of components; the difference is that PCA is unsupervised, whereas PLS-DA is supervised and uses the class labels. When the differences between sample groups are large and those within groups are small, unsupervised analysis can identify the between-group differences well. Conversely, when the differences between groups are small, they are difficult to detect with unsupervised methods. In addition, if the between-group difference is small and the groups are of unequal size, the larger group will dominate the model. PLS-DA can address these problems.
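A two-class PLS-DA model can be sketched as PLS1 regression (NIPALS) of a 0/1 class code on the spectra, followed by a threshold on the predicted value. In this sketch PR is coded 1 and POR 0, and the 0.5 threshold is an assumption; it is not the PLS_Toolbox implementation used in the study, and the function names and data are hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def plsda_fit(X, y, n_lv):
    """Two-class PLS-DA via NIPALS PLS1.

    X: spectra (samples x variables); y: 1 for PR, 0 for POR.
    Returns the centring terms and one (w, p, q) triple per latent variable.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(m)] for r in X]  # centred X, deflated below
    f = [v - y_mean for v in y]                            # centred y, deflated below
    components = []
    for _ in range(n_lv):
        # Weights point along the covariance between X and the class code
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        w_norm = _dot(w, w) ** 0.5
        if w_norm < 1e-12:  # class information already fully extracted
            break
        w = [v / w_norm for v in w]
        t = [_dot(row, w) for row in E]  # latent-variable scores
        tt = _dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = _dot(f, t) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def plsda_predict(model, x, threshold=0.5):
    """Return (predicted class code, label) for one spectrum."""
    x_mean, y_mean, components = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = _dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat, ("PR" if y_hat >= threshold else "POR")


# Hypothetical, well-separated 3-variable "spectra"
X = [[1.0, 0.2, 0.1], [0.9, 0.25, 0.12], [1.1, 0.18, 0.09],
     [0.2, 0.8, 0.5], [0.25, 0.85, 0.45], [0.15, 0.78, 0.55]]
y = [1, 1, 1, 0, 0, 0]
model = plsda_fit(X, y, n_lv=2)
print([plsda_predict(model, x)[1] for x in X])
```

The supervision is visible in the first line of the loop: unlike PCA, the weight vector is built from the class code `f`, so the latent variables are steered toward directions that separate the classes rather than toward the largest overall variance.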
2.3.5. Software
PLS-DA models were built with PLS_Toolbox 6.21 running in MATLAB (Version: R2009, MathWorks, Natick, MA, USA). Spectral pretreatment and PCA were performed with The Unscrambler X (Version: 10.4, 64-bit, CAMO Software AS, Oslo, Norway).
3. Results and Discussion
3.1. NIR Spectra of All Samples
The raw NIR spectra of the calibration set are shown in Figure 1(a). The spectra of the two medicinal materials had similar shapes, with a strong O-H absorption peak at 1464 nm and a C-H absorption peak at 1197 nm, so PR and POR could not be distinguished from the raw spectral curves alone. Figure 1(b) shows the spectra after SNV pretreatment. Compared with Figure 1(a), the baseline offset was eliminated, indicating that SNV corrected the spectral scattering.

Figure 1: (a) raw NIR spectra of the calibration set; (b) spectra after SNV pretreatment.
3.2. PCA
PCA is the most commonly used linear dimensionality reduction method [32]. It maps high-dimensional data into a low-dimensional space through a linear projection chosen so that the projected data retain the maximum amount of information (maximum variance) with as few dimensions as possible, while preserving the characteristics of the original data points. In this work, PCA was used to analyze the classification of PR and POR. The direction with the largest dispersion (largest variance) was taken as the first principal component (PC1), and the direction with the second largest variance as the second principal component (PC2). Figures 2(a) and 2(b) show the PC1 and PC2 scores of the calibration set samples (2D and 3D views), and the corresponding PC1 and PC2 loadings are shown in Figure 2(c). As shown in Figures 2(a) and 2(b), the cumulative contribution rate of PC1 and PC2 reached 86%. The blue and red numbers represent POR and PR, respectively; POR samples were mainly distributed in the upper region of the plot, whereas PR samples were mainly distributed in the lower region, indicating that the spectra of PR and POR differ in the PCA model. Still, these differences were not sufficient to separate the two herbs well.

Figure 2: (a, b) PC1 and PC2 score plots of the calibration set samples (2D and 3D); (c) PC1 and PC2 loadings.
3.3. PLS-DA
3.3.1. Selection of Preprocessing Methods for the Model
PLS-DA is a classification method built on PLS regression with latent variables [33]. It extracts the components that maximize the differences between spectral categories and can handle data with many variables. In this work, SNV, SG, and SNV + SG were used to preprocess the spectra before the latent variables were extracted. PR samples were labeled as class 1 and POR samples as class 2, and these class labels were converted into the dummy vectors [1 0] and [0 1], respectively, which served as the reference values for the spectral data. PLS-DA models were then built with leave-one-out cross-validation (LOOCV). The discriminant results of the models established with different spectral pretreatment methods are shown in Table 2. SNV was the best pretreatment, achieving 100% prediction accuracy for both the calibration and test sets. As the comparison of Figures 1(a) and 1(b) shows, SNV eliminated the baseline offset and corrected spectral scattering, highlighting the spectral information.
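The dummy coding described above can be written as a small encoder and decoder. The sketch assumes the class order (PR, POR) and an argmax decision rule; the names are hypothetical, and the software used in the study may instead apply a probability-based threshold.

```python
CLASSES = ("PR", "POR")  # class 1 -> column 0, class 2 -> column 1


def to_dummy(labels):
    """Encode class labels as the 0/1 reference matrix Y used by PLS-DA."""
    unknown = set(labels) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown class labels: {sorted(unknown)}")
    return [[1 if label == c else 0 for c in CLASSES] for label in labels]


def decode(y_row):
    """Assign the class whose predicted column is largest (argmax rule, an assumption)."""
    best = max(range(len(CLASSES)), key=lambda k: y_row[k])
    return CLASSES[best]


print(to_dummy(["PR", "POR", "PR"]))  # [[1, 0], [0, 1], [1, 0]]
print(decode([0.92, 0.08]))           # PR
```

With only two classes the two columns of Y are complementary, so after mean-centering they are exact negatives of each other. A two-column PLS fit then yields the same latent variables as a single-column fit on the PR column, the two predicted columns sum to 1, and the argmax rule is equivalent to thresholding the PR column at 0.5.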
3.3.2. Selection of Latent Variables (LVs) for the Model
To obtain a robust model, the number of latent variables (LVs) must be chosen carefully. In general, too few LVs lower the accuracy of the model. Increasing the number of LVs can improve performance on the calibration set, but too many LVs cause overfitting, which reduces accuracy when the model is used to predict unknown samples. The influence of the number of LVs on the prediction error rate is shown in Figure 3. The prediction error rates of the calibration set (red points) and the cross-validation set (blue points) decreased as the number of LVs increased. With 7 LVs, the calibration error rate reached its minimum value of 0, so 7 LVs were selected; the ability of this model to predict unknown samples was then assessed with cross-validation and the independent test set.
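An error-rate-versus-LVs curve such as the cross-validation curve in Figure 3 can be produced by leave-one-out cross-validation. The sketch below repeats a compact NIPALS PLS1 so that it is self-contained, codes PR as 1 and POR as 0, and assumes a 0.5 threshold. The `choose_n_lv` rule (fewest LVs reaching the minimum cross-validation error) is a common heuristic shown for illustration, not necessarily the authors' criterion. All names and data are hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 on mean-centred data; returns centring terms and (w, p, q) per LV."""
    n, m = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(m)] for r in X]
    f = [v - y_mean for v in y]
    comps = []
    for _ in range(n_lv):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        w_norm = _dot(w, w) ** 0.5
        if w_norm < 1e-12:  # y is already fully explained
            break
        w = [v / w_norm for v in w]
        t = [_dot(row, w) for row in E]
        tt = _dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = _dot(f, t) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x, n_lv):
    """Predict y for one sample using the first n_lv latent variables."""
    x_mean, y_mean, comps = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps[:n_lv]:
        t = _dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat


def loocv_error_rates(X, y, max_lv, threshold=0.5):
    """Leave-one-out misclassification rate for 1..max_lv latent variables.

    NIPALS components are computed sequentially, so one fit with max_lv
    components also contains every smaller model.
    """
    errors = [0] * max_lv
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], max_lv)
        for a in range(1, max_lv + 1):
            predicted = 1 if pls1_predict(model, X[i], a) >= threshold else 0
            errors[a - 1] += predicted != y[i]
    return [e / len(X) for e in errors]


def choose_n_lv(cv_rates):
    """Smallest number of LVs that reaches the minimum cross-validation error."""
    return cv_rates.index(min(cv_rates)) + 1


# Hypothetical, well-separated data: 1 = PR, 0 = POR
X = [[1.0, 0.2, 0.1], [0.9, 0.25, 0.12], [1.1, 0.18, 0.09],
     [0.2, 0.8, 0.5], [0.25, 0.85, 0.45], [0.15, 0.78, 0.55]]
y = [1, 1, 1, 0, 0, 0]
rates = loocv_error_rates(X, y, max_lv=2)
print(rates, choose_n_lv(rates))
```

Preferring the fewest LVs among equally good models is one simple guard against the overfitting described above; calibration error alone keeps falling as LVs are added and cannot reveal it.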

The results showed that with SNV pretreatment and 7 LVs, the prediction error rates of the established model for the calibration set, cross-validation set, and test set were 0, 1.183%, and 0, respectively (Table 2). Figure 4(a) shows this more intuitively: POR (triangle) samples in the calibration and test sets clustered on the side of value 0, whereas PR (circle) samples clustered on the side of value 1. Few samples lay close to the classification line, meaning that most samples of the two types were well distinguished. Figure 4(b) shows the predicted probability of belonging to POR. Most POR samples had a predicted probability of 1 and PR samples a predicted probability of 0, while a few samples had predicted probabilities near 0.5. According to the discriminant criteria of the PLS-DA method, all POR samples in the test set were correctly identified as POR, and no PR samples were assigned to POR, indicating that the discriminant accuracy of the PLS-DA model for POR samples was 100% and the error rate on the test set was 0.

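Figure 4: (a) PLS-DA classification of calibration and test set samples; (b) predicted probability of POR samples.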
Sensitivity and specificity are also important evaluation indexes in sample classification. Taking POR as the positive class, sensitivity is the true-positive rate, that is, the proportion of actual POR samples that were correctly predicted as POR (true POR divided by true POR plus POR misclassified as PR). Specificity is the true-negative rate, that is, the proportion of actual PR samples that were correctly predicted as PR (true PR divided by true PR plus PR misclassified as POR). The sensitivity and specificity of the PLS-DA model for PR and POR samples are shown in Figure 5. Figures 5(a) and 5(c) are predicted receiver operating characteristic (ROC) curves, and Figures 5(b) and 5(d) are predicted response plots. The ROC curve reflects the trade-off between sensitivity and specificity: the point closest to the upper-left corner corresponds to the threshold giving both high sensitivity and high specificity, and the closer the ROC curve approaches the upper-left corner, the higher the discrimination accuracy. In Figures 5(b) and 5(d), the threshold used to classify PR and POR is drawn as a red dotted line. As the threshold increased, specificity (blue line) improved, meaning that the number of false positives decreased, while sensitivity decreased, meaning that the number of false negatives increased. These results, summarized in Table 2 and Figures 4 and 5, show that short-wavelength (898–1668 nm) NIR spectroscopy combined with PLS-DA can quickly distinguish PR from POR. Overall, this study used a handheld NIR spectroscopy tool to establish a rapid and accurate identification method that requires little expertise from front-line inspectors and is easy to teach and deploy.
In particular, it can shorten inspection time and reduce testing costs during sampling at trading sites, improving the efficiency of inspection by regulatory departments and protecting the rights and interests of consumers; it therefore has broad application prospects.
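The sensitivity, specificity, and ROC definitions above can be made concrete with a short sketch. It assumes POR is the positive class (coded 1) and PR the negative class (coded 0), and that each sample has a predicted POR response; the responses below are made up, and the function names are hypothetical.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fn, tn, fp) with POR = 1 as the positive class and PR = 0."""
    tp = fn = tn = fp = 0
    for truth, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if truth == 1:
            if predicted == 1:
                tp += 1
            else:
                fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def sensitivity_specificity(y_true, scores, threshold=0.5):
    tp, fn, tn, fp = confusion_counts(y_true, scores, threshold)
    sensitivity = tp / (tp + fn)  # correctly predicted POR / all actual POR
    specificity = tn / (tn + fp)  # correctly predicted PR / all actual PR
    return sensitivity, specificity


def roc_points(y_true, scores):
    """(1 - specificity, sensitivity) for every distinct score used as a threshold."""
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for thr in thresholds:
        sens, spec = sensitivity_specificity(y_true, scores, thr)
        points.append((1.0 - spec, sens))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical predicted POR responses for three POR (1) and three PR (0) samples
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.90, 0.40, 0.10, 0.05, 0.60]
print(sensitivity_specificity(y_true, scores))     # both equal 2/3 here
print(round(auc(roc_points(y_true, scores)), 3))  # 0.889
```

Sweeping the threshold from high to low traces the ROC curve from (0, 0) to (1, 1). Raising the threshold moves the operating point toward higher specificity and lower sensitivity, which is the trade-off shown in the response plots.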

Figure 5: (a, c) predicted ROC curves; (b, d) predicted response plots with the classification threshold.
4. Conclusions
This work established classification models for PR and POR using handheld NIR spectroscopy combined with chemometrics. SNV pretreatment was used to reduce the influence of nontarget factors. Of the two qualitative models tested, PCA showed significant overlap between the two herbs and could not distinguish them accurately; therefore, PCA was unsuitable for identifying PR and POR. In contrast, the PLS-DA model distinguished the two similar medicinal materials well, with an identification accuracy above 98.3%. These results indicate that the PLS-DA model achieved acceptable performance and that NIR spectroscopy combined with discriminant analysis is a potential tool for identifying PR and POR. In short, the method is simple, rapid, and environmentally friendly, requires no complicated sample pretreatment, and is expected to be suitable for the quality evaluation of these two Polygonatum herbs.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by Major Science and Technology Projects of Anhui Province (202003a07020003 and 202103b06020004); Key Projects of Excellent Young Talents Support Program of Anhui Universities (gxyqZD2020040); and Natural Science Foundation of Higher Education Institutions of Anhui Province (KJ2021A0951). The authors also acknowledge the funding provided by the Administration of Traditional Chinese Medicine of Anhui Province (2020ccyb09) and West Anhui University (WGKQ2021022, WXZR202030 and WGKQ202001012).