Abstract

Moisture content plays a crucial role in evaluating the quality of tea processing. However, the current automated production line for green tea relies heavily on manual methods to determine moisture content, which leads to low productivity and inadequate automation. Therefore, there is an urgent need for a fast, accurate, and convenient detection method. In this study, near-infrared spectroscopy (NIRS) data were collected from seven stages of green tea processing and preprocessed using various techniques, such as Savitzky-Golay (SG) smoothing and detrending (DT), to reduce spectral noise. Subsequently, feature variables of the preprocessed spectral data were selected using full-band principal component analysis (PCA) and competitive adaptive reweighted sampling (CARS). Prediction models for the moisture content of green tea were then developed using partial least squares regression (PLSR) and a back-propagation neural network (BPNN). To address the slow convergence and local optima of BPNN, the study proposes an adaptive probabilistic genetic algorithm (AGA) to optimize the initial weights and thresholds of BPNNs with single and double hidden layers. The results demonstrate that the double-hidden-layer SG-DT-PCA-AGA-BPNN model outperforms the single-hidden-layer model, achieving a high correlation coefficient of 0.994 and a low root mean square error of prediction (RMSEP) of 1.01%. This study highlights the effectiveness of increasing the number of hidden layers and of using AGA to optimize the initial thresholds and weights of BPNN in improving prediction accuracy. Furthermore, it provides a new approach to implementing moisture detection technology in green tea processing.

1. Introduction

Tea is a globally popular beverage. Green tea is especially popular in China, and its processing quality greatly affects its nutritional and market value [1, 2]. Moisture content serves as a vital indicator of tea processing quality, with each processing step requiring a precise moisture level [3–5]. This is because moisture content determines the machine control parameters, such as baking temperature, baking time, and machine speed, which in turn determine the quality of the processed green tea. Currently, the automated processing production line for green tea can only measure moisture content by drying samples in heating equipment for about an hour, which is time-consuming, laborious, and costly, and can damage the green tea. Furthermore, it requires manual involvement, so the control parameters on the production line cannot be set automatically [6]. In addition, control parameters may vary for different types of tea, including one bud and one leaf, one bud and two leaves, and one bud and many leaves. This variability can lead to increased management costs and reduced production efficiency [7]. To make the automated processing production line for green tea more intelligent, there is an urgent need to establish a fast and accurate method to detect the moisture content of green tea.

Near-infrared spectroscopy (NIRS) is an indirect analytical technique with multiple advantages, such as rapidity, convenience, nondestructiveness, and cost-effectiveness [8, 9]. It has been extensively applied in diverse domains, such as agricultural product analysis, the food and beverage industry, and petroleum product analysis, demonstrating promising outcomes and significant potential for further advancement [10–12]. For instance, Ding et al. used NIRS with a support vector machine (SVM) optimized by a particle swarm algorithm to classify tea quality grades, achieving a classification accuracy of 99.17% [13]. Chen et al. combined back propagation (BP) and AdaBoost with the synergy interval partial least squares (Si-PLS) method to detect taste-related components in black tea [14]. Meanwhile, Shen et al. developed an SNV-PCA-ENN model using micro-NIR spectroscopy and the Elman neural network (ENN) for real-time moisture detection in black tea leaves. The model achieved a correlation coefficient of 0.99314 and a residual predictive deviation of 11.8108, demonstrating excellent performance [15]. These experimental findings collectively support the feasibility of using NIRS for moisture content detection in tea leaves.

Predicting water content with NIRS requires dealing with noise and interference from physical factors. The process can be divided into three steps: (1) applying pretreatment methods to suppress noise, (2) applying characteristic variable selection and spectral dimensionality reduction algorithms to remove redundant data and retain the spectral information associated with water content, and (3) building a quantitative analysis model that relates the spectrum to water content. Commonly used pretreatment methods include standard normal variate transformation (SNV), multiplicative scatter correction (MSC), Savitzky-Golay (SG) smoothing, and detrending (DT) [16]. SNV and MSC reduce the impact of inhomogeneous scattering on particle surfaces, SG suppresses high-frequency noise, and DT primarily corrects baseline drift in the diffuse reflectance spectrum. In this study, principal component analysis (PCA) and competitive adaptive reweighted sampling (CARS) are used as the dimensionality reduction and feature variable selection algorithms, respectively [17]. These methods have performed well in near-infrared spectroscopy detection, as they retain the essential spectral features while reducing data dimensionality. Nonlinear regression using a back-propagation neural network (BPNN) has been widely applied in various fields with good results. In this study, BPNN is selected as the model for water content prediction, while linear regression using partial least squares regression (PLSR) is employed as a comparative method. This selection allows the performance of linear and nonlinear prediction models to be compared directly.

BPNN, a multilayer feedforward neural network trained by error back-propagation, can approximate any nonlinear continuous function with just three layers and has strong self-learning and error correction abilities [18, 19]. However, despite these advantages, previous research has identified limitations of BPNN. For example, the choice of network structure lacks unified and comprehensive theoretical guidance, and training can suffer from slow convergence, convergence to local optima, and high sensitivity to the initial weights and thresholds [20, 21].

To address the issue of BPNN falling into local optima, many studies have incorporated genetic algorithms (GA) to optimize the initial weights and thresholds of BPNN. Aishwarya and Babu proposed a hybrid BPNN-GA model and extensively tested it on various datasets, including L&T stock market data, air quality data, surface roughness, and concrete strength data. The results demonstrated that the GA-BPNN model outperformed the traditional BPNN model in terms of prediction accuracy [22]. Similarly, Cui et al. used GA to optimize the parameters of a BP neural network model, enhancing the convergence speed and achieving global optimization. The GA-BPNN improved the prediction accuracy of BPNN, with an average absolute error of 0.05009, for predicting the silicon content of iron in actual production [23]. Although existing studies have investigated the combination of GA with BPNN, little research has focused on GA-based optimization of BPNN specifically for moisture content prediction in green tea processing. While the single-hidden-layer BPNN (1d-BPNN) is widely used with NIRS for prediction and has shown promising results, the multi-hidden-layer BPNN has demonstrated superior capabilities in feature extraction and generalization. Based on these observations, our study evaluates the impact of the number of hidden layers on prediction performance by developing both single-hidden-layer and double-hidden-layer BPNN (2d-BPNN) models. The initial weights and thresholds of the BPNN models are optimized using a GA. Additionally, we incorporate PCA and CARS to construct multiple BPNN models for modeling and analysis, with the objective of identifying the optimal model for moisture content prediction. The outcomes of our study will contribute to the optimization of BPNN and its precise detection of water content in green tea processing, providing valuable insights for future research in this field.

In this paper, we propose a method to optimize the BPNN for predicting moisture content in green tea. Our approach suppresses noise in the spectral data and removes redundant information using feature variable and wavelength selection algorithms, which reduces training time and difficulty. To enhance the fitting ability of the BPNN, we increase the number of hidden layers, and the initial weights and thresholds of the BPNN are optimized using an adaptive genetic algorithm (AGA) to improve training accuracy. Incorporating the feature variable and wavelength selection algorithms further reduces model complexity and enhances the prediction accuracy of the optimized BPNN model. The results of our study confirm the feasibility and accuracy of the AGA-optimized BPNN for predicting moisture content in green tea.

2. Materials and Methods

2.1. Sample Preparation

The experiment was conducted in July 2021 at Hunan Xiangfeng Tea Co. On July 14th, at 10 am, fresh green tea was randomly selected from the tea base in Jinjing Town, Changsha County, Hunan Province. Approximately 10 kg of green tea was collected. The collected tea consisted of both one bud and one leaf, as well as one bud and two leaves. Each of these tea samples had a spreading leaf thickness of about 4 cm. The experiment took place in an environment with a room temperature of approximately 22°C and a relative humidity of around 65%. The green tea processing steps involved in the experiment are depicted in Figure 1.

For the tedding process, the tea leaves were evenly spread out in a cool and ventilated environment for 7 hours. The deenzyming process was conducted using a 6CST-70 drum killing machine (Changsha Xiangfeng Intelligent Equipment Co., Ltd.) with a drum speed of 24 r/min and a temperature of 340°C/320°C/300°C for 5 minutes. Cooling and airing were accomplished using a fan set at a rotation speed of 28 r/min. The rolling process was performed in a 6CR-55 rolling machine (Changsha Xiangfeng Tea Machinery Manufacturing Co., Ltd.) for 30 minutes. The first-step and second-step drying processes were carried out using a 6CHBZ-20 tea machine (Changsha Xiangfeng Tea Machinery Manufacturing Co., Ltd.). The temperature was set at 120°C for the first step and 95°C for the second step, with a leaf thickness of 1–2 cm for the first step and 1 cm for the second step. The two steps lasted 25 and 30 minutes, respectively.

In this study, a miniature fiber optic spectrometer (ATP8600, AOPTECS, Xiamen, China) was utilized to collect spectra of processed green tea. The spectrometer had a spectral range of 920 to 1692 nm, a spectral resolution of 3 nm, and a wavelength accuracy of ±1 nm, providing a total of 256 bands. To minimize errors in spectral acquisition, approximately 50 g of the sample was placed on a standard whiteboard measuring  mm which had a diffuse reflectance of over 98% and had naturally cooled to room temperature. The light source (HL2000-HP-FHSA, Ocean Optics, Inc., USA), positioned 30 cm above the whiteboard, had an output power of 7 W and a lamp life of 1500 h. The acquisition device was positioned 40 cm above the whiteboard at a 45-degree angle.

Before starting the experiment, the light source was allowed to warm up for around 5 minutes. The spectrometer was turned on and allowed to warm up for 30 minutes to reach a stable state, minimizing any baseline drift interference. Each sample was scanned three times, and the average was taken as the raw light intensity spectrum of the sample. To minimize the impact of the background signal, the whiteboard and dark current signals were collected at 30-minute intervals during the testing procedure. Finally, the raw spectrum was transformed into a diffuse reflectance spectrum using

$$R = \frac{I_{\mathrm{raw}} - I_{\mathrm{dark}}}{I_{\mathrm{white}} - I_{\mathrm{dark}}},$$

where $R$ is the diffuse reflectance spectrum, $I_{\mathrm{raw}}$ represents the raw intensity of the light reflected from the sample, and $I_{\mathrm{dark}}$ and $I_{\mathrm{white}}$ correspond to the intensities recorded for the dark current signal and the whiteboard, respectively.
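The correction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' processing code: the function names and the three-band example values are hypothetical, and it assumes the usual dark-corrected ratio of sample intensity to whiteboard intensity.

```python
def average_scans(scans):
    """Average repeated scans of one sample, band by band (the text averages three)."""
    n = len(scans)
    return [sum(band) / n for band in zip(*scans)]


def to_reflectance(raw, dark, white):
    """Dark-corrected ratio of sample intensity to whiteboard intensity."""
    if not (len(raw) == len(dark) == len(white)):
        raise ValueError("all three spectra must have the same number of bands")
    reflectance = []
    for i_raw, i_dark, i_white in zip(raw, dark, white):
        span = i_white - i_dark
        if span <= 0:
            raise ValueError("whiteboard signal must exceed the dark current")
        reflectance.append((i_raw - i_dark) / span)
    return reflectance


# Hypothetical three-band intensities, for illustration only.
scans = [[520.0, 610.0, 450.0], [500.0, 590.0, 470.0], [510.0, 600.0, 460.0]]
raw = average_scans(scans)
print(to_reflectance(raw, dark=[10.0] * 3, white=[1010.0] * 3))
```

Subtracting the dark current from both numerator and denominator removes the detector offset, so the ratio depends only on the light actually reflected by the sample.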

To establish a model for detecting moisture content in green tea processing, a total of 462 spectra were collected. These consisted of 84 spectra for the fresh leaf process, 84 spectra for the tedding process, 71 spectra for the deenzyming process, 59 spectra for the cooling and airing process, 68 spectra for the rolling process, 53 spectra for the first-step drying process, and 53 spectra for the second-step drying process. The flow of spectral acquisition and the model building process are illustrated in Figure 2.

2.2. Measurement of Standard Moisture Content of Samples

The moisture content of each sample was determined according to the national standard GB5009.3-2016, following spectral collection. The green tea samples were weighed using an XYSCALE analytical electronic balance (Lucky Electronic Equipment Co., Changzhou, China). The samples were placed in a baking tray and heated at 120°C for two hours in a BOWELL incubator (Bowei Instrument Equipment Co., Ltd., Dongguan, China). After cooling to room temperature, the samples were weighed, and their moisture content was calculated using

$$MC = \frac{m_2 - m_1}{m} \times 100\%,$$

where $MC$ represents the moisture content of the sample, $m$ refers to the mass of the green tea sample, $m_1$ represents the total mass of the sample and the baking dish after heating, and $m_2$ denotes the total mass of the sample and the baking dish before heating. The moisture content of the green tea samples during the seven processing steps is presented in Table 1.
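As a worked illustration of the gravimetric calculation, the sketch below computes moisture content as the mass lost on heating divided by the mass of the sample. The function name and the example masses are hypothetical and are not taken from the study's data.

```python
def moisture_content(sample_mass, total_before, total_after):
    """Moisture content in percent from oven-drying masses (all in grams).

    sample_mass  -- mass of the tea sample before heating
    total_before -- sample plus baking dish before heating
    total_after  -- sample plus baking dish after heating
    """
    if sample_mass <= 0:
        raise ValueError("sample mass must be positive")
    loss = total_before - total_after
    if loss < 0 or loss > sample_mass:
        raise ValueError("mass loss must lie between 0 and the sample mass")
    return 100.0 * loss / sample_mass


# Hypothetical masses: a 5 g sample that loses 3.75 g of water.
print(moisture_content(sample_mass=5.0, total_before=25.0, total_after=21.25))
```

Because the dish mass appears in both totals, it cancels in the difference and never needs to be measured separately.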

2.3. NIR Data Preprocessing

To mitigate potential errors arising from spectral acquisition and reduce the impact of physical properties and background information of the samples, a comprehensive set of seven preprocessing techniques was applied, encompassing MSC, SNV, SG, DT, MSC-DT, SNV-DT, and SG-DT. These methodologies were implemented to diminish the noise’s influence on both feature variable selection and model prediction.
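To make the combined SG-DT pretreatment concrete, the following sketch smooths a spectrum with a Savitzky-Golay filter and then removes a fitted baseline. It rests on assumptions the text does not state: a 5-point quadratic SG window (whose standard coefficients are -3, 12, 17, 12, -3 over 35), unsmoothed end points, and a first-order (linear) detrend. The window size and detrend order actually used in the study are not given.

```python
SG5_QUADRATIC = (-3.0, 12.0, 17.0, 12.0, -3.0)  # standard 5-point weights, sum = 35


def savgol5(spectrum):
    """5-point quadratic Savitzky-Golay smoothing; the two points at each end are kept as-is."""
    n = len(spectrum)
    if n < 5:
        raise ValueError("need at least 5 bands")
    smoothed = list(spectrum)
    for i in range(2, n - 2):
        window = spectrum[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG5_QUADRATIC, window)) / 35.0
    return smoothed


def detrend_linear(spectrum):
    """Subtract the least-squares straight line fitted against the band index."""
    n = len(spectrum)
    x_mean = (n - 1) / 2.0
    y_mean = sum(spectrum) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(spectrum))
    slope = sxy / sxx
    return [y - (y_mean + slope * (i - x_mean)) for i, y in enumerate(spectrum)]


def sg_dt(spectrum):
    """SG smoothing followed by detrending, in that order."""
    return detrend_linear(savgol5(spectrum))
```

In this sketch smoothing is applied first so that high-frequency noise does not influence the baseline fit.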

To ascertain the most suitable preprocessing approach for augmenting model accuracy, an evaluation and comparison of PLSR and BPNN models were conducted.

2.4. Method for Selecting Characteristic Wavelengths and Characteristic Variables

The spectral bands analyzed in this study spanned from 920 to 1690 nm, with a bandwidth of 770 nm. However, many of these bands are irrelevant or redundant, which complicates the identification of essential model features and leads to decreased prediction accuracy and instability. To address this issue, researchers commonly employ feature band selection algorithms, including random frog hopping, siPLS, CARS, and SPA, to identify a subset of bands of substantial importance. Additionally, feature variable selection algorithms such as PCA, LDA, and ICA are used to transform the original bands into a reduced number of new feature variables [24–26]. In this study, the classical algorithms PCA and CARS were employed to minimize superfluous information within the spectral data.

PCA, a dimensionality reduction method, simplifies the data structure by projecting it from a high-dimensional space to a low-dimensional space through an orthogonal transformation [27]. It transforms a potentially correlated set of variables into a linearly uncorrelated set known as principal components, which capture the maximum variance with the minimum reconstruction error. PCA thus represents the original spectra with a small number of characteristic variables, significantly reducing the number of variables while retaining most of the relevant information.

In contrast, CARS introduces a novel spectral feature screening algorithm that treats each set of spectral bands independently and employs adaptive reweighted sampling [28]. The algorithm utilizes a partial least squares (PLS) linear model as the fitness function and employs cross-validation for optimization. By selecting the subset yielding the highest accuracy for the regression model and excluding variables with significant errors, the algorithm ensures the identification of an optimal subset. Through N Monte Carlo samplings, N subsets are generated, and N root mean squared errors of cross-validation (RMSECV) are calculated accordingly. The algorithm determines the subset of bands with the smallest RMSECV, considering the variables within this subset as the optimal set.
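The text does not say how many variables survive each sampling run. In the commonly described formulation of CARS, an exponentially decreasing function fixes that number, falling from all variables in the first run to two in the last; the sketch below assumes that formulation and is not taken from the study's implementation. The function names are illustrative, and the adaptive reweighted sampling step (not shown) may prune the subset further within each run.

```python
import math


def cars_retention_schedule(n_variables, n_runs):
    """Upper bound on variables kept in each run under the exponential decay rule.

    ratio_i = a * exp(-k * i), with a and k chosen so that ratio_1 = 1
    and ratio_N = 2 / n_variables.
    """
    if n_variables < 2 or n_runs < 2:
        raise ValueError("need at least 2 variables and 2 runs")
    a = (n_variables / 2.0) ** (1.0 / (n_runs - 1))
    k = math.log(n_variables / 2.0) / (n_runs - 1)
    schedule = []
    for i in range(1, n_runs + 1):
        ratio = a * math.exp(-k * i)
        schedule.append(max(2, round(ratio * n_variables)))
    return schedule


def best_run(rmsecv_per_run):
    """1-based index of the sampling run with the lowest RMSECV."""
    return min(range(len(rmsecv_per_run)), key=rmsecv_per_run.__getitem__) + 1


# 256 bands and 100 runs, as used in this study.
schedule = cars_retention_schedule(256, 100)
print(schedule[0], schedule[-1])
```

The steep early decay removes many uninformative bands quickly, while the slow later decay refines the subset, which is why the RMSECV curve typically falls and then rises again.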

2.5. Partial Least Squares Regression

Partial least squares regression (PLSR) is a statistical technique widely used in multiple regression analysis. It simplifies the structure of the data and investigates the correlation between two sets of variables, making it well suited to multivariate data [29]. PLSR is based on a principle similar to that of principal component analysis (PCA): it transforms the original independent variables ($X$) and dependent variables ($Y$) into respective sets of principal components $T$ and $U$. This transformation enables the evaluation of the relationship between $X$ and $T$, as well as between $Y$ and $U$, using the correlation principle. By integrating multiple linear regression, PLSR then examines the association between $T$ and $U$, which in turn describes the relationship between $X$ and $Y$. PLSR is commonly employed for prediction with highly correlated datasets, such as near-infrared (NIR) data, particularly when sample sizes are limited.

2.6. Back-Propagation Neural Network

The back-propagation neural network (BPNN) is a commonly employed multilayer feedforward neural network architecture that leverages error backpropagation [30, 31]. This architecture is particularly renowned for its outstanding capability to handle nonlinear fitting, rendering it highly suitable for diverse prediction and regression tasks. A typical BPNN structure encompasses an input layer, multiple hidden layers, and an output layer, as visually depicted in Figure 3.

Previous studies have highlighted that several factors can affect the prediction accuracy of BPNN, such as the random initial weights and thresholds, network structure, activation function, optimizer, and learning rate. Among these factors, the number of hidden layers and the number of nodes in the hidden layers play a crucial role in determining the model's prediction capability [32]. This study therefore explores both single-hidden-layer and double-hidden-layer BPNN architectures. For the single-hidden-layer architecture, node counts from 8 to 40 are traversed to identify the number of nodes that yields the lowest root mean squared error of cross-validation (RMSECV) on the validation set. The results are presented in Figure 4, where the optimal number of nodes for the single hidden layer is determined. For the double-hidden-layer BPNN, a trial-and-error method is employed to determine the number of nodes in each hidden layer. Ultimately, hidden layer 1 contains 32 nodes, while hidden layer 2 contains 8 nodes.
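The size of the network fixes how many initial weights and thresholds there are to optimize. The sketch below counts them for a fully connected network. It assumes the 15 principal components reported in Section 3.2 as inputs, a single output node for moisture content, and one threshold per non-input node; `count_parameters` is an illustrative helper, not part of the study's code.

```python
def count_parameters(layer_sizes):
    """Total number of weights plus thresholds in a fully connected feedforward network."""
    if len(layer_sizes) < 2 or any(n <= 0 for n in layer_sizes):
        raise ValueError("need at least two layers with positive sizes")
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    thresholds = sum(layer_sizes[1:])  # one per hidden or output node
    return weights + thresholds


# Double-hidden-layer structure from the text (32 and 8 nodes), 15 PCA inputs assumed.
print(count_parameters([15, 32, 8, 1]))   # 785
# Same hidden layers fed with all 256 bands, for comparison.
print(count_parameters([256, 32, 8, 1]))  # 8497
```

Under these assumptions, reducing the input from 256 bands to 15 components shrinks the number of initial values to be tuned by more than a factor of ten.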

The hyperparameters of the BPNN used in this study are presented in Table 2. Because the performance of the BPNN can be sensitive to the initial weights and thresholds, resulting in a degree of instability in the model's predictions, an adaptive genetic algorithm (AGA) is introduced to optimize them and improve the performance of the BPNN.

2.7. Adaptive Genetic Algorithm (AGA)

The genetic algorithm (GA), originally proposed by John Holland, represents a robust technique for optimizing intricate systems by simulating the natural evolutionary process [33]. GA can search for the global optimal solution and avoid local optimal points, as it maintains a diverse population of solutions and employs selection, crossover, and mutation operations that introduce randomness and diversity into the search process. This allows GA to explore a wide range of potential solutions across the search space, increasing the likelihood of finding the global optimal solution. In the GA, the parameter to be optimized is represented as an individual in the population, and a fitness function is established to assess the individual’s quality. The high-fitness parents are selected to undergo selection, crossover, and mutation operations, while the low-fitness individuals are eliminated from the population. These operations lead to an improvement in the overall fitness of the population, with the individual possessing the highest fitness eventually considered as the final optimal solution.

In the standard GA, fixed probability values are assigned to the crossover and mutation operators. However, determining the optimal values for these probabilities can be challenging. A low probability value can result in a slow convergence rate as the population’s average fitness gradually increases during the early stages of evolution. On the other hand, a high probability value can lead to the loss of beneficial genes from highly fit individuals, causing the population’s average fitness to fluctuate and hindering the discovery of the optimal solution [34].

To overcome the challenges mentioned earlier, this study proposes an adaptive genetic algorithm (AGA). In AGA, individuals with fitness levels below the population average are assigned a larger fixed probability, enhancing their genes and improving fitness. Conversely, individuals with fitness levels exceeding the population average have their probability values for the crossover and mutation operators dynamically adjusted based on their fitness rank in the population and the number of iterations. The dynamic adjustment of the probabilities is aimed at increasing the likelihood of transmitting high-quality genes to the offspring, accelerating convergence, and enhancing the population’s global search capability. Additionally, moderate mutation probabilities can facilitate the emergence of superior individuals in the population, thereby improving the algorithm’s global search abilities.

The probabilities of the crossover and mutation operators in AGA can be represented using

The equations above involve the following quantities: the number of individuals in the population; the crossover probability and the mutation probability of an individual; EX, the mean fitness of the current population; DX, the variance of the population fitness; the coefficient of variation, used to assess the dispersion of the population; the fitness of an individual; the maximum fitness value in the current population; and a set of predetermined constants.

From an algorithmic perspective, larger probability values are assigned to individuals with lower fitness, while smaller values are assigned to those with higher fitness. Additionally, the coefficient of variation, which measures the dispersion of the population, plays a pivotal role in identifying the evolutionary stage of the population. To illustrate this with the crossover probability: if two individuals with equal fitness belong to different populations, the population in the early stages of evolution will exhibit a higher DX and a lower EX. Consequently, a higher coefficient of variation is obtained for the population during the initial iterations, resulting in a larger crossover probability and more rapid population evolution. Conversely, for populations in the later stages of evolution, the crossover probability becomes smaller, allowing more high-fitness individuals to be retained and increasing the probability of discovering the global optimal solution. The same principle applies to the mutation probability. The adaptive probability algorithm is thus designed to accelerate the early evolution stage of GA and, in the later stage, to decrease the probabilities of crossover and mutation, thereby preserving individuals with higher fitness in the population. In this way, AGA improves the convergence rate of GA and enhances its global search capability.
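The behaviour described above can be illustrated with a simple adaptive rule. The sketch below is not the study's formula, which also depends on population dispersion and the iteration count. It uses a widely known linear scheme in which below-average individuals receive a fixed high probability and above-average individuals receive a probability that falls linearly toward a lower bound as fitness approaches the population maximum. The names `p_high` and `p_low` and the dispersion helper are assumptions made for illustration.

```python
import math


def coefficient_of_variation(fitnesses):
    """Population dispersion: standard deviation sqrt(DX) divided by mean EX."""
    n = len(fitnesses)
    ex = sum(fitnesses) / n
    dx = sum((f - ex) ** 2 for f in fitnesses) / n
    if ex == 0:
        raise ValueError("mean fitness must be non-zero")
    return math.sqrt(dx) / ex


def adaptive_probability(fitness, mean_fitness, max_fitness, p_high, p_low):
    """Fixed p_high below the mean; linear decay from p_high to p_low above it."""
    if fitness < mean_fitness or max_fitness <= mean_fitness:
        return p_high
    scale = (max_fitness - fitness) / (max_fitness - mean_fitness)
    return p_low + (p_high - p_low) * scale


# Hypothetical population fitness values.
population = [1.0, 2.0, 3.0, 4.0]
mean_f, max_f = sum(population) / len(population), max(population)
for f in population:
    print(f, adaptive_probability(f, mean_f, max_f, p_high=0.9, p_low=0.5))
print(coefficient_of_variation(population))
```

The same function can serve for crossover and mutation by passing different bounds, with mutation bounds typically much smaller than crossover bounds.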

By implementing adaptive probability adjustment, AGA demonstrates the capability to attain quicker convergence and yield improved algorithmic solutions [35, 36]. Presently, parameter selection for GA continues to depend on manual empirical adjustments, necessitating tailoring to the particulars of individual problems. In this research, following several experimental iterations, the parameters for both GA and AGA were finalized, as presented in Table 3. The procedure for employing AGA to optimize the weights and thresholds of BPNN is elucidated in Figure 5.

2.8. Model Establishment and Performance Evaluation

The statistical information for the moisture content of the samples is presented in Table 4. A reliable sampling method for constructing the calibration set is essential to enhance the model's generalization ability. In this study, the SPXY algorithm is employed to partition the samples into a validation set and a prediction set in a 4 : 1 ratio. The validation set comprises 370 spectra, while the prediction set consists of 92 spectra. The moisture content of all green tea samples ranged from 5.17% to 78.74%.
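The partitioning step can be sketched as follows. This is a minimal pure-Python version of the usual SPXY procedure, assuming Euclidean distances in both the spectral space and the moisture content space, each normalized by its maximum, followed by Kennard-Stone style max-min selection. It is an illustration, not the implementation used in the study, and it favours clarity over speed.

```python
import math


def _pairwise(rows):
    """Symmetric matrix of Euclidean distances between rows."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(rows[i], rows[j])
    return d


def spxy_split(spectra, targets, n_select):
    """Return (selected, remaining) sample indices using joint x-y distances."""
    n = len(spectra)
    if len(targets) != n or not 2 <= n_select <= n:
        raise ValueError("inconsistent inputs")
    dx = _pairwise(spectra)
    dy = _pairwise([[t] for t in targets])
    max_dx = max(map(max, dx)) or 1.0
    max_dy = max(map(max, dy)) or 1.0
    d = [[dx[i][j] / max_dx + dy[i][j] / max_dy for j in range(n)] for i in range(n)]
    # Start from the two samples that are farthest apart.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda p: d[p[0]][p[1]])
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    # Repeatedly add the sample farthest from its nearest selected neighbour.
    while len(selected) < n_select:
        nxt = max(remaining, key=lambda i: min(d[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining
```

For the 462 spectra in this study, a 4 : 1 split corresponds to `n_select = round(0.8 * 462)`, which gives the 370 and 92 spectra reported above.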

Green tea moisture content prediction models were developed using the PLSR and BPNN algorithms for the full-band spectra, as well as for the spectra after PCA dimensionality reduction and CARS feature selection. To evaluate the models' performance, the cross-validation correlation coefficient ($R_c$) and root mean square error of cross-validation (RMSECV) were employed for the validation set, while the prediction correlation coefficient ($R_p$) and root mean square error of prediction (RMSEP) were used for the prediction set. A high $R_c$ and $R_p$, along with a low RMSECV and RMSEP, indicate superior predictive ability and accuracy of the model. Moreover, when the values of RMSECV and RMSEP are close, the model predicts the target variable stably [37]. The assessment parameters are defined as follows:

$$R_c = \sqrt{1 - \frac{\sum_{i=1}^{n_c} (\hat{y}_{c,i} - y_{c,i})^2}{\sum_{i=1}^{n_c} (y_{c,i} - \bar{y}_c)^2}}, \qquad \mathrm{RMSECV} = \sqrt{\frac{1}{n_c} \sum_{i=1}^{n_c} (\hat{y}_{c,i} - y_{c,i})^2},$$

$$R_p = \sqrt{1 - \frac{\sum_{i=1}^{n_p} (\hat{y}_{p,i} - y_{p,i})^2}{\sum_{i=1}^{n_p} (y_{p,i} - \bar{y}_p)^2}}, \qquad \mathrm{RMSEP} = \sqrt{\frac{1}{n_p} \sum_{i=1}^{n_p} (\hat{y}_{p,i} - y_{p,i})^2},$$

where $n_c$ and $n_p$ represent the number of samples in the calibration and prediction sets, respectively; $\hat{y}_{c,i}$ and $y_{c,i}$ are the predicted and reference values of the $i$th sample in the calibration set, respectively; $\hat{y}_{p,i}$ and $y_{p,i}$ are the predicted and reference values of the $i$th sample in the prediction set, respectively; and $\bar{y}_c$ and $\bar{y}_p$ are the mean reference values of the samples in the calibration and prediction sets, respectively.
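The two kinds of assessment parameter can be computed as sketched below. The correlation coefficient is assumed here to take the form common in NIR calibration work, the square root of one minus the ratio of the residual sum of squares to the total sum of squares about the mean reference value. A Pearson correlation is another frequent choice and would give slightly different numbers. The function names are illustrative.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between predicted and reference values."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference))


def correlation_coefficient(predicted, reference):
    """sqrt(1 - SS_res / SS_tot), clipped at zero for very poor fits."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("reference values must not all be equal")
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))


# Hypothetical moisture contents in percent.
reference = [75.2, 60.1, 41.8, 20.3, 6.0]
predicted = [74.0, 61.5, 40.9, 21.7, 6.8]
print(rmse(predicted, reference), correlation_coefficient(predicted, reference))
```

Applying the same two functions to the validation set and to the prediction set yields the RMSECV and RMSEP pair, whose closeness indicates model stability.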

All data processing and model building were performed on MATLAB R2021a (MathWorks, Natick, MA, USA) and PyCharm 2020.1.2 (JetBrains, USA) under Windows 10.

3. Results and Discussion

3.1. Spectral Features and Preprocessing

The spectrum consists of various absorption peaks associated with hydrogen-containing functional groups (-SH, -CH, -OH, -NH, etc.) and their combination bands. The position and intensity of these absorption peaks, which differ between functional groups, have a significant impact on the NIRS reflectance. Figure 6 illustrates the raw spectra, preprocessed spectra, and average spectra for each stage of all samples. The average spectra show that the diffuse reflectance gradually increases as the water content of green tea decreases. Notably, the troughs observed around 1200 nm and 1450 nm can be attributed to the O-H bond, which is the primary characteristic group of water [37]. The trough at 1450 nm is caused by the first overtone of the O-H stretching vibration in water, while the trough at 1200 nm is due to the second overtone of the C-H group in protein and the combination band absorption of O-H in water.

During the second-step drying process, the spectra exhibit a significant decrease from 1000 nm to 1200 nm, possibly resulting from the reduction in water content in green tea leaves, an increase in the content of tea polyphenols and caffeine, and an increase in the CH and NH bonds, leading to decreased reflectance. Since spectra are subject to noise such as high-frequency noise, baseline drift, and hyperspectral overlap, preprocessing methods are employed to mitigate the impact of noise and enhance the predictive performance and stability of the model. In this study, PLSR and 1d-BPNN are applied to each preprocessed spectrum, and the optimal preprocessing method is selected to minimize the influence of noise on the model’s performance in NIR.

Table 5 summarizes the results of the PLSR and 1d-BPNN models using different spectral pretreatment methods. The correlation coefficient and root mean square error of prediction (RMSEP) for the prediction set were calculated for each model. For the PLSR model, the original spectral data yielded a correlation coefficient of 0.959 and an RMSEP of 2.53. The model's performance improved after applying the DT and SG-DT pretreatment methods. Similarly, the 1d-BPNN model built on the original spectral data obtained a correlation coefficient of 0.960 and an RMSEP of 2.47, and its performance improved after applying the DT, SNV-DT, MSC-DT, and SG-DT pretreatment methods. For both PLSR and 1d-BPNN, the DT and SG-DT methods therefore outperformed the original spectra, and SG-DT gave the best results in both models. This indicates that the SG-DT method effectively suppresses high-frequency noise and baseline drift, leading to improved model accuracy. Therefore, in this study, the SG-DT method is employed as the spectral preprocessing method.

3.2. Feature Variable Selection

The current study employed PCA and CARS methods to select feature bands from the full range of NIR spectra, which initially consisted of 256 wavelengths. This process is aimed at establishing a reliable model for predicting green tea moisture content. Specifically, PCA was utilized to extract hidden feature information and reduce the spectral data from 256 dimensions to 15 dimensions. These 15 principal components (PCs) accounted for 99.87% of the total variance observed in the NIR spectra.

In Figure 7, the PCA results of the seven steps involved in green tea processing are presented and analyzed. Figure 7(a) displays a plot of the first two principal components, namely, PC1 and PC2, which explain 74.68% and 21.00% of the total variance of the NIR spectra, respectively. Notably, differences were observed between samples from the second drying step and other steps, while some correlations were detected among samples from different steps. Particularly, substantial overlap was observed between samples from the fresh leaf and tedding steps. To address this overlap, additional principal components were extracted to construct the model.

To determine the final number of principal components for PCA, Figure 7(b) illustrates the explained variance and cumulative explained variance of the first 15 principal components. The explained variance contribution of the 15th principal component was approximately 0.00447%, while the cumulative explained variance of the first 15 components reached 99.88631%. These results indicate that the first 15 principal components capture nearly all of the effective information contained within the spectra, so they were used as the final set of principal components for PCA.
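One common way to choose the number of components is to accumulate the explained variance ratios until a target is reached, as sketched below. This is an illustration under assumptions: the threshold rule and the helper name are not stated in the text, and only the first two ratios in the example (74.68% and 21.00%) come from the study. The remaining values are made up.

```python
def components_for_threshold(explained_ratio, threshold):
    """Smallest number of leading components whose cumulative ratio reaches the threshold."""
    cumulative = 0.0
    for k, ratio in enumerate(explained_ratio, start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return k, cumulative
    raise ValueError("threshold not reached by the supplied components")


# PC1 and PC2 from the text; the later ratios are hypothetical.
ratios = [0.7468, 0.2100, 0.0250, 0.0100, 0.0040]
k, total = components_for_threshold(ratios, threshold=0.95)
print(k, total)
```

A tighter threshold keeps more components, which matters here because the first two components alone do not separate the fresh leaf and tedding stages.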

The CARS (competitive adaptive reweighted sampling) method was used for feature wavelength selection, as shown in Figure 8. A 10-fold cross-validation process was employed, with the minimum root mean square error of cross-validation (RMSECV) as the target, utilizing 20 potential optimal variables and conducting 100 iterations.

Figures 8(a)–8(c) show that the number of selected wavelength bands decreases as the number of iterations increases, while the RMSECV first decreases and then increases. From iteration 1 to 31, the RMSECV decreases continuously, indicating that the variables eliminated during this phase have little or no relationship with the moisture content of green tea. At iteration 32, the RMSECV reaches its minimum with 42 feature variables, which identifies the optimal subset of spectral variables. From iteration 33 to 100, the RMSECV rises again, suggesting that informative variables are being eliminated.
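The shrinking variable count can be illustrated with the exponentially decreasing function (EDF) from the original CARS formulation, which sets the fraction of variables kept at iteration i to r_i = a·exp(−k·i). The sketch below assumes that standard form with 256 variables and 100 iterations. Whether the implementation used in this study follows it exactly is an assumption, and the function name is hypothetical.

```python
import math

def edf_retention_ratios(n_variables, n_iterations):
    """Fraction of variables kept at each CARS iteration under the standard EDF.
    The constants are chosen so that all variables are kept at iteration 1
    and only 2 remain at the last iteration."""
    a = (n_variables / 2) ** (1 / (n_iterations - 1))
    k = math.log(n_variables / 2) / (n_iterations - 1)
    return [a * math.exp(-k * i) for i in range(1, n_iterations + 1)]

ratios = edf_retention_ratios(n_variables=256, n_iterations=100)
kept = [round(r * 256) for r in ratios]   # variables surviving the EDF stage
print(kept[0], kept[-1])
```

The EDF stage only bounds the count from above. The adaptive reweighted sampling stage that follows it in each iteration can discard further variables, so the number actually retained at a given iteration may be smaller than the EDF value.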

The final feature wavelength bands selected after 32 iterations of CARS are 932, 940, 952, 975, 1019, 1037, 1071, 1077, 1093, 1096, 1109, 1127, 1145, 1166, 1188, 1191, 1202, 1208, 1216, 1269, 1271, 1284, 1295, 1300, 1310, 1313, 1315, 1379, 1392, 1437, 1451, 1462, 1499, 1502, 1513, 1522, 1525, 1550, 1592, 1637, 1640, and 1693 nm.

4. Results and Discussion of Different Prediction Models

Table 6 presents the prediction results for the full-band NIR spectra (920–1690 nm) using PLSR, 1d-BPNN, and 2d-BPNN. The PLSR model performed worse than the 1d-BPNN and 2d-BPNN models, indicating a significant nonlinear relationship between the full-band spectra and moisture content. BPNN, known for its strong nonlinear fitting ability, captures the patterns in the spectral data better. The improvement of 2d-BPNN over 1d-BPNN can be attributed to the greater fitting ability provided by the additional hidden layer. The 2d-BPNN model achieved correlation coefficients of 0.984 and 0.977 for the calibration and prediction sets, respectively, with an RMSEP of 1.83%. The slightly lower prediction value may be due to redundant information in the spectral data, which makes training harder and can impair the model's ability to identify the relationship between moisture content and spectra.
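The two evaluation metrics can be written out as a short sketch. It assumes that the correlation coefficient is the Pearson correlation between measured and predicted moisture content and that RMSEP is the root mean square error on the prediction set. The sample values are hypothetical and are not taken from the study.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the inputs (% moisture here)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical moisture contents (%) for seven samples, for illustration only.
measured = [75.2, 68.4, 55.1, 40.3, 22.7, 9.8, 5.1]
predicted = [74.6, 69.0, 54.2, 41.5, 23.1, 9.1, 5.9]

print(f"R = {pearson_r(measured, predicted):.3f}, RMSE = {rmse(measured, predicted):.2f}%")
```

Applying the same two functions to the calibration and prediction sets separately would give the paired values reported in the tables.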

In this study, PCA and CARS were used for dimensionality reduction and feature band selection to remove irrelevant and redundant spectral data, and PLSR, 1d-BPNN, and 2d-BPNN models were then developed to predict the moisture content of green tea. The prediction results of these models using PCA and CARS are presented in Table 7. PCA reduced the number of feature variables to 5.86% of the original full-band spectrum (15 of 256) and improved the prediction accuracy of the PLSR, 1d-BPNN, and 2d-BPNN models. Notably, the PCA-2d-BPNN model achieved a high correlation coefficient of 0.986. CARS reduced the number of feature variables to about 16% of the original full-band spectrum (42 of 256). While the prediction accuracy of the CARS-PLSR model decreased slightly, the performance of the CARS-1d-BPNN and CARS-2d-BPNN models improved. The CARS-2d-BPNN model achieved a correlation coefficient of 0.982 and an RMSEP of 1.70%. These results indicate that using PCA and CARS to remove irrelevant and redundant spectral data generally improves the prediction accuracy of the models, the CARS-PLSR model being the exception.
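The reduction ratios quoted above follow directly from the variable counts. The short check below uses the 256 full-band wavelengths, the 15 principal components, and the CARS wavelength list reported in the text. The variable names are illustrative only.

```python
n_full_band = 256          # wavelengths in the full NIR spectrum
n_pca_components = 15      # principal components retained

# Feature wavelengths (nm) selected by CARS, as listed in the text.
cars_wavelengths_nm = [
    932, 940, 952, 975, 1019, 1037, 1071, 1077, 1093, 1096, 1109, 1127, 1145,
    1166, 1188, 1191, 1202, 1208, 1216, 1269, 1271, 1284, 1295, 1300, 1310,
    1313, 1315, 1379, 1392, 1437, 1451, 1462, 1499, 1502, 1513, 1522, 1525,
    1550, 1592, 1637, 1640, 1693,
]

pca_fraction = n_pca_components / n_full_band
cars_fraction = len(cars_wavelengths_nm) / n_full_band

print(f"PCA:  {n_pca_components} of {n_full_band} = {pca_fraction:.2%}")            # 5.86%
print(f"CARS: {len(cars_wavelengths_nm)} of {n_full_band} = {cars_fraction:.2%}")   # 16.41%
```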

Figure 9 compares the mean squared error (MSE) curves of the 1d-BPNN and 2d-BPNN models optimized using a genetic algorithm (GA) and the adaptive genetic algorithm (AGA). Within 50 generations, the MSE values of the GA-optimized 1d-BPNN and 2d-BPNN models decrease from 0.331 to 0.301 and from 0.237 to 0.191, respectively, while those of the AGA-optimized 1d-BPNN and 2d-BPNN models decrease from 0.343 to 0.235 and from 0.251 to 0.163, respectively. Although the AGA runs start from slightly higher MSE values, they end lower in both cases, indicating that AGA has better global search ability than GA and optimizes the initial network weights more efficiently.
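The gap between GA and AGA is easier to see as a relative reduction in MSE over the 50 generations. The sketch below uses only the start and end values quoted above. The dictionary layout and the helper name are illustrative assumptions.

```python
# (initial MSE, final MSE) after 50 generations, as reported in the text.
mse_curves = {
    ("GA", "1d-BPNN"): (0.331, 0.301),
    ("GA", "2d-BPNN"): (0.237, 0.191),
    ("AGA", "1d-BPNN"): (0.343, 0.235),
    ("AGA", "2d-BPNN"): (0.251, 0.163),
}

def relative_reduction(initial, final):
    """Fraction of the initial MSE removed during optimization."""
    return (initial - final) / initial

reductions = {key: relative_reduction(*values) for key, values in mse_curves.items()}

for (optimizer, model), reduction in reductions.items():
    print(f"{optimizer:>3} {model}: {reduction:.1%} lower MSE")
```

For both network depths the AGA run removes a larger share of its initial MSE than the GA run, which agrees with the comparison drawn from Figure 9.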

Figure 10 compares the predicted and true values of the three 1d-BPNN models: standard 1d-BPNN, GA-1d-BPNN, and AGA-1d-BPNN. The errors of each model span both negative and positive percentages. The AGA-1d-BPNN model is more stable than the standard 1d-BPNN and GA-1d-BPNN models. Figure 11 shows the same comparison for the three 2d-BPNN models: standard 2d-BPNN, GA-2d-BPNN, and AGA-2d-BPNN. Here, too, the errors of each model span both negative and positive percentages. AGA-2d-BPNN exhibits a smaller average error and greater stability than 2d-BPNN and GA-2d-BPNN. These results demonstrate that optimizing the initial weights and thresholds of the BPNN models helps prevent them from becoming trapped in local optima. The adaptive probabilistic genetic strategy in AGA further improves the prediction accuracy and generalization performance of the BPNN models.

Based on the above conclusions, it can be inferred that the dimensionality reduction and feature variable algorithms used in this study effectively remove redundant information from the spectrum and reduce the data dimensionality. The AGA optimization technique enhances the global search capability of the BPNN model by optimizing its initial weights and thresholds, thereby avoiding local extreme points and improving prediction accuracy. By combining these methods, the study successfully leveraged the strengths of both techniques to enhance the prediction performance of the BPNN model.
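The idea of an adaptive probabilistic genetic strategy can be sketched with one widely used scheme, the fitness-dependent crossover and mutation probabilities of Srinivas and Patnaik. The formula used in this study is not given in this section, so the scheme, the constants, and the function name below are assumptions for illustration only. The sketch also assumes that larger fitness is better, for example a fitness derived from the inverse of the network MSE.

```python
def adaptive_probability(fitness, f_max, f_avg, k_scaled, k_fixed):
    """Crossover or mutation probability for one individual.

    Below-average individuals get the fixed probability `k_fixed`, so they are
    recombined or mutated freely. Above-average individuals get a probability
    that shrinks linearly to 0 as their fitness approaches the population best,
    which protects good solutions from disruption.
    """
    if fitness < f_avg or f_max == f_avg:   # second test avoids division by zero
        return k_fixed
    return k_scaled * (f_max - fitness) / (f_max - f_avg)

# Hypothetical fitness values for a small population (larger is better).
population_fitness = [0.42, 0.55, 0.61, 0.70, 0.93]
f_max = max(population_fitness)
f_avg = sum(population_fitness) / len(population_fitness)

for f in population_fitness:
    p_cross = adaptive_probability(f, f_max, f_avg, k_scaled=1.0, k_fixed=1.0)
    p_mut = adaptive_probability(f, f_max, f_avg, k_scaled=0.5, k_fixed=0.5)
    print(f"fitness={f:.2f}  p_crossover={p_cross:.3f}  p_mutation={p_mut:.3f}")
```

Compared with a GA that uses fixed probabilities, a scheme of this kind keeps exploring with weak individuals while leaving the current best candidates for the initial weights and thresholds largely intact.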

Table 7 presents an overview of the models' performance. Both the PCA-AGA-1d-BPNN and PCA-AGA-2d-BPNN models achieve calibration-set and prediction-set correlation coefficients exceeding 0.99. The PCA-AGA-2d-BPNN model performs best: it achieves a correlation coefficient of 0.995 and an RMSECV of 0.81% for the calibration set, and a correlation coefficient of 0.994 and an RMSEP of 1.01% for the prediction set.

The prediction results and errors of the PCA-AGA-2d-BPNN model are illustrated in Figure 12, showing an average error of 0.72%. The predicted values closely align with the true values, with errors concentrated between -2.61% and 2.50%.

5. Conclusions

Based on the findings of this study, the following conclusions can be drawn:

(1) Comparison of models: Among the models compared, the double-hidden layer BPNN under full-band spectra showed improved prediction accuracy compared to the single-hidden layer BPNN, with a prediction-set correlation coefficient of 0.977 and an RMSEP of 1.83%.

(2) Impact of feature variable selection: Using PCA for feature variable selection gave better prediction results than using full-band spectra in the PLSR, 1d-BPNN, and 2d-BPNN models. The PCA-2d-BPNN model demonstrated the best performance, and PCA outperformed CARS in improving the prediction accuracy of the model.

(3) Comparison of optimization techniques: Compared to GA, AGA performed better in optimizing the 1d-BPNN and 2d-BPNN models under the full-band spectra. The AGA-2d-BPNN model achieved better prediction, with a correlation coefficient of 0.986 and an RMSEP of 1.51%.

(4) Combined model performance: PCA combined with the AGA-optimized 2d-BPNN model achieved optimal results with fewer training parameters. The SG-DT-PCA-AGA-BPNN model demonstrated the best prediction performance, with a correlation coefficient of 0.995 and an RMSECV of 0.81% for the calibration set, and a correlation coefficient of 0.994 and an RMSEP of 1.01% for the prediction set.

The AGA-optimized BPNN model has potential applications in predicting the moisture content of green tea during processing, offering benefits such as improved adjustment and monitoring capabilities for automated production lines and reduced production costs. However, the stability of the model in the complex green tea production environment must be considered. The limited range of green tea samples used in this study may lead to model instability. In addition, a model that relies on NIR data alone is susceptible to environmental factors such as temperature. Future research will aim to address these limitations through more comprehensive data collection, including green tea processing images, temperature, and NIR data. Such an integrated approach should enable a more complete analysis and improve the model's performance and stability.

Data Availability

Data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Authors’ Contributions

Conceptualization was conducted by Z.Z. and D.L.; methodology was conducted by Z.Z.; software was conducted by D.L.; validation was conducted by D.L.; formal analysis was conducted by D.L.; investigation was conducted by Z.Z.; data curation was conducted by D.L.; Z.Z. wrote the original draft; Z.Z. and D.L. wrote, reviewed, and edited the paper; funding acquisition was conducted by Z.Z. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

This work is supported by the National Key Research and Development Program (No. 2022YFD2101101), the Project of Scientific and Technological Innovation Planning of Hunan Province (No. 2021NK1020), the Earmarked Fund for China Agriculture Research System (CARS-19), the Hunan Province Modern Agriculture Technology System for Tea Industry, and the High Performance Computing Center of Central South University.