Abstract

The growing popularity of soy proteins among vegans and vegetarians, owing to their high protein content and widespread availability, has prompted studies of various methods for their extraction, particularly ultrafiltration. This research employed artificial neural network (ANN) and Box-Behnken design (BBD) methodologies to model and optimize the process parameters of ultrafiltration for the preparation of soy protein. Using BBD, the optimum ultrafiltration process parameters were identified by the desirability function approach, giving an optimized permeate flux of 11.13 litres per hour (LPH) and a protein content of 85.52% in the retentate. The ANN model identified the same ideal process parameters for maximizing both protein content in the retentate and protein retention: a 10 kDa membrane module, a transmembrane pressure of 117 kPa (17 PSI), a volume concentration ratio of 3.5, diafiltration set at 1, and a flow rate of 65% of the pump capacity. For protein content in the retentate, these parameters gave an absolute percent error of 2.81 and a predicted value of 80.49%. The ANN model predicted protein content in the retentate and protein retention with accuracies of 96.41% and 99.61%, respectively.

1. Practical Applications

Artificial neural networks can be used to optimize the ultrafiltration process for producing high-quality protein concentrates and isolates, predicting protein content in the retentate and protein retention with high accuracy.

2. Introduction

Pressure-driven membrane separation processes, particularly ultrafiltration (UF), have been widely adopted for protein concentration and purification [1]. UF is a promising alternative to conventional acid precipitation because it avoids pH shock, operates at ambient temperature, and offers high membrane selectivity during purification and fractionation [2, 3]. Efficient protein recovery depends on selecting suitable membrane modules and operating parameters, which determine how effectively UF yields protein forms such as concentrates and isolates. Aguero et al. [4] reported higher protein yield, better protein quality, and improved functional attributes with a membrane separation process. Despite this potential, membrane fouling remains the primary factor limiting UF performance, which makes prefiltration necessary. Because membrane separation is a physical separation technique, its applications are expected to grow in the near future. Several researchers [5–8] have identified the scope of membrane-based techniques, especially ultrafiltration and microfiltration, for concentrating and purifying soy proteins. The relatively low quality of commercially available soy proteins produced by conventional methods creates an opportunity to improve their quality with appropriate ultrafiltration membrane modules.

Numerous studies have explored UF for preparing various soy protein fractions under varied conditions [9–11]. Prior research has demonstrated the efficacy of Box-Behnken design (BBD) in optimizing soy protein extraction [12, 13]. Similarly, artificial neural networks (ANNs) have emerged as powerful tools for modelling complex systems because of their accuracy and computational efficiency [14, 15]. In membrane separation processes, including microfiltration and ultrafiltration, ANNs have predicted flux, rejection, and separation efficiency in various domains [16–19]. In dairy processing, ANNs have been used to model and simulate cross-flow ultrafiltration of milk [20–22]. Park et al. [23] used an ANN to predict the fouling behaviour of pilot-scale ultrafiltration, and Gaudio et al. [24] used one to model flux decline during whey ultrafiltration. Commercially available soy proteins are prepared by conventional acid precipitation, and the pH shock and other processing conditions of this method adversely affect their overall quality and functional properties [25]; this creates scope for producing high-quality soy proteins with suitable ultrafiltration membrane modules. Few investigations have attempted to retain low-molecular-weight bioactive components along with the soy proteins by using low-molecular-weight-cutoff hollow fibre ultrafiltration membrane modules.

Recent studies have applied ANN and response surface methodology (RSM) as process optimization tools in food processing [26–28]. However, little literature compares ANN and BBD for optimizing soy protein production by ultrafiltration. This study addresses this gap by applying both methods and comparing their optimization results for soy protein ultrafiltration.

3. Materials and Methods

3.1. Extraction of Protein from Defatted Soy Flour

For protein isolation, soybean (variety JS335) was procured from the research farm of ICAR-Central Institute of Agricultural Engineering, Bhopal, Madhya Pradesh (77°24′10″E longitude and 23°18′35″N latitude), and defatting was done using a Sox plus (Soxtron, Tulin-6 number) apparatus [29]. Protein was then extracted from the defatted soy flour in purified water at pH 9 (adjusted with 0.2 M NaOH) and 50 °C at a solid/liquid ratio of 1:10, using a mechanical stirrer (Jyoti, model JSI-555, India) for 1.5 h. Solid–liquid separation was performed in a centrifuge (Remi Instruments, model K-70, India) at 10,000 × g for 20 min at 15 °C. The resulting supernatant was used as the feed for the subsequent ultrafiltration process, following the methodology of John and Sinha [10].

3.2. Production of Soy Protein Isolate Using Ultrafiltration

After centrifugation, the supernatant (protein extract) was prefiltered to remove particles likely to cause fouling during ultrafiltration, using a Millipore microfiltration unit fitted with a cellulose nitrate membrane (47 mm diameter, 5.0 μm pore size). Ultrafiltration was carried out in a laboratory-scale GE Healthcare hollow fibre module with a membrane surface area of 650 cm². Figure 1 shows a schematic of soy protein preparation from defatted soy flour (DSF) by ultrafiltration. Each trial used 75 g of DSF, which yielded 550 ml of extract. Approximately 375 ml of retentate, the fraction of interest, was collected from the process, and the permeate was discarded.

3.2.1. Permeate Flux

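Flux is defined as the permeate volume per unit membrane area per unit time, J = V / (A × t), where V is the permeate volume (L), A is the membrane surface area (m²), and t is the time (h). It is expressed in litres per square metre of membrane surface area per hour (L m⁻² h⁻¹).

The surface area of the hollow fibre cartridge is 650 cm² (0.065 m²).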
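The flux definition above reduces to simple arithmetic. The sketch below is a minimal illustration, assuming the permeate volume is measured in litres and the collection time in hours; the function name `permeate_flux` and the example volume and time are hypothetical, and only the 650 cm² membrane area comes from this study.

```python
def permeate_flux(permeate_volume_l: float, area_m2: float, time_h: float) -> float:
    """Return permeate flux J = V / (A * t) in L m^-2 h^-1."""
    if area_m2 <= 0 or time_h <= 0:
        raise ValueError("membrane area and collection time must be positive")
    return permeate_volume_l / (area_m2 * time_h)


# Membrane area of the hollow fibre cartridge used in this study: 650 cm^2 -> m^2
AREA_M2 = 650 / 10_000

# Hypothetical reading: 0.10 L of permeate collected over 0.5 h
flux = permeate_flux(0.10, AREA_M2, 0.5)
print(f"Permeate flux: {flux:.2f} L m^-2 h^-1")
```

Converting the cartridge area to square metres first keeps the result in the L m⁻² h⁻¹ units used in the definition.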

3.2.2. Protein Rejection

Protein rejection is the percentage of the feed protein that the membrane prevents from passing into the permeate. Because the retentate is the fraction of interest in this study, it corresponds to the percentage of protein retained in the retentate (protein retention).
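Protein rejection and retention are commonly computed in two ways, and the text does not state which formula was used, so both are sketched here as assumptions: the observed rejection from feed and permeate concentrations, and a mass-balance retention from the protein mass recovered in the retentate. The function names and the protein concentrations in the example are hypothetical; only the 550 ml feed and approximately 375 ml retentate volumes come from this study.

```python
def observed_rejection(c_permeate: float, c_feed: float) -> float:
    """Observed rejection R = (1 - Cp / Cf) * 100, in percent."""
    return (1.0 - c_permeate / c_feed) * 100.0


def mass_balance_retention(c_retentate: float, v_retentate: float,
                           c_feed: float, v_feed: float) -> float:
    """Percentage of the feed protein mass recovered in the retentate."""
    return (c_retentate * v_retentate) / (c_feed * v_feed) * 100.0


# Hypothetical protein concentrations (g/L); volumes (L) as reported for one trial
print(round(observed_rejection(c_permeate=0.5, c_feed=10.0), 2))
print(round(mass_balance_retention(14.0, 0.375, 10.0, 0.550), 2))
```

The two values generally differ, because diafiltration and volume reduction both change how much protein leaves with the permeate.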

3.3. Estimation of Protein

Protein was estimated by the Kjeldahl method described by Ranganna [30]. About 0.5 g of sample was weighed into each digestion tube, two heaped spatulas of digestion mixture and 10 ml of concentrated H₂SO₄ were added, and the samples were digested until the contents of the tubes turned sea green. The digested sample was transferred to the distillation chamber, and the liberated ammonia was collected in a conical flask containing 20 ml of 4% boric acid, which turned from reddish pink to green as it absorbed the ammonia. The green boric acid solution was then titrated against 0.1 N HCl until it turned pink. The protein content of the soy protein samples was calculated from the sample titre value, the blank titre value, and the sample weight using the formula given by Ranganna [30].
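The protein formula itself is not reproduced in this section. The sketch below assumes the standard Kjeldahl calculation, nitrogen (%) = (sample titre − blank titre) × normality of HCl × 14.007 × 100 / (sample weight in g × 1000), multiplied by a nitrogen-to-protein conversion factor. The factor of 6.25 is an assumed default (5.71 is also used for soy proteins), the function name is hypothetical, and the titre values in the example are invented.

```python
NITROGEN_G_PER_MOL = 14.007  # atomic mass of nitrogen (g/mol)


def kjeldahl_protein_percent(sample_titre_ml: float, blank_titre_ml: float,
                             sample_weight_g: float, hcl_normality: float = 0.1,
                             conversion_factor: float = 6.25) -> float:
    """Protein (%) from a boric acid Kjeldahl titration against HCl.

    conversion_factor is an ASSUMPTION (6.25 by default); the source does not state it.
    """
    # (titre in ml x normality) = milliequivalents of NH3; x 14.007 = mg of nitrogen
    nitrogen_pct = ((sample_titre_ml - blank_titre_ml) * hcl_normality
                    * NITROGEN_G_PER_MOL * 100) / (sample_weight_g * 1000)
    return nitrogen_pct * conversion_factor


# Hypothetical titres for a 0.5 g sample titrated with 0.1 N HCl
print(round(kjeldahl_protein_percent(30.0, 0.2, 0.5), 2))
```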

3.4. Optimization of Process Parameters of Ultrafiltration for the Preparation of Soy Protein

Membrane module, transmembrane pressure, volume concentration ratio, diafiltration, and flow rate were selected as the independent variables, and permeate flux, protein content of the retentate, and protein retention percentage were the dependent (response) variables. Experimental design, model fitting, and optimization of the variables were performed with Design-Expert software 7.0.0 (trial version) using response surface methodology (RSM) with a Box-Behnken design (BBD). The optimized process parameters were then used to produce soy proteins by ultrafiltration in the present study. Table 1 lists the independent variables and the levels used for the experiments and subsequent analysis.

Each response Y can be represented as a function of the five independent variables, Y = f(X1, X2, X3, X4, X5), where X1 to X5 denote the independent variables.
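Response surface studies with a BBD usually fit the second-order polynomial Y = β0 + Σ βi·Xi + Σ βii·Xi² + Σ βij·Xi·Xj (i < j), and the quadratic model used here is assumed to take this standard form. For five coded factors it has 21 terms: an intercept, five linear, five squared, and ten two-factor interaction terms. The sketch expands one run into those terms; the function names are illustrative and not part of Design-Expert.

```python
from itertools import combinations
from typing import List, Sequence


def quadratic_terms(x: Sequence[float]) -> List[float]:
    """Expand coded factor levels into the terms of a full second-order model."""
    terms = [1.0]                                               # intercept (beta_0)
    terms += [float(xi) for xi in x]                            # linear terms (beta_i)
    terms += [float(xi * xi) for xi in x]                       # squared terms (beta_ii)
    terms += [float(xi * xj) for xi, xj in combinations(x, 2)]  # interactions (beta_ij)
    return terms


def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate the fitted polynomial for one run of coded levels."""
    terms = quadratic_terms(x)
    if len(beta) != len(terms):
        raise ValueError(f"expected {len(terms)} coefficients, got {len(beta)}")
    return sum(b * t for b, t in zip(beta, terms))


run = (1, -1, 0, 0, 1)  # coded levels of the five independent variables
print(len(quadratic_terms(run)))  # 21 terms for five factors
```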

The experimental data were analysed to optimize the process parameters with respect to the responses. Regression analysis and analysis of variance (ANOVA) were conducted to fit the model and to test the statistical significance of the selected model terms. Model adequacy was assessed from the model analysis, the coefficient of determination (R²), and the lack-of-fit test; a model is adequate if the lack of fit is nonsignificant. R² is the ratio of the explained variation to the total variation [31], and a model with R² above 80 percent can be considered for further analysis [32]. Model performance was further assessed using the root mean square error (RMSE), sum of squared errors (SSE), percent error, and chi-square (χ²).
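The goodness-of-fit statistics named above can be computed from paired observed and predicted responses. Their exact definitions are not given in the text, so this sketch assumes common forms from the food-engineering literature: RMSE = sqrt(SSE / N), percent error as the mean relative deviation 100/N × Σ|obs − pred|/obs, and χ² = SSE / (N − z), where z is the number of model parameters. The function name and the data in the example are hypothetical.

```python
import math
from typing import Dict, Sequence


def fit_statistics(observed: Sequence[float], predicted: Sequence[float],
                   n_params: int) -> Dict[str, float]:
    """Goodness-of-fit statistics for one response (definitions are assumptions)."""
    n = len(observed)
    if n != len(predicted) or n <= n_params:
        raise ValueError("need paired data with more observations than parameters")
    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1.0 - sse / sst,                 # explained / total variation
        "SSE": sse,
        "RMSE": math.sqrt(sse / n),
        "percent_error": 100.0 / n * sum(abs(r) / o for r, o in zip(residuals, observed)),
        "chi_square": sse / (n - n_params),    # reduced chi-square
    }


stats = fit_statistics([10, 12, 14, 16], [10.5, 11.5, 14.0, 16.5], n_params=2)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```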

Numerical optimization was used to optimize the responses simultaneously. A goal was chosen for each dependent and independent variable: all independent variables were kept within their experimental ranges, and permeate flux and protein retention percentage were maximized. All responses were assigned equal importance, and the optimum solution was selected on the basis of the combined desirability value. Response surfaces were generated in Design-Expert to visualize the effects of the independent variables on the responses.
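Design-Expert's numerical optimization combines the response goals through desirability functions. Assuming the usual Derringer-type form, each maximized response y is mapped to d = ((y − L)/(U − L))^w between its lower and upper limits L and U, and the overall desirability D is the geometric mean of the individual d values, which gives the responses equal importance. The function names, limits, and response values below are hypothetical.

```python
import math
from typing import Sequence


def desirability_maximize(y: float, low: float, high: float, weight: float = 1.0) -> float:
    """Individual desirability for a response whose goal is 'maximize'."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight


def overall_desirability(ds: Sequence[float]) -> float:
    """Geometric mean of individual desirabilities (equal importance)."""
    if any(d == 0.0 for d in ds):
        return 0.0
    return math.prod(ds) ** (1.0 / len(ds))


d_flux = desirability_maximize(8.0, low=4.0, high=12.0)          # hypothetical flux limits
d_retention = desirability_maximize(95.0, low=90.0, high=100.0)  # hypothetical retention limits
print(overall_desirability([d_flux, d_retention]))
```

The geometric mean drops to zero if any single response falls outside its acceptable range, so a solution must satisfy every goal at least partially.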

For the experiments, 75 g of defatted soy flour (average particle size 330 μm) was used, and 550 ml of extract was obtained after centrifugation. Total solid content of extract was recorded as %.

3.5. ANN Modelling

The BBD runs and their measured responses were used to develop the ANN model in Python 3.9.4. The basic ANN structure is shown in Figure 2.
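The network in Figure 2 has one input per independent variable (five inputs), two hidden layers of five neurons each, and one output. The study built it with Keras `Sequential` and `Dense` layers; to stay dependency-free, the sketch below reproduces only the forward pass of that 5-5-5-1 structure in plain Python. The ReLU hidden activation, linear output, and uniform random weight initialization are assumptions because the text does not state them, and all helper names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[neuron][input], biases[neuron])


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs: Sequence[float], layer: Layer,
          activation: Callable[[float], float]) -> List[float]:
    """Fully connected layer: activation(W x + b) for each neuron."""
    weights, biases = layer
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def relu(v: float) -> float:
    return max(0.0, v)


def build_network(seed: int = 0) -> List[Layer]:
    rng = random.Random(seed)
    # 5 predictors -> 5 hidden -> 5 hidden -> 1 response
    return [init_layer(5, 5, rng), init_layer(5, 5, rng), init_layer(5, 1, rng)]


def forward(network: List[Layer], x: Sequence[float]) -> float:
    h1 = dense(x, network[0], relu)
    h2 = dense(h1, network[1], relu)
    (y,) = dense(h2, network[2], lambda v: v)  # linear output for regression
    return y


def count_parameters(network: List[Layer]) -> int:
    return sum(len(w) * len(w[0]) + len(b) for w, b in network)


net = build_network()
print(count_parameters(net))  # 66 trainable weights and biases
```

The parameter count (66) is small compared with typical deep networks, which suits a dataset of only 46 BBD runs.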

An ANN regression model was fitted for each response, with the response as the target variable and the independent variables as predictors. Column abbreviations and details were documented in a data description to make the dataset easier to interpret. The dataset was split into training and testing sets in an 80:20 ratio. The TensorFlow and Keras libraries were used to implement the ANN in Python. The Keras “Sequential” model was used to stack the ANN layers, and the “Dense” layer was used to define each layer, specifying the number of neurons, the weight initialization technique, and the activation function. Different architectures were tested by varying the number of hidden layers and neurons; the best performance was obtained with two hidden layers of five neurons each (Figure 2), so this architecture, together with one input layer and one output layer, was used for training. Batch sizes of 5, 10, 15, and 20 and up to 50 epochs were evaluated, and these hyperparameters were tuned by grid search to find the most effective configuration. The model was then trained with the best configuration, and predictions were made on the testing data.

The absolute percentage error (APE) was calculated for each row of the testing data, the mean absolute percentage error (MAPE) was computed as the average over all rows, and the ANN model accuracy was derived as 100 − MAPE. Model performance was also assessed using the root mean square error (RMSE), sum of squared errors (SSE), percent error, and chi-square.
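The evaluation procedure described above (an 80:20 split, per-row APE, MAPE, and accuracy = 100 − MAPE) can be sketched without TensorFlow. The helper names, the shuffling seed, and the example values are hypothetical; in the study the predictions would come from the trained Keras network.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle the rows reproducibly and hold out test_fraction of them."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def absolute_percentage_errors(actual: Sequence[float],
                               predicted: Sequence[float]) -> List[float]:
    return [abs(a - p) / abs(a) * 100.0 for a, p in zip(actual, predicted)]


def model_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Accuracy = 100 - MAPE, where MAPE averages the per-row APE values."""
    apes = absolute_percentage_errors(actual, predicted)
    return 100.0 - sum(apes) / len(apes)


train, test = train_test_split(range(46))  # 46 BBD runs
print(len(train), len(test))
print(round(model_accuracy([80.0, 90.0], [78.0, 91.8]), 2))
```

With 46 runs, an 80:20 split leaves only nine rows for testing, so each test row carries considerable weight in the reported accuracy.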

4. Results and Discussion

4.1. Preparation of Soy Proteins Using Ultrafiltration

Table 2 presents the proximate composition of the defatted soy flour (DSF) and of the soy protein obtained by ultrafiltration. DSF had a protein content of 56%, whereas the ultrafiltered soy protein had a substantially higher protein content of 88%. Moisture and fat contents were comparable between DSF and the ultrafiltered soy protein, but the ash content was notably lower in the ultrafiltered soy protein. This difference can be attributed to the ultrafiltration process: the low-molecular-weight mineral constituents that make up the ash pass through the membrane into the permeate, whereas the proteins are retained, which reduces the ash content of the resulting soy protein. This outcome agrees with the findings of John and Sinha [10] and shows that ultrafiltration of DSF raises the protein content and lowers the ash content of the product.

4.2. Optimization of Process Parameters of Ultrafiltration Process Using Response Surface Methodology

Ultrafiltration process parameters were optimized using response surface methodology with a Box-Behnken design. A total of 46 experiments, including 6 centre-point runs, were performed to understand how the independent variables influenced the responses of the ultrafiltration process. A quadratic model was fitted to the data, and the statistical significance of each model term was checked by regression analysis and analysis of variance (ANOVA). Table 3 gives the regression coefficients and the significance of each variable for the membrane process responses. All the quadratic models were statistically significant, and their lack of fit was nonsignificant.
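For three to five factors, a Box-Behnken design pairs every two factors in a 2² factorial at coded levels ±1 while holding the remaining factors at their centre level (0), then adds centre-point runs. The sketch below, a hypothetical helper rather than Design-Expert's implementation, generates the coded design and confirms that five factors with six centre points give the 46 runs used here. It does not randomize run order.

```python
from itertools import combinations, product
from typing import List, Tuple

Run = Tuple[int, ...]


def box_behnken(n_factors: int, n_center: int) -> List[Run]:
    """Coded Box-Behnken design built from all factor pairs (valid for 3-5 factors)."""
    if not 3 <= n_factors <= 5:
        raise ValueError("pairwise construction is the standard BBD only for 3-5 factors")
    runs: List[Run] = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs.extend([(0,) * n_factors] * n_center)  # centre points
    return runs


design = box_behnken(n_factors=5, n_center=6)
print(len(design))  # 40 edge runs + 6 centre points
```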

4.2.1. Prediction of Permeate Flux by Box-Behnken Method

The model was highly significant, with a very low p-value, indicating its reliability. Significant model terms included the ultrafiltration membrane module, transmembrane pressure, the interaction between membrane module and flow rate, and the square terms of membrane module, transmembrane pressure, and flow rate. The lack-of-fit F-value of 2.17 implies that the lack of fit is not significant relative to the pure error, so the model fits the responses adequately. The predicted R² of 0.8100, which reflects the model's predictive ability, is in reasonable agreement with the adjusted R² of 0.9093, indicating that the model explains a substantial portion of the variability in the data. The coefficient of variation (C.V.) of 15.53% exceeds the commonly used threshold of 10%, suggesting somewhat lower precision. The low RMSE, SSE, percent error, and χ² values (Table 4) support the use of this model for predicting permeate flux.
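The adjusted R², predicted R², and C.V. reported above follow standard ANOVA definitions, sketched here as an illustration rather than Design-Expert's own code. Adjusted R² penalizes the number of model terms p; predicted R² uses the PRESS statistic built from leave-one-out residuals e_i / (1 − h_ii), where h_ii is the leverage of run i; and C.V. is the square root of the residual mean square divided by the response mean. The function names and all inputs in the example are hypothetical; 46 runs and 21 terms correspond to a full quadratic model in five factors.

```python
import math
from typing import Sequence


def adjusted_r2(r2: float, n_runs: int, n_terms: int) -> float:
    """Adjusted R^2; n_terms counts all model coefficients including the intercept."""
    return 1.0 - (1.0 - r2) * (n_runs - 1) / (n_runs - n_terms)


def press(residuals: Sequence[float], leverages: Sequence[float]) -> float:
    """Prediction error sum of squares from ordinary residuals and leverages."""
    return sum((e / (1.0 - h)) ** 2 for e, h in zip(residuals, leverages))


def predicted_r2(press_value: float, ss_total: float) -> float:
    return 1.0 - press_value / ss_total


def coefficient_of_variation(ms_residual: float, response_mean: float) -> float:
    """C.V. (%) = sqrt(residual mean square) / mean response * 100."""
    return math.sqrt(ms_residual) / response_mean * 100.0


print(round(adjusted_r2(0.95, n_runs=46, n_terms=21), 4))
print(press([1.0, 2.0], [0.5, 0.5]))
print(coefficient_of_variation(4.0, 10.0))
```

A gap of less than about 0.2 between predicted and adjusted R² is the usual rule of thumb for "reasonable agreement".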

The ultrafiltration membrane module had a significant effect on permeate flux. Permeate flux first increased and then decreased with increasing molecular weight cutoff, and Figure 3 indicates that optimal permeate flux is achieved between 15.50 and 22.75 kDa. Among the membranes tested, the 10 kDa membrane gave a higher permeate flux than both the 1 kDa and 30 kDa membranes. This finding parallels Sagu et al.'s [33] study on banana juice ultrafiltration using hollow fibre membranes, which observed minimal pore blocking in the 10 kDa membrane and hence higher flux than with higher cutoff membranes (27 and 44 kDa). Transmembrane pressure also had a significant effect, with flux increasing as pressure rose. In contrast, volume concentration ratio and flow rate had no significant main effect on permeate flux, whereas diafiltration had a significant effect. The interaction between membrane module and feed flow rate significantly affected permeate flux at the 5% level. These findings agree with prior studies: Blonigen's [34] research on ultrafiltration of a protein mixture with a 30 kDa membrane revealed a similar curvature of permeate flux with respect to transmembrane pressure, and the initial flux increase was attributed to the enhanced cross-flow rate with pressure. Understanding these effects is pivotal for optimizing the ultrafiltration process, especially for selecting membrane modules and adjusting operating parameters to achieve the desired flux while minimizing pore blocking and flux decline over time.

4.2.2. Prediction of Protein in Retentate by Box-Behnken Method

For protein content in the retentate, the model F-value of 18.01 indicates that the model is significant. Model terms with “Prob > F” values below 0.0500 are significant; these included the main effect of membrane module and its square term, as well as diafiltration and flow rate. The lack-of-fit F-value of 3.10 indicates that the lack of fit is not significant relative to the pure error, so the model adequately fits the observed data. The predicted R² (0.7732) is in reasonable agreement with the adjusted R² (0.8832), indicating reasonable predictive ability and a good fit. The other performance parameters (Table 4) further support the model's suitability for predicting protein content in the retentate. With increasing membrane module cutoff (kDa), the protein content of the retentate first increased and then decreased; this nonlinear response is captured by the significant square term of the membrane module. The interaction effects among the independent variables were nonsignificant, indicating that their combined influence did not significantly alter the protein content.

4.2.3. Prediction of Protein Retention by Box-Behnken Method

For protein retention, the model F-value of 2.93 indicates that the model is significant. The membrane module, its square term, volume concentration ratio, and diafiltration were significant model terms. The lack-of-fit F-value of 0.20 indicates that the lack of fit is not significant relative to the pure error, supporting the model's adequacy. The coefficient of variation (C.V.) was below 10 percent, reflecting good experimental precision. The RMSE, SSE, absolute average deviation, and other statistics (Table 4) further support the use of this model for predicting protein retention by the ultrafiltration membrane. Protein retention decreased as the membrane module cutoff increased (Figure 4), and the highest protein retention was obtained with a membrane module of lower molecular weight cutoff. These observations agree with prior research. Vijayasanthi et al. [35] examined protein recovery from coconut milk whey by ultrafiltration and reported 83% protein retention with a 300 kDa membrane and 86–90% retention with 5 and 50 kDa membranes. Similarly, Machado et al. [36] used 10 and 30 kDa ultrafiltration membranes to separate and purify protease and found better purification with the smaller MWCO membrane. The interaction effects among the independent variables did not significantly influence protein retention.

4.2.4. Optimization of Process Parameters

The polynomial model derived by response surface methodology was validated with three trials conducted at the numerically optimized point of maximum desirability. The optimal ultrafiltration process parameters, determined by the desirability function approach with a desirability score of 0.767, were as follows: membrane module size: 13.38 kDa; transmembrane pressure: 117 kPa (17 PSI); volume concentration ratio: 3.6; diafiltration: 1; and flow rate: 63.04% of pump capacity. At these conditions, the resulting performance indicators were as follows: permeate flux: 11.13 LPH; protein content in retentate: 85.52%; and protein retention: 98.99%. These conditions therefore combine a high permeate flux with a high protein content in the retentate and near-complete protein retention.

4.3. ANN Modelling
4.3.1. Prediction of Permeate Flux by ANN

In this study, an ANN model was developed to predict the target variable “permeate flux” from the predictors UMM (ultrafiltration membrane module), TMP (transmembrane pressure), VCR (volume concentration ratio), diafiltration, and FR (flow rate). The BBD runs, including the measured permeate flux, were first imported into the model development environment and split into training and testing sets. Configurations with different numbers of hidden layers and neurons were then tested to identify the optimal ANN model, and the best combination of epochs and batch size was determined by grid search, with a graphical representation used to identify the most suitable values (Figure 5).
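The epoch and batch size search can be outlined as an exhaustive grid over the candidate values that keeps the combination with the highest score. In the study each score would come from training and evaluating the Keras model; here `toy_score` is a hypothetical stand-in so the sketch runs without TensorFlow, and the epoch grid (5 to 50 in steps of 5) is an assumption, since the text only states “up to 50”.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

BATCH_SIZES = (5, 10, 15, 20)  # batch sizes listed in the Methods
EPOCHS = range(5, 51, 5)       # assumption: epochs up to 50 in steps of 5

Config = Tuple[int, int]


def grid_search(score_fn: Callable[[int, int], float],
                batch_sizes: Iterable[int] = BATCH_SIZES,
                epochs: Iterable[int] = EPOCHS) -> Tuple[Config, Dict[Config, float]]:
    """Score every (batch_size, epochs) pair and return the best one."""
    scores = {(b, e): score_fn(b, e) for b, e in product(batch_sizes, epochs)}
    best = max(scores, key=scores.__getitem__)
    return best, scores


def toy_score(batch_size: int, n_epochs: int) -> float:
    """Hypothetical score surface peaking at batch size 10 and 40 epochs."""
    return -abs(batch_size - 10) - abs(n_epochs - 40) / 10


best, scores = grid_search(toy_score)
print(best, len(scores))
```

Keeping the full score dictionary, not only the best pair, is what allows the score-versus-configuration plots described for Figures 5 to 7.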

However, the ANN regression model performed poorly in predicting permeate flux. Its accuracy was only 39 percent, reflecting a large discrepancy between predicted and actual values, and the other performance metrics (Table 5) were also unfavourable, indicating that this ANN model is inadequate for predicting permeate flux. The current ANN architecture or parameter configuration therefore may not capture the complex relationships between the predictors and permeate flux.

4.3.2. Prediction of Protein in Retentate by ANN

An ANN model was also employed to predict the “protein in retentate.” Figure 6 shows the score obtained for each batch size–epoch combination for this response, and the model was trained with the combination that gave the highest score.

After training, the ANN model had an accuracy of 96.41 percent, indicating close agreement between predicted and actual values. The absolute percentage error (APE) values ranged from 1.7 to 9.8 percent. The sum of squared errors (SSE) and root mean square error (RMSE) were 15.06 and 9.8 percent, respectively (Table 5). Together, these metrics indicate that the ANN model predicts the protein content in the retentate well. The optimization identified the following ultrafiltration parameters for maximum protein content in the retentate: membrane module: 10 kDa; transmembrane pressure: 117 kPa (17 PSI); volume concentration ratio: 3.5; diafiltration: 1; and flow rate: 65% of pump capacity. This configuration had an absolute percent error of 2.81, and the predicted protein content in the retentate under these conditions was 80.49%. These findings show that the ANN model can accurately predict the protein content in the retentate and identify the parameters that maximize it.

4.3.3. Prediction of Protein Retention by ANN

An ANN model was employed to predict “protein retention” using the batch size and epoch configuration identified from the score graph (Figure 7). The model was trained with these optimal values and then tested on a separate testing dataset. Its accuracy was 99.61 percent, indicating excellent agreement between predicted and actual protein retention values. The maximum absolute percentage error was 0.74 percent, and APE values ranged from 0.07 to 0.74 percent across predictions. The additional performance metrics in Table 5 further confirm the efficacy of the ANN model for predicting protein retention by ultrafiltration; in particular, RMSE and SSE values close to zero indicate a close fit between observed and predicted values, supporting the model's accuracy and reliability [37].

Moreover, the optimized ultrafiltration parameters associated with maximizing protein retention were identified as follows: membrane module: 10 kDa; transmembrane pressure: 117 kPa (17 PSI); volume concentration ratio: 3.5; diafiltration: 1; and flow rate: 65% of pump capacity. These optimized process parameters exhibited an absolute percent error value of 0.07. Consequently, using these parameters, the predicted value of protein retention was estimated to be 99 percent. These findings underscore the robustness of the ANN model in accurately predicting protein retention and highlight the identified optimal parameters for maximizing protein retention in ultrafiltration processes.

4.4. Comparison of Optimized Values from RSM and ANN

The RMSE, SSE, percent error, χ², and absolute average deviation values of both the response surface methodology (RSM) model based on the Box-Behnken design (BBD) and the artificial neural network (ANN) models show that both approaches are suitable for predicting ultrafiltration process parameters for soy protein preparation, with the exception of the ANN prediction of permeate flux. Cheok et al. [38] used both RSM and ANN in 2012 to optimize the extraction of phenolic compounds from Garcinia hull, and their findings favoured ANN as the better modelling technique for nonlinear data on the basis of performance parameters such as average absolute deviation. Table 6 compares the optimal ultrafiltration process parameters predicted by RSM and ANN for attaining the desired response values.

Lin et al. [39] compared RSM based on BBD with ANN models for optimizing an ultrafiltration process to remove nickel ions from aqueous solutions. They found both tools suitable for predicting ultrafiltration process parameters, with the ANN model giving higher prediction accuracy than RSM. Similarly, Chakraborty et al. [40] noted in 2014 that neural models effectively handled the nonlinear behaviour inherent in the ultrafiltration process.

5. Conclusion

The ultrafiltration process for soy protein preparation was optimized using both response surface methodology (Box-Behnken design) and artificial neural network (ANN) techniques. For RSM, a quadratic model was used to optimize the ultrafiltration process parameters, and the resulting polynomial model was validated at the numerically optimized point of maximum desirability. The optimum process parameters were obtained by the desirability function approach, with a desirability score of 0.767. The ANN regression model could not predict permeate flux accurately (39% accuracy) and therefore could not be used to optimize it. However, the ANN model effectively predicted the process conditions that maximize both the protein content in the retentate and protein retention, with prediction accuracies of 96.41 percent and 99.61 percent, respectively.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

Code Availability. The source code will be made available on request.

Conflicts of Interest

The authors declare that they have no conflict of interest.