Abstract
High-precision time-series forecasting is a complicated cyber-physical system (CPS) task. Owing to the diversity of data scales and types, classic time-series prediction models struggle to deliver accurate results for many forms of time-series data. This work proposes a hybrid model combining long short-term memory (LSTM) and ensemble empirical mode decomposition (EEMD) based on an entropy fusion feature. First, we apply EEMD in the entropy fusion feature long short-term memory (ELSTM) to lessen the mode mixing and edge effects of traditional empirical mode decomposition (EMD). The sequence is divided into intrinsic mode functions (IMFs) using EEMD. Then, feature vectors are constructed between the IMFs and their respective information entropy for feature merging. LSTM builds a fully connected network for each entropy fusion feature IMF subsequence, with each type of IMF subsequence serving as a feature dimension, to obtain its prediction results. Finally, the outputs of all IMF subsequences are reconstructed to obtain the final prediction. Compared with the LSTM method, the proposed method improves the MAPE evaluation metric by 64.33%. The proposed model also delivers the best prediction outcomes across four different time-series datasets. The experimental results conclusively show that the proposed method outperforms the compared models.
1. Introduction
1.1. Research Motivation and Problem Statement
1.1.1. Motivation
Time-series forecasting (TSF) is a dynamic technique for managing CPS, intelligent processing, financial analysis [1], and equipment fault diagnosis [2, 3]. Hrabia et al. [4] pointed out that if an effective analysis and prediction model for COVID-19 can be developed, epidemic prevention policy can be adjusted accordingly; TSF thus plays a constructive role in predictive decision-making. In addition, a CPS sensor network collects different types of data, so organizing system resources logically requires a reliable regression prediction model. Huang et al. [5] proposed the EMD method to perform modal decomposition on the original complex signal and finally obtain multiple intrinsic mode function (IMF) components with smoother characteristics. EMD helps to decompose and extract the inherent information of data sequences. Since this shares the same purpose as neural networks in solving TSF problems, we consider combining existing deep learning methods with EMD-derived methods to achieve more reliable prediction performance. Lu et al. [6] proposed a new hybrid approach, Prophet-EEMD-LSTM, based on decomposition methods and a prediction model; such a fusion model can economically improve the prediction ability of the algorithm.
1.1.2. Problem Statement
Although a fusion model using the EMD method can improve prediction ability to a certain extent, the inherent problems of EMD, such as mode mixing and the boundary effect, still affect the model. Therefore, it is necessary to develop a model with better decomposition ability to extract the hidden features of time series [7]. Moreover, mode decomposition itself cannot achieve orthogonal decomposition in the time domain, so decomposition is incomplete. There must therefore be correlations among the IMFs, and how to make full use of this property to further extract data features is also a focus of our work.
1.2. Research Challenge and Gaps
1.2.1. Challenges
In common application scenarios, sequence characteristics are typically nonstrictly stationary and nonlinear, with highly discretized features [8]. In traditional signal processing, EMD is usually used to analyze complex signals with nonstrictly stationary characteristics and a high degree of nonlinear dispersion. Zhao et al. [9] used a hybrid process based on the EMD and LSTM neural network method; with the original EMD methodology, feature extraction ability was improved and prediction error was decreased. However, mode aliasing arises in signal decomposition, and this problem is particularly prominent when the original signal contains many high-frequency components. It can be dealt with by the VMD-LSTM method proposed by Niu et al. [10]. The intrinsic mode functions (IMFs) produced by signal decomposition also exhibit boundary effects between components, which affect prediction accuracy [11]. Ensemble empirical mode decomposition (EEMD) aims to prevent the mode overlap that results from EMD's insufficient breakdown of data properties [12]. The method enhances the extraction of periodic sequence characteristics and trends using the conventional EMD approach as a foundation. To lessen aliasing across IMF components, EEMD-LSTM superimposes Gaussian noise on the original signal [13, 14]. Based on the EEMD components [14], the data are divided into high-frequency and low-frequency sequences using an enhanced Bi-LSTM.
1.2.2. Research Gap
Existing research usually focuses on reducing the bias that modal aliasing and boundary effects introduce into network models through more optimized decomposition models. However, current mainstream modal decomposition fusion models ignore the correlation between the IMF components obtained from decomposition. At the same time, this kind of decomposition method cannot achieve complete (orthogonal) decomposition. Moreover, in the current EMD-LSTM framework, each IMF is predicted independently before reconstruction. This step places too much emphasis on the characteristics of each modal component itself, so under incomplete modal decomposition the prediction result is skewed toward offset or special modes. In general, the abovementioned models inevitably ignore the interaction between IMFs. Furthermore, most fusion models also ignore the hidden features among the time-series features of the IMFs.
1.2.3. Research Novelty
(1)A hybrid time-series prediction model with EEMD and LSTM based on the entropy fusion feature is proposed, namely, EEMD-ELSTM. It consists of four parts: (1) EEMD, (2) entropy feature fusion, (3) LSTM network, and (4) prediction outcome reconstruction.
(2)The EEMD method reduces mode aliasing and boundary influence. The entropy feature fusion method calculates the entropy of the obtained IMF components and then adjusts the balance weights between high-entropy and low-entropy IMF components in the fully connected network through entropy-sorted fusion, thereby ensuring strong correlation between the dimensional components of the fully connected LSTM network and improving the reliability of LSTM predictions.
(3)The eigenvector frames between similar modal functions are processed by the fully connected modal decomposition LSTM to improve the prediction accuracy of the final reconstructed data.
1.2.4. Main Contributions
The main contributions of this work are summarized as follows:
(1)The EEMD algorithm can decompose hidden informative features from time-series data into multiple characterized IMF components
(2)The ELSTM phase of the algorithm can map the hidden characteristics of IMF components and pass shared characteristics among components using entropy fusion, relying on the learning characteristics of the LSTM network
(3)In the LSTM stage, the algorithm maps the hidden characteristics between IMF components through entropy fusion according to the learning characteristics of the LSTM network and transfers the commonality between components
(4)The proposed optimization algorithm can be applied to different datasets and has general applicability
2. Related Work
The current research in the time-series prediction field can be divided into three main categories [15, 16].
2.1. Statistical Model-Based Method
Prajapati and Kanojia [17] examined the index with the largest influence on COVID-19 fluctuations in India using autoregressive integrated moving average (ARIMA) and autoregressive (AR) models. Behzadi et al. [18] proposed a general information-theoretic framework based on the generalized linear model (GLM), applied it to causal inference on heterogeneous datasets, and verified it on celestial data. Kap et al. [19] proposed the additive noise model (ANM) on noise level for time-series analysis, which confirmed the effect and significance of the noise overlaid on the extracted sequence in time-series prediction. Hanapi et al. [20] proposed the fuzzy sliding window autoregressive conditional heteroskedasticity model for time-series prediction and applied it to aerial data. Studies based on time-series decomposition [21] used the Prophet model, which can handle time-series data with missing values and still make predictions. However, time-series data often contain many components with cyclical characteristics and nonsignificant trends, which these models do not take well into account.
2.2. Deep Learning-Based Method
Most time-series data streams collected by sensor networks in CPS are highly discrete and nonstationary, which makes accurate trend forecasting difficult. The spatiotemporal properties of time-series data are not simply related to real-time data at the current temporal node but follow the traditional Markovian assumption [22], which results in the loss of data feature information. Huang et al. [23] elaborated on methods based on artificial neural networks (ANN), such as recurrent neural networks (RNN), in the field of deep learning: depending on the connections between hidden units, the network can inherit the information characteristics of related events. Zeng et al. [24] proposed a linear combination model based on the Prophet model and LSTM, which provided a new idea for regression analysis in constructing fusion models of diversified LSTM networks. Li et al. [25] used the wavelet transform to process the signal for the regression analysis of user traffic. Zhang et al. [26] decomposed nonstationary wind speed series into several IMF sequences through variational mode decomposition (VMD), denoised each IMF a second time, and then trained an improved residual neural network (PCA-BP-RBF), which significantly increased prediction performance. Although the VMD method can effectively eliminate modal aliasing, redefining the IMF function yields decomposition results that differ significantly from those of EMD and its variants. In addition, before performing VMD on a sequence, the number of modal components K must be set first. This step requires a great deal of prior experience, although in some applications specifying the number of IMFs is an advantage.
However, for scenarios where the number of hidden modes of the signal cannot be predicted, calculating and setting this optimal K-value actually increases the time cost of modeling. Moreover, this kind of model has high complexity and is even unsuitable for systems with low time-delay requirements.
2.3. Machine Learning-Based Method
Gong et al. [27] applied the least squares support vector machine (LSSVM) and optimized its parameters on the basis of the particle swarm optimization (PSO) algorithm. Montesinos López et al. [28] addressed the issue of high algorithmic time complexity by using the sequential minimal optimization (SMO) algorithm to solve the quadratic programming problems of the support vector machine (SVM) and support vector regression (SVR). Pekel [29] used decision tree regression (DTR) to predict soil moisture. Jumin et al. [30] conducted a regression study of solar radiation using the boosted decision tree regression (BDTR) model. Qiu et al. [31] optimized the extreme gradient boosting (XGBoost) model and improved prediction accuracy compared with other XGBoost-based models. These methods do not require massive amounts of data, but their model structures are complex.
For the three categories described above, we draw the following conclusions. First, machine learning and statistical learning overlap considerably; indeed, machine learning is grounded in statistical learning. Second, statistical learning is theory-driven: it makes assumptions about the data distribution, explains cause and effect with strong mathematical support, and focuses on parameter inference. Third, machine learning is data-driven: it relies on large-scale data to predict the future, de-emphasizes the convergence problem, and focuses on model prediction. Fourth, deep learning is a subfield of machine learning in which feature extraction relies more on hidden-layer models, is weakly explanatory, and tends toward a black box.
From the three main research categories mentioned above, it is not difficult to draw the following conclusions. Machine learning and deep learning methods have shown great advantages in related research. Moreover, research on fusion algorithms based on modal decomposition optimization and neural networks also demonstrates excellent reliability. Therefore, in this paper, we combine modal decomposition methods with the LSTM method.
Correlation prediction models using EMD decomposition for sequence data with temporal characteristics have developed in recent years [32–34]. Zhang et al. [35] derived an EMD-based model that improved the gated recurrent unit (GRU) and combined EMD with a regression prediction model for PM2.5. Ali et al. [36] proposed a new version of EMD based on the Akima spline interpolation technique and an LSTM network, which enhances the effectiveness of the improved model. Dedovic et al. [37] used EMD and ARIMA to predict air quality. Liu et al. [38] stated that combining the EMD model and an ANN can improve the prediction effect. However, most works are based on a single data type, and modal aliasing and boundary effects persist. In addition, the relevant algorithms did not fully use the neural network's characteristics to mine data features. Therefore, this paper proposes EEMD-ELSTM to improve the prediction accuracy and universality of the model.
3. Method
The global EEMD-ELSTM framework is described as follows:
(1)EEMD with Gaussian white noise is used to decompose the source series data for the period
(2)The entropy value of each decomposed IMF is analyzed, and the IMFs are classified according to their entropy values
(3)The LSTM network is then used to conduct fully connected prediction for the IMF subsequences of each category to obtain prediction results
(4)All IMF subsequence predictions are then combined and reconstructed to obtain the final forecast result
The abovementioned process is summarized in Algorithm 1.
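The four steps above can be sketched end-to-end. In the sketch below, the decomposition and the per-IMF predictor are deliberately simple stand-ins (a moving-average split and a last-value forecast); they are our illustrative substitutes, not the EEMD and ELSTM components of the actual algorithm.

```python
import numpy as np

def decompose(x, win=11):
    """Stand-in for EEMD: split the series into a fast and a slow component."""
    low = np.convolve(x, np.ones(win) / win, mode="same")  # slowly varying part
    return [x - low, low]                                  # [high-freq, low-freq]

def predict_subsequence(imf, horizon=1):
    """Stand-in for the per-IMF ELSTM predictor: naive last-value forecast."""
    return np.full(horizon, imf[-1])

def forecast(x, horizon=1):
    """Pipeline shape: (1) decompose, (2-3) predict each subsequence, (4) recombine."""
    parts = [predict_subsequence(imf, horizon) for imf in decompose(x)]
    return np.sum(parts, axis=0)  # reconstruction = sum of subsequence forecasts
```

The key structural point is the last line: the final forecast is obtained by summing the subsequence forecasts, mirroring step (4) of the framework.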
3.1. LSTM and EMD Relational Model
3.1.1. LSTM
Lobo Neto et al. [39] pointed out that LSTM further optimizes the network performance of RNN, and impressive results have been obtained on various time-series problems. Generally, an LSTM network is composed of storage units. Three special cell structures are responsible for updating network data: the output gate, the input gate, and the forgetting gate. Its structure is displayed in Figure 1, where ⊙ indicates the Hadamard product and + is matrix addition.

The inputs of an LSTM cell typically consist of the current input x_t, the previous hidden state h_(t−1), and the previous cell state c_(t−1). The three outputs are the current output y_t, the hidden state h_t, and the cell state c_t, where x_t represents the current round's input, h_t represents the round's state output, and c_t represents the round's global information carrier.
The LSTM calculation process is as follows. First, the current input x_t is spliced with the hidden state h_(t−1) passed from the preceding step, and four intermediate states are obtained for training. After multiplication by the weight matrices, three of the spliced vectors are turned into gating states z_f, z_i, and z_o through a sigmoid activation function, while the candidate state z is mapped to a number between −1 and 1 using the tanh activation function.
Second is the forgetting stage, that is, the forgetting gate. Selectively forgetting information from the preceding node is the primary task of this step. Based on the spliced state above, f (for forget) is used as the control function of the forgetting gate to decide whether the data in the previous state need to be forgotten, and it outputs a value between 0 and 1, where 1 means completely retained and 0 means completely forgotten. In standard LSTM notation, the gate can be expressed as f_t = σ(W_f · [h_(t−1), x_t] + b_f).
The third stage is the selective memory stage, i.e., the input gate. This stage selectively memorizes the input time-series data x_t. The gating signal is represented by i_t = σ(W_i · [h_(t−1), x_t] + b_i), where i stands for information. Adding the two abovementioned steps yields the state passed to the next stage, c_t = f_t ⊙ c_(t−1) + i_t ⊙ z.
The final stage is the output stage, which decides whether the data will be output as the current state. The decision is mainly made through the output gate o_t = σ(W_o · [h_(t−1), x_t] + b_o). In addition, the c_t obtained in the previous stage is scaled through the tanh activation function mentioned above, giving h_t = o_t ⊙ tanh(c_t). In contrast with the traditional RNN, the output y_t is also obtained through the change of h_t.
On the basis of the traditional RNN chain structure, LSTM uses a special gate structure memory unit to replace the originally hidden nodes, which enhances the overall network’s ability to retain time-series data information and extends the network’s long-term memory ability.
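A minimal NumPy sketch of one LSTM step, following the standard gate equations summarized above. The stacked weight layout and function names are our illustrative choices, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step with forget, input, and output gates.

    W has shape (4*hidden, input+hidden); its four row-blocks are the forget,
    input, candidate, and output transforms (an assumed layout for illustration).
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b   # splice input and previous state
    f = sigmoid(z[:hidden])                     # forget gate: what to discard
    i = sigmoid(z[hidden:2 * hidden])           # input gate: what to memorize
    g = np.tanh(z[2 * hidden:3 * hidden])       # candidate values in (-1, 1)
    o = sigmoid(z[3 * hidden:])                 # output gate
    c_t = f * c_prev + i * g                    # update global information carrier
    h_t = o * np.tanh(c_t)                      # scaled state output
    return h_t, c_t
```

Because the output passes through tanh and a sigmoid gate, every entry of h_t lies strictly inside (−1, 1).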
3.1.2. EMD
EMD decomposes the sequence data into multiple subsequences according to the characteristics of its own time scales and needs no additional basis functions. In the structure of the decomposition algorithm, the main part is the empirical sifting process.
For a data sequence with temporal characteristics x(t), EMD can decompose it into multiple IMF subsequences and a residual component, and stacking the IMF subsequences reconstructs the original sequence as x(t) = Σ_{i=1}^{n} IMF_i(t) + r_n(t).
Here, r_n(t) is the residual component, which reflects the general trend of the sequence.
The EMD method aims to extract from the raw signal its high- and low-frequency sequences, as well as the components at various scales, and to arrange them in order of frequency from high to low, so as to obtain the ordered sequence of IMFs.
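Both properties, the reconstruction identity and the high-to-low frequency ordering, can be illustrated numerically with stand-in components. The sinusoids below are synthetic substitutes, not true EMD output.

```python
import numpy as np

def zero_crossings(s):
    """Count sign changes; EMD orders IMFs from many crossings (high frequency) to few."""
    return int(np.sum(np.signbit(s[:-1]) != np.signbit(s[1:])))

t = np.linspace(0, 1, 1000, endpoint=False)
imf1 = np.sin(2 * np.pi * 25 * t)   # stand-in high-frequency mode
imf2 = np.sin(2 * np.pi * 3 * t)    # stand-in low-frequency mode
residual = 0.2 * t                  # slowly varying trend component r_n(t)
x = imf1 + imf2 + residual          # x(t) = sum of IMFs + residual
```

The first extracted mode should always show more zero crossings than later ones, which is how the frequency ordering manifests in practice.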
The main problems of the EMD algorithm mentioned in Section 2 can be described as follows [40–42]:
(1)Boundary effects: the endpoints of the time-series data are usually not extreme points. Therefore, the constructed envelope function diverges at the beginning and end of the sequence because of the uneven distribution of endpoints. The resulting deviation keeps accumulating during decomposition and eventually interferes with decomposition reliability.
(2)Mode aliasing: the IMFs obtained after decomposing the overall sequence are incompletely decomposed, and components of different scales and frequencies are mixed within a single subsequence. When this occurs in multiple subsequences at the same time, the EMD algorithm loses its physical meaning.
3.1.3. EEMD
To address the abovementioned issues, we use superimposed noise to extend the decomposition of the sequence via EEMD. The EEMD used in this paper superimposes Gaussian white noise. The precise decomposition steps are as follows.
Given the input time-series data x(t), zero-mean Gaussian white noise ε_i(t) is added to the sequence, giving x_i(t) = x(t) + ε_i(t), where ε_i(t) represents the Gaussian white noise added in the i-th trial. Comparisons between EMD and EEMD based on the UCI power network dataset are shown in Figure 2. The EEMD method extends the extremes of the original EMD process during decomposition to alleviate the extreme trailing effect and mode overlap at the endpoints.

(a)

(b)
In addition, for the IMF subsequences obtained by the decomposition of the two methods, the EEMD method clearly obtains more subsequences. At the same time, the IMF subsequences obtained by a modal decomposition algorithm are generally different and random. (a) EEMD model. (b) EMD model.
It can be seen from Figure 2 that the boundary effect and modal aliasing problems arising during EMD decomposition are effectively alleviated in the EEMD model after superimposing Gaussian white noise.
The abovementioned steps are then repeated until the entire time series has been decomposed N times, adding a new realization of Gaussian white noise each time, so that N groups of IMF subsequences are obtained.
Ensemble averaging over the N groups of IMF subsequences, as described in the above steps, yields the final IMFs of the overall sequence x(t).
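The ensemble structure of these steps can be sketched as follows. The inner two-mode split is a stand-in for a real EMD routine, kept deliberately simple so that the noise-injection and averaging logic stays visible; the function names and defaults are ours.

```python
import numpy as np

def split_modes(x, win=11):
    """Stand-in two-mode decomposition (moving-average split), used here only
    to illustrate the EEMD ensemble; a real implementation would call EMD."""
    low = np.convolve(x, np.ones(win) / win, mode="same")  # slow component
    return np.stack([x - low, low])                        # [high-freq, low-freq]

def eemd_sketch(x, n_trials=100, noise_std=0.1, seed=0):
    """Add zero-mean Gaussian white noise N times, decompose each noisy copy,
    and ensemble-average the resulting modes so the noise cancels out."""
    rng = np.random.default_rng(seed)
    acc = np.zeros((2, len(x)))
    for _ in range(n_trials):
        noisy = x + rng.normal(0.0, noise_std, size=len(x))
        acc += split_modes(noisy)
    return acc / n_trials
```

Because the injected noise has zero mean, averaging over trials drives its contribution toward zero while the underlying modes survive, which is exactly the rationale for the ensemble step.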
3.2. EEMD-ELSTM Algorithm
By integrating the EEMD and ELSTM models, the EEMD-ELSTM model can be created. Overall, EEMD-ELSTM can be divided into three steps: (1) EEMD data decomposition, (2) entropy feature fusion, and (3) combining the LSTM prediction results of each IMF subcolumn. The EEMD-ELSTM model algorithm flow structure presented in this paper is depicted in Figure 3.

As described in Section 3.1, we employ the EEMD approach to decompose the data series and obtain the IMF subsequences together with the residual. Then, we use the ELSTM algorithm to predict each IMF subsequence, providing each IMF series' forecast outcomes.
After the calculation iterations, the components of each subsequence are obtained. We then perform entropy analysis on these subsequences, take the two subsequences with the highest entropy scores as feature dimensions, and further construct the feature vectors of the ELSTM model. The specific implementation of the entropy feature fusion model defined in this section is as follows. First, calculate the entropy value of each IMF component and select the two maximum entropy values from the resulting scores. Second, under ideal decomposition conditions, modal decomposition yields a residual that is usually close to a univariate linear function; by entropy theory, such a residual will be a maximum-entropy sequence. However, in order to fully exploit the other hidden data features after decomposition, we further select the IMF component with the second-highest score as an auxiliary dimension. Third, the feature vectors are constructed as follows: for training a low-entropy sequence, the two high-entropy components are added as feature vectors in an LSTM structure with three inputs and one output, while when constructing the LSTM structure for the high-entropy components, all IMF components are used as their auxiliary dimensions. In this way, the low-entropy sequences gain the feature trend of the high-entropy sequences, while the high-entropy sequences acquire the high-frequency features of the remaining sequences, achieving the model optimization we expect and improving the robustness of cyber-physical system modeling.
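The entropy scoring and top-two selection described above might be sketched as follows. The histogram-based entropy estimator and its bin count are our assumptions; the paper does not fix a particular estimator.

```python
import numpy as np

def shannon_entropy(s, bins=16):
    """Histogram-based Shannon entropy (bits) of an IMF component.
    A near-constant, strongly periodic component scores low; an irregular
    high-frequency component scores high."""
    counts, _ = np.histogram(s, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def top2_entropy_components(imfs):
    """Score every IMF and return the indices of the two highest-entropy
    components, which the fusion step uses as auxiliary feature dimensions."""
    scores = [shannon_entropy(c) for c in imfs]
    order = np.argsort(scores)[::-1]
    return order[:2].tolist(), scores
```

For example, a constant (perfectly predictable) component scores an entropy of exactly zero and is never selected, while irregular components rise to the top of the ranking.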
The value of the final predicted sequence can be expressed as the sum of the subsequence predictions, L = Σ_{i=1}^{n} L_i.
The prediction formula for the series is as described above, where each L_i can be regarded as the prediction value of the corresponding IMF subsequence. The amount of input data determines the algorithmic complexity of the traditional EMD model. Our EEMD model performs multiple iterations over the original EMD model with Gaussian white noise added and subtracted.
As a result, the EEMD method multiplies the complexity of EMD by the number of ensemble cycles k. In the entropy fusion model, the entropy is calculated once and thresholds are divided according to the entropy value, which is approximate to the K-means classification algorithm. The complexity of the LSTM depends on the input size n and the hidden size m, and each prediction of the ELSTM network adds further feature dimensions. With these terms at comparable levels, the overall complexity of the method remains of the same order as its constituent parts.
4. Experiment
4.1. Experimental Settings
The datasets used in this work are provided by the University of California, Irvine (UCI). The experimental results show that the proposed approach overcomes the drawbacks of the conventional EEMD-LSTM method and further improves accuracy compared with the state-of-the-art (SOTA) algorithms. Four different types of datasets from Tetouan, Morocco, in the UCI database (2017) are used: (1) power grid power consumption, (2) temperature, (3) wind speed, and (4) humidity. The datasets are listed in Table 1, and the schematic design is shown in Figure 4. The dataset used in this comparative study relates to the power distribution networks of Tetouan, a city in northern Morocco. The historical data were taken from the Supervisory Control and Data Acquisition (SCADA) system every 10 minutes for the period between 2017-01-01 and 2017-12-31. We used 1000 sample sequences from this dataset in our study.

It can be seen from Figure 4 that the characteristics of nonstrict stationarity and nonlinearity are in line with the problem we expect to solve. To better visualize the distribution of the dataset, we also conducted a box plot analysis of the data samples. The results are shown in Figure 5: the wind speed data exhibit strongly nonperiodic and discrete characteristics within the data distribution. In summary, the temporal characteristics of the four research objects are consistent with the problems that our proposed model is designed to address in cyber-physical systems.

From the decomposition results in Figure 6, the power consumption data have the most typical periodic characteristics, followed by the local periodic characteristics of humidity and temperature. The wind speed data are strongly discrete and nonperiodic, and the sequence contains outliers.

(a)

(b)

(c)

(d)
Based on the decomposition results in Figure 6, we calculate the entropy values of the IMF components in Table 2. Combined with our earlier description of the entropy fusion feature model, entropy theory makes clear that when an IMF sequence has strong periodicity, its entropy will inevitably be low, and the amount of hidden information it contains is also low. In addition, the entropy scores clarify that the power consumption data share local periodic characteristics with the temperature and humidity data, though in different frequency bands, while the wind speed data have a modal gradient completely different from the other three. The four datasets can therefore represent, to a certain extent, the complex and changeable real environment of a cyber-physical system, which is one reason why we chose them as the model test objects.
To evaluate the reliability and dependability of the proposed model, the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and Pearson correlation (CORR) are used as indicators [43–45], where the forecast outcome is compared against the ground-truth values, and CORR is the covariance normalized by the product of the two standard deviations.
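The four indicators can be written directly in NumPy; the function names below are ours, and MAPE assumes the ground truth contains no zero entries.

```python
import numpy as np

def rmse(y, yhat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - yhat)))

def mape(y, yhat):
    """Mean absolute percentage error, in percent; assumes y has no zeros."""
    return float(np.mean(np.abs((y - yhat) / y)) * 100.0)

def corr(y, yhat):
    """Pearson correlation: covariance over the product of standard deviations."""
    return float(np.corrcoef(y, yhat)[0, 1])
```

A perfect forecast yields zero for the three error metrics and a correlation of 1, which is a quick sanity check when wiring the metrics into an evaluation loop.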
4.2. Parameter Setting
The noise added in the experiments is zero-mean Gaussian noise. In addition, an unlimited number of decompositions is allowed in pursuit of total decomposition. The EEMD-ELSTM model is validated by shuffle-split cross-validation, and all result parameters are the optimal values over 10-fold validation.
In our comparative experiments and related ablation validation experiments, we set the following parameters.
The batch size is set to 25, the number of epochs to 50, and the validation split to 0.1; the loss function is mean square error (MSE), the optimizer is Adam, and the Dense output size is 1. There are 40 neurons in the LSTM and ELSTM variable parameter settings. The SVR parameters are C = 100, degree = 3, and gamma = 1. The ARIMA order is (2, 2, 0). The hyperparameter definitions for the Prophet and XGBoost models are given in Table 3. The data are divided into two halves for training (fifty percent) and testing (fifty percent). All experiments are run in Anaconda on the TensorFlow 2.1 platform.
4.3. Ablation Experiment
The ablation experiments compare and analyze the LSTM, ELSTM, EEMD-LSTM [46], and EEMD-ELSTM algorithms on the power consumption dataset. The results are listed in Table 4.
The results show that both the entropy fusion model and the EEMD approach enhance the optimization of the LSTM. The prediction precision of the LSTM model is improved by introducing the EEMD-ELSTM algorithm. In addition, the indicator values in Table 4 show that EEMD-ELSTM achieves the best results among the listed methods, and ELSTM, which uses the entropy fusion method alone, is also optimized to a certain extent.
4.4. Results and Discussion
The experimental validation results in Figure 7 show that the EEMD-ELSTM model produces a regression closer to the ground truth than the LSTM model. Combined with the iteration results in Table 4, the proposed approach outperforms the LSTM model in terms of prediction accuracy.

Figure 8 shows the comparison with the statistical method ARIMA, the machine learning method SVR, the deep learning methods LSTM and Prophet, and the decision tree approach XGBoost.

Among the compared models, the prediction results of EEMD-ELSTM are the closest to the truth, and its prediction error in Table 5 is also the smallest. Six models are compared in terms of RMSE, MAE, and MAPE in Table 5.
To further discuss the influence of the LSTM network parameters on the model results, we adjust the batch size and the number of neurons to verify the effectiveness of the model. To prove that the baseline LSTM has certain advantages over the traditional ARIMA model, we add experiments with a batch size of 5 and 50 neurons. In addition, we add a 2023 SOTA model, LSTM-TCN [47], for comparison.
From Figure 9 and Table 6, the indicator comparison shows that the LSTM-TCN model has certain advantages over XGBoost in predicting periodically significant time-series data, and after parameter adjustment the LSTM model has certain advantages over the traditional ARIMA model. However, the baseline models still fall short of the fusion and ensemble algorithms in terms of versatility and validity across different data.

Table 5 shows that the EEMD-ELSTM model outperforms the other compared models on all datasets in terms of accuracy. Moreover, the EEMD-ELSTM method generalizes well: it obtains the best evaluation values on all four datasets and also holds certain advantages over the advanced XGBoost model.
From the comparison of evaluation indicators in Tables 5 and 6, the proposed model offers better applicability than the LSTM-TCN model in practical CPS settings. In addition, the results of the ARIMA model are unreliable when the data exhibit complex non-stationarity.
Figure 10 presents the comparison as histograms, which represent the prediction outcomes of the approaches in Tables 5 and 6 more intuitively. The histograms show that the EEMD-ELSTM method achieves the lowest values of RMSE, MAE, and MAPE.

4.5. Friedman Validation and Post Hoc Nemenyi Test
We conducted the Friedman test [48] to assess the relative merits of the proposed model. Table 7 presents the Friedman test results for the six models on four different datasets; the test statistic is stat = 22.691 with p = 0.001.
On the basis of the rank values in Table 7, we calculate the critical difference with the post hoc Nemenyi test; the result is shown in Figure 11.
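The Friedman statistic and the Nemenyi critical difference can be computed directly from the per-dataset rank matrix. The sketch below uses a hypothetical rank matrix (the paper's actual ranks are in Table 7, which we do not reproduce) and the standard chi-square form of the Friedman test; the Studentized-range constant q = 2.850 is the tabulated value for k = 6 algorithms at α = 0.05.

```python
import numpy as np

# Hypothetical rank matrix: rows = 4 datasets, columns = 6 models
# (1 = best). Illustrative values only, not the paper's Table 7.
ranks = np.array([
    [1, 5, 3, 4, 6, 2],
    [1, 6, 4, 3, 5, 2],
    [1, 5, 4, 3, 6, 2],
    [1, 6, 3, 4, 5, 2],
], dtype=float)

N, k = ranks.shape                  # N datasets, k algorithms
Rj = ranks.sum(axis=0)              # rank sum of each algorithm

# Friedman chi-square statistic
chi2 = 12.0 / (N * k * (k + 1)) * np.sum(Rj ** 2) - 3.0 * N * (k + 1)

# Nemenyi critical difference; q = 2.850 is the alpha = 0.05 value for k = 6
q_alpha = 2.850
cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * N))

print(round(chi2, 3))               # 18.857
print(round(cd, 2))                 # 3.77
```

Two algorithms are judged significantly different when their mean ranks differ by more than the critical difference, which is how the groupings in Figure 11 are formed.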

Each algorithm’s mean rank differs, and the EEMD-ELSTM model has the lowest. From low to high, the remaining methods rank as follows: the XGBoost decision tree approach, the statistical ARIMA method, the LSTM and LSTM-TCN models and the Prophet model (deep learning methods), and the SVR model (machine learning method). The critical difference (CD) shows that the proposed algorithm differs clearly from the existing state of the art. These Friedman and post hoc Nemenyi results indicate that the EEMD-ELSTM method is more practical and trustworthy than the generic forecasting models.
5. Conclusion
This paper proposes a hybrid time-series prediction approach based on entropy fusion features, EEMD, and LSTM. First, the EEMD modal decomposition method based on Gaussian white noise is constructed, and the IMF subsequence components of the input data are obtained with this algorithm. The decomposed IMF subsequences are then analyzed by entropy: the IMF subsequence with the largest entropy value is extracted as the auxiliary dimension, the eigenvector of the ELSTM model is constructed, and the final result is obtained by model prediction and reconstruction. In the experimental verification section, four time-series datasets of varying sizes and six state-of-the-art prediction models are chosen for comparison. The experimental results show that the proposed model outperforms the others in reliability and validity. Moreover, based on the MAPE index, we calculated the optimization percentage under each dataset and parameter configuration by comparing the baseline LSTM results with those of our improved model; overall, the proposed model improves the MAPE metric by 66.43% over the baseline.
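The entropy-fusion step summarized above can be sketched as follows. This is a simplified illustration: the IMF arrays stand in for real EEMD output, and the histogram-based Shannon entropy is one simple choice — the paper's exact entropy definition may differ.

```python
import numpy as np

def shannon_entropy(x, bins=16):
    """Shannon entropy (bits) of a signal via a normalized histogram.
    One simple entropy estimate; other definitions are possible."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Hypothetical IMF subsequences (stand-ins for EEMD decomposition output)
t = np.linspace(0.0, 1.0, 256)
imfs = [np.sin(2 * np.pi * 8 * t),    # high-frequency IMF
        np.sin(2 * np.pi * 2 * t),    # mid-frequency IMF
        0.5 * t]                      # residual trend

# Select the IMF with the largest entropy as the auxiliary dimension
entropies = [shannon_entropy(imf) for imf in imfs]
aux = imfs[int(np.argmax(entropies))]

# Pair each IMF with the auxiliary dimension to form ELSTM feature vectors
features = [np.stack([imf, aux], axis=1) for imf in imfs]
print([f.shape for f in features])    # [(256, 2), (256, 2), (256, 2)]
```

Each two-column feature matrix would then be fed to its own LSTM predictor, and the per-IMF predictions summed to reconstruct the final forecast, mirroring the decompose–predict–reconstruct pipeline described above.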
The abovementioned experimental verification shows that the EEMD-ELSTM model has the following characteristics:
(1) Validity: from the analysis of the evaluation indices in Tables 5 and 6 and Figures 10 and 11, the EEMD-ELSTM method clearly outperforms the generic nonfused network prediction models in forecasting validity.
(2) Universality: from the evaluation indices and the Friedman test results of the abovementioned comparison experiments, the EEMD-ELSTM algorithm consistently delivers good predictions on all four types of datasets and can be applied to datasets of different scales.
(3) Causality: the effectiveness of the optimization components and the improved strategy mechanism can be seen from the ablation experiments reported in the article. Ablation studies are the most direct way to understand causality in the system and generate reliable predictions; the contribution of each part of the model was verified by ablation.
6. Further Study
(1) The time complexity of the LSTM method is generally high, and its performance depends on machine computing power, the number of network neurons, and the number of iterations. Future research will further optimize the training process of the relevant machine learning models, or replace the traditional neural network component of current hybrid models. (2) This article addresses only one-dimensional time-series data streams, which is a limitation. Forecasting with rich multidimensional data is a current hot topic in time-series research; we therefore consider multidimensional data decomposition and prediction as the next research topic.
Data Availability
The data used to support the findings of the study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest associated with this work or significant financial support that could have appeared to influence its outcome.
Acknowledgments
This work was partially supported by the Yunnan Fundamental Research Key Projects (202101AS070016), the Major Science and Technology Project of Yunnan Province (202302AD080002), the Open Fund of Yunnan Key Laboratory of Computer Technology Application (CB22144S073A), and the Yunnan Province “Xingdian Talents Support Plan” Industrial Innovation Talents Project (Yfgr [2019] no. 1096).