Abstract
This paper proposes a short-time passenger flow prediction method that combines chaos theory with an improved empirical mode decomposition, particle swarm optimization, and long short-term memory (EMD-PSO-LSTM) model, targeting passenger flow data with high nonlinearity and dynamic space-time dependence. EMD is first used to decompose the original passenger flow data into several intrinsic mode functions (IMFs) and a residual with different characteristic scales. The selected components are then reconstructed in phase space based on chaos theory. The PSO algorithm is improved by introducing an inertia factor that adjusts its search capability and thereby improves the optimization. An improved PSO-LSTM prediction model is built for each reconstructed subsequence, and the outputs of the individual models are summed to give the final output. Experiments were performed using data from the North Railway Station of Chengdu Rail Transit, and the proposed model obtained a root mean square error (RMSE) of 16.0908 and a mean absolute error (MAE) of 11.3704. Relative to the LSTM baseline, the improved PSO-LSTM model, the improved EMD-PSO-LSTM model, and the model proposed in this paper reduced the RMSE by 25.53%, 29.97%, and 58.76%, respectively, and the MAE by 30.41%, 40.13%, and 63.08%, respectively.
1. Introduction
At present, urban rail transport continues to develop well, as its speed, capacity, comfort, and safety help it become the main mode of transport for the urban public. Along with the gradually increasing intensity of passenger flow, short-time passenger flow prediction has become particularly important, as accurate traffic passenger flow prediction helps urban traffic managers better plan and manage their resources. Moreover, traffic short-time passenger flow prediction has become of strategic importance for the construction of urbanized intelligent transport systems to relieve traffic pressure, adjust operating times, and plan future construction. It is also the foundation of smart city ambitions and construction.
However, short-time passenger flow prediction can be a particularly challenging problem. Raw traffic-flow data are spatiotemporal data that simultaneously exhibit heterogeneity and correlation, as well as strong nonlinearity and chaos. In addition, most existing research captures relatively few traffic data attributes, resulting in unsatisfactory prediction results. Consequently, real-time accurate short-time passenger flow prediction is critical.
Short-term traffic forecasting models can be broadly classified into four categories: traditional statistical learning algorithms, machine learning models, deep learning algorithms, and combinatorial models. Statistical learning and machine learning algorithms have similar tasks, namely inferring model parameters and fitting and predicting data. However, their focus differs: statistical learning algorithms are more concerned with the confidence of predictions, whereas machine learning algorithms are more concerned with the predictive performance of the model. Statistical learning algorithms include autoregressive integrated moving average (ARIMA) models [1], seasonal ARIMA models [2], and Kalman filter models [3]. These models are simple to operate; however, because actual passenger flow data change in complex ways, building them involves a degree of subjectivity, and they rely on a priori assumptions that can be difficult to satisfy in practice, which limits their predictive performance.
Machine learning models include support vector machines (SVMs) [4], artificial neural networks [5, 6], and Bayesian networks [7], which can capture the nonlinear data features of short-time passenger flow using their own learning abilities. Leng et al. [8] established an improved neural network prediction model that was optimized using a genetic optimization algorithm that not only improved the convergence of its search capability but also its prediction accuracy. However, traditional machine learning models cannot effectively process high-dimensional data, and the complex variability of nonlinearities in time series traffic data can be difficult to capture. Moreover, their predictive performance depends on expert experience, and their generalization ability is weak. Consequently, many scholars have researched deep learning models to handle high-dimensional spatiotemporal traffic data.
Deep learning algorithms include recurrent neural networks (RNNs) [9, 10] and long short-term memory (LSTM) neural networks [11]. Huang et al. [12] processed traffic sequence data using LSTM networks and gated recurrent units, and reduced the noise in raw passenger flow data using wavelet transforms. Zhang et al. [13] used multigraph convolutional neural networks to explore the spatial features of traffic data. However, deep learning models are prone to overfitting or underfitting [14].
The fourth class comprises combinatorial models. Zhai et al. [15] proposed a hybrid traffic flow prediction method that combines the k-nearest neighbor and LSTM algorithms based on the spatiotemporal features of transportation data. Their experimental results showed that the proposed model improved on the comparison models by 12.59% on average. Gao et al. [16] proposed a new hierarchical hybrid model to forecast short-term passenger flows, with an average absolute error of approximately 10% in the forecasting results. Their experiments also showed that the combined model achieved greater accuracy.
However, collected traffic flow data can be disturbed by noise, which reduces the predictive performance of the models. To minimize the impact of external factors on forecast accuracy, a prediction model that employs chaos theory can directly analyze the intrinsic regularity of traffic flow data without first establishing a subjective model. In addition, given the nonlinearity and nonsmoothness of urban rail traffic time series data, the empirical mode decomposition (EMD) algorithm, which is suited to nonlinear and nonsmooth signals, has been examined for rail traffic short-time passenger flow prediction. The improved complete ensemble EMD with adaptive noise (CEEMDAN) method has been used to decompose highway data by splitting the time series into components with different features, which could substantially reduce the prediction error of mainstream models. The improved CEEMDAN-fuzzy entropy (FE)-temporal convolutional network model was shown to exhibit high predictive accuracy and strong robustness when using the US101-S highway in California as the research object. Wang Xiao Quan et al. [17] proposed an SVM model for short-time traffic flow prediction that incorporates chaos theory: based on the nonlinear characteristics of traffic flow, phase-space reconstruction maps the series into a high-dimensional structure, with the time delay obtained using mutual information and the embedding dimension calculated using the maximum conditional entropy method. The reconstructed subseries were then used as inputs and predicted using support vector regression optimized by a genetic algorithm. Numerical experiments showed that the proposed method exhibited excellent predictive accuracy. Lingling Wu et al. [18] proposed a short-time traffic flow prediction model in which empirical mode decomposition and a differential evolution algorithm are used to optimize a back propagation neural network. They used the EMD algorithm to decompose the modal components in the traffic time series step by step, generating a series of intrinsic mode functions (IMFs) and a residual at different scales to remove certain noise effects, thereby improving the accuracy of the results.
The main contributions of this paper are as follows:
(1) Given the nonlinear characteristics of traffic flow and the tendency of particle swarm optimization (PSO) algorithms to become trapped in local optima, we combined chaos theory with an improved EMD-PSO-LSTM model to design a short-time passenger flow forecasting method for urban rail transport, and applied it to rail transit passenger flow forecasting for the first time.
(2) We used the EMD algorithm to decompose the original time series data and then applied phase-space reconstruction, based on chaos theory, to the useful EMD components to further explore the internal characteristics of traffic flow and improve prediction accuracy. An improved PSO-LSTM prediction model was then developed for each reconstructed subsequence. Because the original PSO tends to fall into a local optimum, we improved it to strengthen its search capability, and the predictions of each component were summed to produce the final output.
(3) We conducted experiments on a dataset from the North Railway Station of Chengdu Rail Transit to validate the effectiveness of the proposed model. The results showed that the proposed model performed better than the existing methods it was compared with.
2. Methods
2.1. EMD Algorithm
EMD can be used to analyze linear, smooth signals as well as nonlinear, nonsmooth (nonstationary) signals. The core of the approach is to gradually smooth the signal, decomposing its oscillation modes into a finite number of components, each of which tends to be smooth, according to different characteristic scales or trends. In short-time passenger flow forecasting for rail transport, transforming nonlinear and nonsmooth passenger flow signals into smoother components better reflects their intrinsic physical meaning [18]. Compared with other signal-processing methods, EMD is more direct, intuitive, and adaptive.
EMD adaptively divides the raw traffic flow signal into several intrinsic mode functions (IMFs), each of which contains local features of the original signal at a different characteristic scale, and a residual (RES), which represents the mean or trend of the original signal. Each IMF must satisfy two conditions simultaneously: over the whole data range, the number of extrema and the number of zero crossings must differ by at most 1, and the mean of the upper and lower envelopes must be 0 [18]. Writing the original signal as $x(t)$, the process is as follows:
Step 1: All extreme points of $x(t)$ are located, and the upper envelope $u(t)$ and lower envelope $l(t)$ are fitted through the maxima and minima, respectively, using cubic spline interpolation. The mean of the envelopes is calculated as follows:
$$m_1(t) = \frac{u(t) + l(t)}{2}$$
Step 2: The envelope mean $m_1(t)$ is subtracted from the original sequence $x(t)$ to obtain a new sequence, as follows:
$$h_1(t) = x(t) - m_1(t)$$
If $h_1(t)$ satisfies the IMF conditions, it is taken as the first IMF component. If $h_1(t)$ does not yet satisfy them, the above process is repeated with $h_1(t)$ in place of $x(t)$ until the resulting envelope mean tends to 0, and the component so obtained is defined as $c_1(t)$.
Step 3: The first IMF component $c_1(t)$ is subtracted from the original sequence $x(t)$ to obtain the first difference sequence, from which the high-frequency component has been removed:
$$r_1(t) = x(t) - c_1(t)$$
The same processing is applied to $r_1(t)$ to obtain the second IMF component $c_2(t)$, and so on until no further component can be extracted. The last sequence obtained is the residual $r_n(t)$, which represents the average trend of the original series $x(t)$. After decomposition, the original sequence can be expressed as follows:
$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t)$$
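The sifting procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation (the paper reports using MATLAB), and several details are assumptions made for illustration: the function names `envelope_mean` and `emd`, the stopping tolerance `tol`, the iteration caps, and the simple boundary treatment that pins both envelopes to the end points of the signal.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def envelope_mean(x):
    """Mean of the cubic-spline upper and lower envelopes, or None if too few extrema."""
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Assumed boundary treatment: pin both envelopes to the first and last samples.
    up_idx = np.concatenate(([0], maxima, [len(x) - 1]))
    lo_idx = np.concatenate(([0], minima, [len(x) - 1]))
    upper = CubicSpline(up_idx, x[up_idx])(t)
    lower = CubicSpline(lo_idx, x[lo_idx])(t)
    return (upper + lower) / 2.0


def emd(x, max_imfs=10, max_sift=50, tol=1e-3):
    """Decompose x into IMFs and a residual by repeated sifting."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(residual) is None:
            break  # too few extrema left: treat what remains as the trend
        h = residual.copy()
        for _ in range(max_sift):
            m = envelope_mean(h)
            if m is None:
                break
            h = h - m  # subtract the envelope mean (Step 2)
            if np.mean(m ** 2) < tol * np.mean(h ** 2):
                break  # envelope mean is close to zero: accept h as an IMF
        imfs.append(h)
        residual = residual - h  # remove the extracted component (Step 3)
    return np.array(imfs), residual
```

By construction, the extracted components and the residual sum back to the input series, which mirrors the final decomposition identity.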
2.2. Chaos Theory
Phase-space reconstruction, a central tool of chaos theory, was proposed by Packard et al. in 1980. The theory states that the evolution of any component of a chaotic system is jointly determined by the other components with which it interacts, so that a single variable contains information about the long-term evolution of all the variables in the system. The basic principle of phase-space reconstruction is the delay embedding theorem proposed by Takens [19]. For a chaotic time series, models can be built and predictions made in the reconstructed phase space, so phase-space reconstruction is an essential step in processing a chaotic time series. The algorithm has two key parameters, the embedding dimension ($m$) and the time delay ($\tau$). In [19], suitable values of both parameters are shown to exist only theoretically, and no specific formula for them is given. In practical applications, the time delay and embedding dimension should be calculated from the actual data because raw traffic flow time series are influenced by external variables.
Observing a chaotic system over time yields a set of observations, the chaotic time series $\{x_i\}$, $i = 1, 2, \ldots, N$, from which a set of $m$-dimensional vectors can be constructed as follows:
$$X_i = \left[x_i,\; x_{i+\tau},\; \ldots,\; x_{i+(m-1)\tau}\right]$$
where $i = 1, 2, \ldots, M$, $\tau$ denotes the time delay, $X_i$ denotes a sample point in the constructed phase space with $m$ components, and $M = N - (m-1)\tau$. If the parameters $m$ and $\tau$ are chosen appropriately, $X_i$ can represent the state of the original system and the dynamic characteristics of the original passenger flow data in the multidimensional phase space.
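As a minimal sketch of the delay embedding described here, the following Python function builds the matrix of delay vectors for a given embedding dimension and time delay. The function name `delay_embed` and its argument names are illustrative assumptions, not part of the original method description.

```python
import numpy as np


def delay_embed(x, m, tau):
    """Return the M x m matrix of delay vectors, where M = N - (m - 1) * tau."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors < 1:
        raise ValueError("series too short for the chosen m and tau")
    # Column j holds the series shifted by j * tau samples.
    return np.column_stack([x[j * tau: j * tau + n_vectors] for j in range(m)])
```

For example, `delay_embed(np.arange(10), m=3, tau=2)` yields six vectors, the first being `[0, 2, 4]` and the last `[5, 7, 9]`.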
2.3. LSTM-Based Short-Time Passenger Flow Forecasting for Rail Transit
LSTM is particularly suitable for processing traffic data sequences with certain time intervals. In the LSTM neural network structure, each neuron contains three gating units, which address the vanishing gradient problem that arises with long time series. The LSTM model comprises four main parts, namely the memory cell, forget gate, input gate, and output gate [20], as shown in Figure 1. The LSTM network structure is shown in Figure 2.


Here, the memory cell is the core component of the LSTM model and stores the cell state that carries past information. The output of the memory cell at moment $t$ can be expressed as follows:
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
where $x_t$ denotes the input at the present time, $h_{t-1}$ denotes the output at the previous moment, $f_t$ denotes the forget gate, $i_t$ denotes the input gate, which refreshes the stored information in the cell state, and $\tilde{C}_t$ is the candidate update to $C_t$.
The forget gate determines which part of the information in the cell state should be removed by fusing the output of the preceding moment, $h_{t-1}$, with the input at the current moment, $x_t$. The output of the forget gate at moment $t$ is obtained as follows:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
where $f_t$ denotes the output value of the forget gate, $\sigma$ denotes the sigmoid activation function, $W_f$ denotes the weight matrix of the forget gate, and $b_f$ denotes the bias term of the forget gate.
Unlike the forget gate, the input gate decides which information can enter the cell state. A candidate value vector is created in the $\tanh$ layer to generate candidate memories, and the new information that passes the screening is added to the cell state to replenish the lost attribute information. In this way, the input gate updates the information stored in the cell state, and its output at time $t$ is calculated as follows:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
where $W_i$ and $b_i$ denote the weight matrix and bias term of the input gate.
The output gate result is determined by three main components: the output at the previous moment, the input at the current moment, and the information stored in the cell state after it is updated. Thus, the output at moment $t$ can be expressed as follows:
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \odot \tanh(C_t)$$
where $W_o$ denotes the weight matrix, $b_o$ denotes the bias term, and $\tanh$ denotes the hyperbolic tangent activation function.
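The gate computations in this section can be sketched as a single forward step of a standard LSTM cell. This is an illustrative NumPy sketch, not the trained model used in the experiments: the function names and the dictionary layout of the weights `W` and biases `b` (keys `'f'`, `'i'`, `'c'`, `'o'`) are assumptions chosen for readability.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step of a standard LSTM cell.

    W[k] has shape (hidden, hidden + input) and b[k] has shape (hidden,)
    for each k in 'f', 'i', 'c', 'o' (an assumed layout for this sketch).
    """
    z = np.concatenate([h_prev, x_t])      # fuse previous output and current input
    f_t = sigmoid(W["f"] @ z + b["f"])     # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])     # input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])   # candidate memory
    c_t = f_t * c_prev + i_t * c_hat       # updated cell state
    o_t = sigmoid(W["o"] @ z + b["o"])     # output gate
    h_t = o_t * np.tanh(c_t)               # new hidden output
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate memory to 0, so the cell state is simply halved at each step, which is a convenient sanity check.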
2.4. Improved PSO-LSTM Algorithm for Rail Transit Short-Time Passenger Flow Prediction
PSO is a heuristic search method and a classical swarm intelligence algorithm for solving optimization problems. Its underlying principle originated from the foraging behavior of bird flocks, wherein each bird is abstracted as a particle that represents a feasible solution [21]. The quality of each particle is evaluated through a fitness value, and a series of random searches is performed. Each particle dynamically tracks the current optimal solution by exchanging information with the other particles and adaptively changes the direction of its next search on the basis of this shared information, so that the swarm as a whole can locate the optimal position.
To overcome the tendency of traditional PSO algorithms to fall into local optimal solutions, the particle swarm algorithm was improved by introducing an inertia factor that changes during the search and thereby balances the global and local search capabilities of the algorithm [22]. In each iteration, the particles update their velocity and position vectors by tracking two best values: the best position found by the particle itself and the best position found by the swarm. The update can be expressed as follows:
$$v_i^{k+1} = \omega v_i^{k} + c_1 r_1 \left(p_i^{k} - x_i^{k}\right) + c_2 r_2 \left(g^{k} - x_i^{k}\right)$$
$$x_i^{k+1} = x_i^{k} + v_i^{k+1}$$
where $i = 1, 2, \ldots, N$, $N$ denotes the total number of particles in the swarm, $x_i^k$ denotes the current position of particle $i$, $v_i^k$ denotes its velocity, $c_1$ and $c_2$ denote the learning factors, $r_1$ and $r_2$ are random numbers uniformly distributed in $[0, 1]$, $p_i^k$ denotes the best position that particle $i$ has passed through, and $g^k$ denotes the best position experienced by the swarm as a whole [20]. The inertia factor $\omega$ is varied between a start weight and an end weight over the iterations; a commonly used linearly decreasing form consistent with these definitions is
$$\omega = \omega_{\text{start}} - \left(\omega_{\text{start}} - \omega_{\text{end}}\right)\frac{k}{k_{\max}}$$
where $\omega_{\text{start}}$ denotes the weight at the beginning of the search, $\omega_{\text{end}}$ denotes the weight at the end, $k_{\max}$ denotes the maximum number of iterations, and $k$ denotes the current iteration number [23].
The improved PSO-LSTM model is obtained by constructing the LSTM short-time passenger flow prediction model and using the modified PSO algorithm to find the optimal parameters for it [24], as shown in Figure 3. The pseudocode of the improved PSO algorithm is shown in Figure 4.


The specific operational steps are as follows:
Step 1: Initialize the particle swarm and set the relevant parameters, including the population size and the random initial positions and velocities.
Step 2: Determine the fitness value of each particle, as well as the best positions found so far by each particle and by the swarm.
Step 3: Determine whether the convergence condition is satisfied. If it is, output the result; otherwise, continue with the following steps.
Step 4: Update the velocity vector of each particle using the best positions found by the particle and by the swarm, update the position vector using the new velocity vector, and then update the particle and swarm best positions.
Step 5: Return to Step 3 until the convergence condition is met, then output the optimal result and the number of iterations.
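The steps above can be sketched in plain Python as follows. This is a hedged illustration rather than the authors' code: it assumes a linearly decreasing inertia weight, uses a fixed iteration budget as the convergence condition, clamps positions to the search bounds, and the default values (`w_start=0.9`, `w_end=0.4`, `c1=c2=1.5`) are common choices, not values reported in the paper. In the paper's setting, `fitness` would be the LSTM prediction error as a function of the number of hidden nodes, the learning rate, and the number of iterations; here any callable can be supplied.

```python
import random


def pso_minimize(fitness, bounds, n_particles=20, max_iter=100,
                 w_start=0.9, w_end=0.4, c1=1.5, c2=1.5, seed=0):
    """Minimize fitness over box bounds with an inertia-weighted PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 1: random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Step 2: initial fitness, particle bests, and swarm best.
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    # Steps 3-5: iterate until the (assumed) iteration budget is exhausted.
    for k in range(max_iter):
        w = w_start - (w_start - w_end) * k / max_iter  # assumed linear decay
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

A quick way to exercise the sketch is a toy objective such as `pso_minimize(lambda p: p[0] ** 2 + p[1] ** 2, [(-5, 5), (-5, 5)])`, whose minimum lies at the origin.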
3. Design of Short-Time Passenger Flow Forecasting Algorithm for Rail Transit Based on Chaos Theory and Improved EMD-PSO-LSTM
The EMD algorithm can decompose the time series of rail transit short-time passenger flow into IMF components of different frequencies based on their intrinsic characteristics, which describe the local characteristics of the original series more clearly. An improved PSO-LSTM prediction model can then be built separately for each subsequence, and the predicted values of the subsequences are added to obtain the final output [25]. The model construction process is shown in Figure 5. The pseudocode of the improved EMD-PSO-LSTM passenger flow forecasting algorithm based on chaos theory is shown in Figure 6.
Step 1: EMD is performed on the rail traffic short-time passenger flow data to obtain several IMFs and a RES term.
Step 2: The decomposed components are screened, and the selected components are reconstructed in phase space.
Step 3: An improved PSO-LSTM prediction model is developed for each component after phase-space reconstruction, and the improved PSO algorithm is used to find the optimal parameters and train the LSTM model.
Step 4: The predicted values of each component are superimposed to obtain the final prediction results of the model.
Step 5: The final prediction results are output [26].
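The overall decompose, predict, and sum structure can be sketched independently of the specific components. In the sketch below, `decompose_predict_sum` takes the decomposition and the per-component forecaster as arguments; `toy_decompose` (a moving-average trend plus detail) and `persistence` (repeat the last value) are deliberately simple stand-ins for EMD with phase-space reconstruction and for the improved PSO-LSTM model, and all three names are hypothetical.

```python
def decompose_predict_sum(series, decompose, fit_predict, horizon):
    """Forecast each component separately and sum the component forecasts."""
    components = decompose(series)
    forecasts = [fit_predict(c, horizon) for c in components]
    return [sum(values) for values in zip(*forecasts)]


def toy_decompose(series, window=3):
    """Stand-in for EMD: split the series into a detail part and a trailing-mean trend."""
    trend = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        trend.append(sum(chunk) / len(chunk))
    detail = [s - t for s, t in zip(series, trend)]
    return [detail, trend]


def persistence(component, horizon):
    """Stand-in for a trained predictor: repeat the last observed value."""
    return [component[-1]] * horizon
```

Because the toy components sum exactly to the input series, the summed persistence forecast equals the last observed value, which makes the additive recombination step easy to verify.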


4. Results: Case Studies
4.1. AFC Data and Processing
Inbound passenger flow data for the North Railway Station on Chengdu Metro Line 1 from January 4–13, 2020, were selected for this study. In the Chengdu rail transit system, the automatic fare collection (AFC) platform records the entry and exit information of each passenger from the smart card data of the automatic ticketing system at each metro station. Inbound traffic data at the station were obtained between 5:00 and 00:55 the following day, with a data collection interval of 5 min. The data contained a total of 2,260 time-series records, each of which included the start time, end time, inbound flow, and outbound flow. The Chengdu Metro Rail Transit North Station road network map is shown in Figure 7.

All 2,260 records were used in the experiments. Both the model input and output are inbound passenger flow, measured in persons per 5 min. The simulation environment used to test the predictive performance of the model was MATLAB 2019a. Generally, the larger the training set, the more accurate the prediction results. Therefore, to take full advantage of the data, the first 90% of the original traffic data (that of eight days from January 4–11) were used as the training set, and the remaining 10% (January 12) were used as the test set.
First, anomalies and missing data in the original data were processed, with anomalous data treated as missing data. Lagrangian interpolation was used to fill the missing data: the four neighboring data points before and after each missing datum were selected for interpolation to ensure the reliability of the interpolated values.
The data were then normalized using the min-max method as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the traffic flow, respectively, and $x$ and $x'$ are the traffic flow data before and after normalization, respectively.
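A minimal sketch of this gap-filling step is given below. It assumes missing or anomalous readings are marked as `None`, takes up to `k = 4` valid neighbors on each side of the gap, and evaluates the Lagrange polynomial through those neighbors at the missing index. The function name `lagrange_fill` and the `None` convention are illustrative assumptions.

```python
def lagrange_fill(values, k=4):
    """Fill None entries by Lagrange interpolation over up to k valid neighbors per side."""
    filled = list(values)
    for idx, v in enumerate(values):
        if v is not None:
            continue
        # Nearest valid observations before and after the gap (original data only).
        before = [j for j in range(idx - 1, -1, -1) if values[j] is not None][:k]
        after = [j for j in range(idx + 1, len(values)) if values[j] is not None][:k]
        nodes = sorted(before + after)
        if not nodes:
            raise ValueError("no valid neighbors available for interpolation")
        total = 0.0
        for a in nodes:
            weight = 1.0  # Lagrange basis polynomial for node a, evaluated at idx
            for b in nodes:
                if b != a:
                    weight *= (idx - b) / (a - b)
            total += values[a] * weight
        filled[idx] = total
    return filled
```

Since the interpolating polynomial reproduces any low-degree polynomial exactly, a gap in a quadratic sequence such as `[0, 1, 4, 9, None, 25, 36, 49, 64]` is filled with a value that matches the true 16 up to rounding.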
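The min-max normalization and its inverse, which is needed to map predictions back to passenger counts, can be sketched as follows. The function names are illustrative.

```python
def min_max_scale(xs):
    """Scale values to [0, 1]; also return the minimum and maximum used."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(x - lo) / (hi - lo) for x in xs], lo, hi


def min_max_inverse(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]
```

A common practice, not stated in the text, is to compute the minimum and maximum on the training set only and reuse them for the test set so that no test information leaks into training.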
4.2. Prediction Results of the LSTM Model
The LSTM rail traffic short-time passenger flow prediction model was established, the results of which are shown in Figure 8.

4.3. Prediction Results of the Improved PSO-LSTM Model
The results show that the plain LSTM model predicts poorly. Consequently, an improved PSO-LSTM model was introduced, the prediction results of which are shown in Figure 9. After repeated runs, the best prediction results were obtained with 167 hidden nodes, a learning rate of 0.0310, and 30 iterations.

4.4. Prediction Results Based on Chaos Theory and Improved EMD-PSO-LSTM Model
4.4.1. EMD of Traffic Flow Change Series
Because the original rail traffic flow data are nonlinear and nonsmooth [27], the noise they contain affects the prediction results and reduces their accuracy. The EMD algorithm can mitigate the noise in the time series data, thus improving the predictive power of the model. Applying EMD to the rail traffic flow variation series yielded nine IMF components and one residual component. The EMD results are shown in Figure 10.

It can be observed from the figure that IMF1, IMF2, and IMF3 have higher frequencies and are the high-frequency components of the original rail traffic passenger flow data. IMF4, IMF5, and IMF6 show more obvious periodicity and are the low-frequency components. The residual represents the overall trend of the time series and is the trend component of the original data. The EMD of the rail traffic short-time passenger flow time series thus provides a clearer picture of both the fluctuations and the overall trend of the passenger flow data.
4.4.2. Phase Space Reconstruction Based on Chaos Theory
EMD of the rail traffic flow change sequence reveals the traffic flow fluctuations more clearly. However, because three components (IMF7, IMF8, and IMF9) do not show the intrinsic properties of the data, the model selects only the remaining seven components and reconstructs them in phase space, finding the time delay using the mutual information method [28] and the embedding dimension using the Cao method. If the phase-space reconstruction parameters are carefully selected, the reconstructed phase space can describe the states of the original system, and the multidimensional phase space can show the dynamic characteristics of the traffic flow change sequence [29]. The phase-space reconstruction parameters of the seven components are listed in Table 1.
After phase-space reconstruction, the improved PSO algorithm was used for each component to determine the optimal number of hidden nodes, learning rate, and number of iterations of the LSTM prediction model. The optimal parameters of each component are listed in Table 2.
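The time-delay selection by mutual information can be sketched as follows. This is an assumption-laden illustration, not the procedure of [28]: mutual information is estimated with a simple two-dimensional histogram (`bins=16` is an arbitrary choice), and the delay is taken as the first local minimum of the estimate, falling back to the global minimum if none is found. The Cao method for the embedding dimension is not sketched here.

```python
import numpy as np


def mutual_information(x, tau, bins=16):
    """Histogram estimate of the mutual information between x[t] and x[t + tau]."""
    a, b = x[:-tau], x[tau:]
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x[t], shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of x[t + tau], shape (1, bins)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))


def first_minimum_delay(x, max_tau=50, bins=16):
    """Return the first delay at which the mutual information has a local minimum."""
    x = np.asarray(x, dtype=float)
    mi = [mutual_information(x, tau, bins) for tau in range(1, max_tau + 1)]
    for k in range(1, len(mi) - 1):
        if mi[k] < mi[k - 1] and mi[k] <= mi[k + 1]:
            return k + 1  # mi[k] corresponds to delay k + 1
    return int(np.argmin(mi)) + 1
```

Histogram estimates are sensitive to the bin count and series length, so in practice the resulting delay would be checked against a plot of the mutual information curve.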
4.4.3. Prediction Results Based on Chaos Theory and Improved EMD-PSO-LSTM Model
Owing to the nonlinear and nonsmooth characteristics of the original rail traffic passenger flow data, the noise in the time series has a certain influence on the prediction results and makes them less accurate. Consequently, the traffic flow variation sequence was first decomposed using EMD. Based on chaos theory, the decomposed subsequences were then reconstructed in phase space, and an improved EMD-PSO-LSTM combined optimization model was constructed. The PSO search and prediction results for each component are shown in Figure 11: Figures 11(a)–11(g) show the PSO search results for each component, whereas Figures 11(h)–11(n) show the predicted results for each component. Figure 12 shows the superimposed prediction obtained by summing the component predictions.

[Figure 11: (a)–(g) PSO search results for each component; (h)–(n) prediction results for each component.]

4.4.4. Prediction Results of the Improved EMD-PSO-LSTM Model
To further validate the prediction effect of the combined optimization model based on chaos theory and the improved EMD-PSO-LSTM model, a comparison model was added in which the components are not reconstructed in phase space, that is, chaos theory is not considered. The components obtained after EMD were screened, and the remaining seven components were selected to build the improved PSO-LSTM prediction model. The optimal number of hidden nodes, learning rate, and number of iterations of the LSTM prediction model were determined using the improved PSO algorithm. The optimal parameters of each component are listed in Table 3. Figures 13(a)–13(g) show the PSO search results for each component, and Figures 13(h)–13(n) show the predicted results for each component. Figure 14 shows the superimposed prediction obtained by summing the component predictions.

[Figure 13: (a)–(g) PSO search results for each component; (h)–(n) prediction results for each component.]

4.5. Evaluation Indicators
To compare the predictions of the LSTM model, the improved PSO-LSTM model, the improved EMD-PSO-LSTM model, and the improved EMD-PSO-LSTM model based on chaos theory, the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R2) were chosen as error metrics [30]. Their formulas can be expressed as follows:
$$\text{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|$$
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}$$
$$R^2 = 1 - \frac{\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2}$$
where $y_t$ denotes the actual inbound traffic at time $t$, $\hat{y}_t$ denotes the forecast inbound traffic at time $t$, $\bar{y}$ indicates the average value of the actual traffic volume, and $n$ denotes the total number of samples in the traffic sequence.
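The three metrics can be computed with a few lines of plain Python. The sketch below uses the standard definitions of MAE, RMSE, and the coefficient of determination; the function names are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

For the toy inputs `actual = [1, 2, 3, 4]` and `predicted = [1, 2, 3, 5]`, the single unit error gives an MAE of 0.25, an RMSE of 0.5, and an R2 of 0.8.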
To visually evaluate the prediction results of the four prediction models, the aforementioned errors were used to compare and analyze the strengths and weaknesses of the model predictions [17]. The evaluation indicator values of the prediction results are listed in Table 4, and the percentage improvements in the prediction results are listed in Table 5.
The comparative analysis of the error metrics in Tables 4 and 5 indicates that the improved EMD-PSO-LSTM prediction model based on chaos theory exhibits the highest prediction accuracy of the four models. Taking the LSTM prediction results as the benchmark, the improved PSO-LSTM model, the improved EMD-PSO-LSTM model, and the improved EMD-PSO-LSTM model based on chaos theory improve the RMSE by 25.53%, 29.97%, and 58.76%, the MAE by 30.41%, 40.13%, and 63.08%, and the R2 by 13.36%, 16.31%, and 32.30%, respectively. These results indicate that the proposed model has great potential for short-time passenger flow forecasting applications in rail transit.
To further validate the experimental results, the proposed chaos-theory-based EMD-PSO-LSTM model was compared with the deep learning-based deep belief network (DBN) and gated recurrent unit (GRU) neural network. Neural network models commonly used in traffic flow prediction, namely the radial basis function (RBF) neural network and the multilayer perceptron (MLP), were also compared. The prediction results and errors are presented in Figure 15 and Table 6, respectively.
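As a worked check of how such percentages relate to the raw errors, assume that the improvement of an error metric is defined as its relative reduction with respect to the LSTM baseline. This definition is an assumption, since the text does not state it explicitly, and the baseline values derived below are back-calculated for illustration rather than taken from Table 4.

```latex
\text{Improvement}_{E} = \frac{E_{\text{LSTM}} - E_{\text{model}}}{E_{\text{LSTM}}} \times 100\%
\quad\Longrightarrow\quad
E_{\text{LSTM}} = \frac{E_{\text{model}}}{1 - \text{Improvement}_{E}/100\%}

\text{RMSE}_{\text{LSTM}} \approx \frac{16.0908}{1 - 0.5876} \approx 39.02,
\qquad
\text{MAE}_{\text{LSTM}} \approx \frac{11.3704}{1 - 0.6308} \approx 30.80
```

Under this assumption, the reported RMSE of 16.0908 and MAE of 11.3704 for the proposed model correspond to LSTM baseline errors of roughly 39.02 and 30.80, respectively.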

The experimental results showed that the proposed EMD-PSO-LSTM model of rail transit short-time passenger flow prediction based on chaos theory obtained better prediction accuracy than the DBN, GRU, RBF, and MLP models.
5. Conclusions
Improving the PSO algorithm significantly improved its search performance. Moreover, by decomposing the time series using EMD and reconstructing the components in phase space based on chaos theory, the characteristic information in the traffic data series could be captured more fully, thereby improving the accuracy of the prediction results. Chaos theory was also found to further expose the intrinsic characteristics of the time series data, which considerably improved the prediction results. Therefore, combining chaos theory with EMD and an improved PSO-LSTM model is an effective method for short-time passenger flow forecasting of rail traffic. However, predicting short-term traffic flows from historical flows alone is somewhat one-dimensional and ignores the impact of data uncertainty. Future research should explore deep neural architectures that address data uncertainty, such as the rough autoencoder (RAE), interval probability distribution learning (IPDL), and deep temporal dictionary learning (DTDL), and make additional improvements to model prediction capabilities.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.