Stock Prediction Based on Optimized LSTM and GRU Models

Gao, Ya; Wang, Rong; Zhou, Enmin

doi:https://doi.org/10.1155/2021/4055281

Scientific Programming

Special Issue

Next-Generation Optimization Models and Algorithms in Cloud and Fog Computing

Research Article | Open Access

Volume 2021 | Article ID 4055281 | https://doi.org/10.1155/2021/4055281

Ya Gao,¹ Rong Wang,² and Enmin Zhou³

Academic Editor: Punit Gupta

Received 09 Aug 2021

Accepted 11 Sept 2021

Published 29 Sept 2021

Abstract

Stock market prediction has always been an important research topic in the financial field. In the past, investors used traditional analysis methods such as K-line diagrams to predict stock trends, but with the progress of science and technology and the development of the market economy, the price trend of a stock is disturbed by various factors. Traditional analysis methods are far from able to uncover the important information hidden in stock price fluctuations, so their prediction accuracy is greatly reduced. In this paper, we design a new model for optimizing stock forecasting. We incorporate a range of technical indicators, including investor sentiment indicators and financial data, and perform dimension reduction on the many influencing factors of the retrieved stock price using the LASSO and PCA approaches. In addition, we compare the performance of LSTM and GRU for stock market forecasting under various parameters. Our experiments show that (1) both LSTM and GRU models can predict stock prices efficiently, with neither clearly better than the other, and (2) of the two dimension reduction methods, both neural models show better prediction ability with LASSO than with PCA.

1. Introduction

The financial market is quite volatile and experiences periods of contraction as well as expansion. The stock market, as a major financial market, is likewise highly volatile. It offers high returns, which attract the majority of investors, and carries high risk, which puts pressure on investors to sell at the wrong time. In order to reduce unnecessary losses and obtain higher trading profits, investors usually expect to predict the stock price trend. As a result, stock market forecasting has been a major research topic in the financial area and attracts the attention of investors. In the stock market, the factors affecting the rise and fall of stock prices are complex and diverse. They include not only economic factors such as price indicators, circulation indicators, activity degree, and economic uncertainty but also noneconomic factors such as traders' expectations, traders' psychological factors, and the political environment. Therefore, the prediction of stock prices has always been a challenging task.

According to the efficient market hypothesis [1], the stock price can be predicted from historical stock data. Furthermore, in recent years, with increasing computing power and decreasing data storage costs, and especially with the rise of innovative technologies such as big data, machine learning, reinforcement learning, and other optimization technologies, researchers have developed various models for predicting stock prices. Machine learning has been widely used in the capital market and plays an indispensable role in predicting future stock prices based on historical data. Traditional stock price forecasting models are mainly linear models, including the autoregressive integrated moving average (ARIMA) model [2], the multiple linear regression model, and the exponential smoothing model [3, 4]. These linear models have played an important role in promoting the progress and development of stock forecasting. However, stock prices are typically noisy, fluctuating, and nonparametric, which gives the stock market nonlinear and nonstationary characteristics, so a standard linear prediction model is unable to produce reliable stock predictions. With the development of deep learning methods, nonlinear neural networks are increasingly employed to predict stock prices because of their higher accuracy.

The artificial neural network (ANN) includes the MP neural network and the back propagation (BP) neural network. However, the structure of the ANN model is too simple, and it has several problems: (1) overfitting weakens the generalization ability of the model, (2) local extrema reduce the prediction ability of the model, and (3) the gradient vanishes or explodes due to the excessive weight of neurons in the optimization process, resulting in the failure of prediction. Therefore, scholars have introduced deep neural networks (DNN), including the convolutional neural network (CNN), the recurrent neural network (RNN), the long short-term memory (LSTM) network, and the gated recurrent unit (GRU) network, to address the problems of the ANN model and thereby improve the accuracy and efficiency of prediction.

CNN is a type of neural network that has become increasingly popular in recent years. CNNs were originally designed to analyse image data efficiently, and a one-dimensional CNN applies the same idea to sequence data. A CNN can read the original input data and automatically extract its most significant features for learning. This method feeds the network the observed time series values as input and uses a multilayer network to predict the unobserved value. For example, Xu et al. [5] employed CNN to extract important stock features from stock market returns for forecasting stock market trends. Recurrent neural networks (RNN) such as long short-term memory (LSTM) networks are another tool for predicting time series [6, 7]. LSTM estimates time series data by using both the historical and the present stock data. In recent years, LSTM has been applied to stock market forecasting in different stock markets around the world. Chen et al. [8] used an LSTM model to predict China's Shanghai and Shenzhen stock markets. Li et al. [9] introduced a stock indicator with investor sentiment based on the LSTM model to predict the CSI300 index value, and the results showed that the model was better than the support vector machine method in prediction accuracy. However, this model does not reduce the dimension of the stock indicators. Jiawei and Murata [10] attempted to identify the influencing factors of stock market trend prediction through the LSTM model, using a preprocessing algorithm to reduce the dimension of stock features and a sentiment analyzer to process financial news for stock trend prediction. However, only one dimension reduction method was used, and there was no comparison with other methods. Hu [11] reduced the dimension of stock technical analysis indicators by the PCA and LASSO methods before using the LSTM model for prediction. The results demonstrated that, compared with the LASSO-LSTM model, the PCA-LSTM model can significantly reduce data redundancy and enhance prediction accuracy. Although this work used different dimension reduction methods, it used only one model and did not compare it with other models.

Cho et al. [12] simplified the LSTM structure and created GRU, a new deep learning architecture that integrates long-term and short-term memory. GRU addresses the problem of vanishing and exploding gradients that classic recurrent neural networks (RNNs) encounter when learning long-term dependencies. GRU has also been widely used in recent stock forecasting. Shen et al. [13] compared the GRU model and SVM for predicting the trading signals of stock indicators. The results demonstrated that the prediction accuracy of the two GRU models is higher than that of the other models. However, sentiment indicators were not included in this study. Rahman et al. [14] used stock data from Yahoo Finance and a GRU model to predict the stock price. Sentiment indicators were not considered in this study, nor was its performance compared with that of other models [15].

In this paper, we integrate a variety of technical indicators, such as investor sentiment indicators and financial data based on the Shanghai Composite Index data. We use LASSO and PCA methods to perform dimension reduction on the multiple influencing factors of the extracted stock price. The LSTM and GRU models are then utilized in this paper to forecast the stock price. Most importantly, by comparing the accuracy and stability of the LASSO-LSTM, LASSO-GRU, PCA-LSTM, and PCA-GRU models, the optimal forecasting model may be recommended.

2. Methodology

2.1. LASSO

In empirical analysis, we set multidimensional variables in order to minimize the model deviation caused by missing important independent variables. The model then needs to find the set of independent variables with the strongest explanatory power for the dependent variable. That is, the model improves its interpretability and prediction accuracy through independent variable selection (indicator selection and field selection). Indicator selection is an extremely important problem in statistical modelling. LASSO is a shrinkage estimation method that can simplify the indicator set. It obtains a more refined model by constructing a penalty function, which shrinks some coefficients and sets others exactly to zero. Therefore, it retains the advantage of subset shrinkage and is a biased estimator suited to complex collinear data.

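LASSO's basic idea is to minimize the sum of squares of residuals under the constraint that the sum of absolute values of regression coefficients is less than a constant, so as to produce some regression coefficients strictly equal to 0 and obtain an interpretable model. LASSO adds a penalty term to the ordinary linear regression model, and the LASSO estimate of the ordinary linear model is

$$\hat{\beta}^{\mathrm{LASSO}} = \arg\min_{\beta} \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t,$$

which is equivalent to

$$\hat{\beta}^{\mathrm{LASSO}} = \arg\min_{\beta} \Bigl\{ \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \Bigr\},$$

where $t$ and $\lambda$ are in one-to-one correspondence and are the adjustment coefficients.

Let $t_0 = \sum_{j=1}^{p} |\hat{\beta}_j^{\mathrm{OLS}}|$, where $\hat{\beta}^{\mathrm{OLS}}$ is the ordinary least-squares estimate. When $t < t_0$, some coefficients will be compressed to 0, which reduces the dimension of $X$ and the complexity of the model. Finally, variable selection can be realized by controlling the adjustment coefficient $\lambda$.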
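As an illustration of how the penalty drives coefficients exactly to zero, the following is a minimal sketch of LASSO solved by cyclic coordinate descent with NumPy. It is not the authors' implementation: the solver choice, the function names, the synthetic data, and the value `lam=0.5` are all assumptions made for this example. The objective is scaled as $(1/2n)\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert_1$, so `lam` differs from the $\lambda$ above by a constant factor.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Soft-thresholding operator: shrinks rho toward 0 and clips it at 0."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardised and y is centred.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with the contribution of feature j removed
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            z = X[:, j] @ X[:, j] / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta

# Synthetic data (hypothetical): only features 0, 1 and 4 carry signal
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_beta = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0])
y = X @ true_beta + 0.1 * rng.standard_normal(200)
y = y - y.mean()

beta_hat = lasso_coordinate_descent(X, y, lam=0.5)
selected = np.flatnonzero(beta_hat != 0.0)  # indices of retained indicators
print(selected, np.round(beta_hat, 3))
```

The retained indices play the role of the reduced indicator set; a larger `lam` removes more indicators, and a smaller one keeps more.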

2.2. PCA

Principal component analysis (PCA) is a dimension reduction statistical method. With the help of an orthogonal transformation, it transforms the original random vector whose components are related into a new random vector whose components are not related. This is expressed algebraically as transforming the covariance matrix of the original random vector into a diagonal matrix and geometrically as transforming the original coordinate system into a new orthogonal coordinate system. Then, the multidimensional variable system is reduced, so that it can be transformed into a low-dimensional variable system with high accuracy, and the low-dimensional system can be further transformed into a one-dimensional system by constructing an appropriate value function. The procedure is as follows.

(1) Standardize the original indicator data. Collect a $p$-dimensional random vector $x = (x_1, x_2, \ldots, x_p)^T$ and $n$ samples $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$, $i = 1, 2, \ldots, n$, where $n > p$. Then, construct the sample array and carry out the following standardized transformation on the sample array elements:

$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j},$$

where $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, p$, $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$, and $s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$. Thus, the standardized matrix $Z$ is obtained.

(2) Find the correlation coefficient matrix for the standardized matrix $Z$ as

$$R = [r_{ij}]_{p \times p} = \frac{Z^T Z}{n-1},$$

where $r_{ij} = \frac{1}{n-1}\sum_{k=1}^{n} z_{ki} z_{kj}$, $i, j = 1, 2, \ldots, p$.

(3) Solve the characteristic equation of the sample correlation matrix, $|R - \lambda I_p| = 0$, to get $p$ characteristic roots, thus determining the principal components. Determine the value of $m$ according to $\sum_{j=1}^{m} \lambda_j \big/ \sum_{j=1}^{p} \lambda_j \ge 0.85$, so that the utilization rate of information can reach more than 85%. For each $\lambda_j$, $j = 1, 2, \ldots, m$, solve the equation $R b = \lambda_j b$ to obtain the unit eigenvector $b_j^o$.

(4) Convert the standardized indicator variables into the principal components $U_{ij} = z_i^T b_j^o$, $j = 1, 2, \ldots, m$, where $U_1$ is called the first principal component, $U_2$ is called the second principal component, and so on; $U_m$ is called the $m$-th principal component.

(5) Evaluate the principal components comprehensively. The final evaluation value is obtained as the weighted sum of the principal components, and the weight is the variance contribution rate of each principal component.
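The five PCA steps can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' code: the function name `pca_by_correlation`, the synthetic data, and the use of `numpy.linalg.eigh` are assumptions, while the 85% threshold comes from the text.

```python
import numpy as np

def pca_by_correlation(X, threshold=0.85):
    """PCA on the correlation matrix, keeping enough components to reach
    the cumulative variance contribution given by `threshold`."""
    n, p = X.shape
    # Step 1: standardise each indicator (sample standard deviation)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: correlation coefficient matrix
    R = Z.T @ Z / (n - 1)
    # Step 3: characteristic roots and unit eigenvectors, largest root first
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    m = int(np.searchsorted(ratio, threshold) + 1)  # smallest m reaching threshold
    # Step 4: principal components of the standardised data
    U = Z @ eigvecs[:, :m]
    # Step 5: evaluation value weighted by variance contribution rate
    weights = eigvals[:m] / eigvals.sum()
    score = U @ weights
    return U, eigvals, m, score

# Synthetic data (hypothetical): five indicators driven by two latent factors
rng = np.random.default_rng(0)
f1 = rng.standard_normal(300)
f2 = rng.standard_normal(300)
noise = 0.1 * rng.standard_normal((300, 5))
X = np.column_stack([f1, f1, f1, f2, f2]) + noise

U, eigvals, m, score = pca_by_correlation(X)
print(m, U.shape)
```

Because the five synthetic indicators are built from two factors, two components are enough to pass the 85% threshold, and the resulting columns of `U` are uncorrelated with each other.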

2.3. LSTM and GRU

LSTM is a special type of recurrent neural network (RNN). The RNN model reuses the weight parameters of its neurons across time steps and can therefore employ past data for prediction. However, an RNN can only deal with certain short-term dependence and is prone to exploding and vanishing gradients, so it cannot capture long-term dependence on historical data. In order to solve these problems, LSTM was proposed by Hochreiter and Schmidhuber [6] and then improved and promoted by Graves [16]. It has been widely used on a variety of problems and has yielded impressive outcomes.

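Compared with the RNN model, the LSTM model introduces a cell state $C_t$ and uses the input gate $i_t$, forget gate $f_t$, and output gate $o_t$. The three gates are used to maintain and control information. At time $t$, $x_t$ is the input data, $h_t$ represents the current output, $\tilde{C}_t$ is the candidate value from the input gate, $\tanh$ is the hyperbolic tangent function, $\sigma$ is the sigmoid function, $W$ represents the weight matrix, and $b$ is the bias. The operation formulae of LSTM are as follows.

Forget gate:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

Input gate:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \qquad \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C), \qquad C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

Output gate:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad h_t = o_t \odot \tanh(C_t)$$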
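To make the gate mechanics concrete, here is a minimal NumPy sketch of one forward step of a standard LSTM cell, in which the forget gate scales the previous cell state, the input gate scales a tanh candidate, and the output gate scales the tanh of the new cell state. The parameter names (`W_f`, `b_f`, and so on), the layer sizes, and the random weights are assumptions for illustration; a real model would learn these weights by training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell."""
    concat = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])      # input gate
    c_tilde = np.tanh(params["W_c"] @ concat + params["b_c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde               # new cell state
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                         # new hidden output
    return h_t, c_t

# Hypothetical sizes: 4 input indicators, 8 hidden units, 5 time steps
rng = np.random.default_rng(0)
n_features, n_hidden = 4, 8
params = {}
for gate in ("f", "i", "c", "o"):
    params[f"W_{gate}"] = 0.1 * rng.standard_normal((n_hidden, n_hidden + n_features))
    params[f"b_{gate}"] = np.zeros(n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
sequence = rng.standard_normal((5, n_features))
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, params)
print(h.shape)
```

The cell state `c` is the path along which information can persist over many steps, which is what lets the model retain longer-range history than a plain RNN.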

The LSTM model is especially popular in the field of financial forecasting because it effectively deals with the redundancy of relevant information in historical data.

GRU is a variant of RNN introduced by Cho et al. [12]. By introducing a gating structure, it addresses the difficulty that RNNs have in capturing long-distance information. Compared with LSTM, GRU is simplified and introduces only an update gate and a reset gate. In GRU, the update (or input) gate decides how much of the input and the previous output is passed to the next cell, and the reset gate determines how much of the past information to forget. The current memory content ensures that only the relevant information is passed to the next iteration, which is determined by the weight W. The main operations in GRU are governed by the following formulae.

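Update gate:

$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$$

Reset gate:

$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$$

After the reset gate and the update gate are applied, the candidate state value of the GRU unit is $\tilde{h}_t$ and the final output state value is $h_t$:

$$\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t]), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$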
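A single GRU step can be sketched in NumPy as below. This sketch assumes the common bias-free formulation in which the new state is `(1 - z_t) * h_prev + z_t * h_tilde`; some references swap the roles of `z_t` and `1 - z_t`. The parameter names (`W_z`, `W_r`, and `W_h` for the candidate weight) and the sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One forward step of a GRU cell (bias-free formulation)."""
    concat = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    z_t = sigmoid(params["W_z"] @ concat)            # update gate
    r_t = sigmoid(params["W_r"] @ concat)            # reset gate
    # Candidate state: the reset gate scales how much past state is used
    h_tilde = np.tanh(params["W_h"] @ np.concatenate([r_t * h_prev, x_t]))
    # Final state: the update gate blends the old state and the candidate
    return (1.0 - z_t) * h_prev + z_t * h_tilde

# Sanity check with all-zero weights: both gates equal 0.5 and the
# candidate is 0, so the new state is exactly half of the previous state.
n_features, n_hidden = 4, 8
zero_params = {
    name: np.zeros((n_hidden, n_hidden + n_features))
    for name in ("W_z", "W_r", "W_h")
}
h_prev = np.ones(n_hidden)
h_next = gru_step(np.ones(n_features), h_prev, zero_params)
print(h_next)
```

Compared with the LSTM, there is no separate cell state and one fewer gate, so a GRU layer of the same width has fewer parameters.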

3. Experiment Settings and Results

3.1. Data Source and Indicator Selection

In this paper, the data of the Shanghai Composite Index (000001) from April 11, 2007, to August 3, 2021, are selected as the experimental data. The data come from the NetEase Finance and Economics website and cover a total of 3,481 days. In order to evaluate the training effect of the model, we divide the experimental data into a training set and a test set: 80% of the data are used as the training set to train the stock prediction model, and the remaining 20% are used as the test set to verify the prediction effect of the model. All experiments were run on an Intel Core i9-9900K CPU with 64 GB of memory.
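A minimal sketch of the 80/20 split is shown below. The text does not state whether the split is chronological or shuffled; this example assumes a chronological split (earliest 80% for training), which is the usual choice for time series because it avoids training on data from after the test period. The placeholder series and the function name are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into leading train and trailing test parts."""
    split = int(len(series) * train_fraction)
    return series[:split], series[split:]

# Placeholder standing in for the 3,481 daily records
records = list(range(3481))
train, test = chronological_split(records)
print(len(train), len(test))  # 2784 697
```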

In selecting stock technical indicators, this paper considers as many of the factors affecting the stock price as possible. Compared with other studies, this paper selects the open price, highest price, lowest price, and trading volume, common technical indicators such as OBV, KDJ, BIAS, RSI, CCI, and MFI, other technical indicators for judging stock prices, and the PSY indicator, which reflects investors' psychological mood. These indicators comprehensively reflect the information affecting stock price fluctuations and have strong explanatory power for them. The selected indicators are described in detail in Table 1.
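As an example of how such indicators are derived from closing prices, the sketch below computes two of them using their commonly cited definitions: PSY as the percentage of rising days within the last N days, and BIAS as the percentage deviation of the close from its N-day moving average. The window lengths and the sample prices are hypothetical, and the exact definitions and parameters used in the paper may differ.

```python
def psy(closes, window=12):
    """PSY: percentage of rising days within the last `window` trading days."""
    ups = [1 if closes[i] > closes[i - 1] else 0 for i in range(1, len(closes))]
    return [
        100.0 * sum(ups[i - window:i]) / window
        for i in range(window, len(ups) + 1)
    ]

def bias(closes, window=6):
    """BIAS: percentage deviation of the close from its moving average."""
    out = []
    for i in range(window - 1, len(closes)):
        ma = sum(closes[i - window + 1:i + 1]) / window
        out.append(100.0 * (closes[i] - ma) / ma)
    return out

# Hypothetical closing prices
closes = [10.0, 11.0, 12.0, 11.0, 12.0, 13.0, 14.0]
psy_values = psy(closes, window=3)
bias_values = bias(closes, window=3)
print(psy_values[-1], round(bias_values[-1], 3))
```

In the last three days of the sample every close is higher than the previous one, so the final PSY value is 100, and the final close of 14 sits above its three-day average of 13, so the final BIAS value is positive.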

3.2. Experimental Setup

Different hyperparameters have a significant impact on the prediction ability of the LSTM and GRU models. Therefore, different hyperparameter values are set in the prediction to compare the prediction results. The number of layers is set to 2 and 3, the number of neurons is set to 8, 16, and 32, the learning rate is set to 0.001, and the number of iterations is set to 1000. We can determine the most accurate prediction method by analyzing the prediction accuracy of the experimental results and the degree of fit between the trends of the predicted and historical stock prices. The prediction accuracy is evaluated by the mean square error (MSE), root mean square error (RMSE), and mean absolute error (MAE) at different look-back values. The smaller these three values are, the more accurate the forecast is. The full specification of parameters used in these models is listed in Tables 2–5.
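The three error measures and the hyperparameter grid described above can be sketched in plain Python as follows. The function names, the dictionary keys, and the sample values are illustrative assumptions; the grid values are the ones stated in the text.

```python
import itertools
import math

def mse(y_true, y_pred):
    """Mean square error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

# Every combination of layer count and neuron count, with the fixed
# learning rate and iteration count from the text
grid = [
    {"layers": layers, "units": units, "learning_rate": 0.001, "iterations": 1000}
    for layers, units in itertools.product((2, 3), (8, 16, 32))
]

# Hypothetical actual and predicted values
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
print(len(grid), mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

RMSE is on the same scale as the price itself, while MSE penalizes large errors more heavily and MAE treats all errors linearly, which is why the three are reported together.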

3.3. Experimental Results

The experimental results of stock prediction for the four models are shown in Tables 2–5. Two different feature sets were obtained in this experiment. Set I is the data obtained from the LASSO dimension reduction method, and set II is the data obtained from the PCA dimension reduction method. These feature sets are used to train the LSTM and GRU models. In the experiment, different look-back values were set. All parameter specifications used by the four models are shown in Tables 2–5. The results show that, as measured by the MSE, RMSE, and MAE indicators, both LSTM and GRU models can predict stock prices effectively, and neither is clearly more efficient than the other. However, across the dimension reduction methods, all indicators (except the training time) show that the prediction results of the two neural network models using LASSO dimension reduction are mostly better than those using PCA dimension reduction. In other words, under the same network model, the prediction performance of the LASSO-LSTM model is better than that of PCA-LSTM, and the prediction performance of LASSO-GRU is better than that of PCA-GRU.
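The look-back value determines how many past days form one input sample. The sketch below shows one common way to build such samples, pairing each window of `look_back` consecutive values with the value that follows it; the function name and the toy series are hypothetical, and the paper's exact windowing is not specified.

```python
def make_windows(series, look_back):
    """Build (input window, next value) pairs for one-step-ahead prediction."""
    inputs, targets = [], []
    for i in range(len(series) - look_back):
        inputs.append(series[i:i + look_back])
        targets.append(series[i + look_back])
    return inputs, targets

# Toy series standing in for a reduced feature or the closing price
series = [1, 2, 3, 4, 5, 6]
inputs, targets = make_windows(series, look_back=3)
print(inputs)   # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(targets)  # [4, 5, 6]
```

A longer look-back gives the network more history per sample but yields fewer samples from the same series.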

4. Conclusion

This study innovatively integrates a variety of technical indicators such as investor sentiment indicators and financial data and carries out dimension reduction on the multiple influencing factors of the extracted stock price through LASSO and PCA analysis approaches. This work carries out a comparison on the performances of LSTM and GRU for stock market forecasting under the different parameters. Our experimental results show that (1) both LSTM and GRU models can be used to predict stock prices effectively and (2) for different dimension reduction methods, the prediction results of the two neural network models using LASSO dimension reduction are mostly better than those using PCA dimension reduction data.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

[1] E. F. Fama, “Efficient capital markets: a review of theory and empirical work,” The Journal of Finance, vol. 25, no. 2, pp. 383–417, 1970.
[2] P.-F. Pai and C.-S. Lin, “A hybrid ARIMA and support vector machines model in stock price forecasting,” Omega, vol. 33, no. 6, pp. 497–505, 2005.
[3] X. B. Huang, C. F. Wang, Z. M. Fang, and C. L. Xiong, “Detection of Chinese stock information based on hidden Markov model,” Syst. Eng.-Theory Pract, vol. 32, no. 4, pp. 713–720, 2012.
[4] J. J. Shi and T. Song, “Analysis of long-term fluctuation trends and influencing factors in stock market-based on spline-GARCH model,” J. Appl. Stat. Manag, vol. 34, no. 1, pp. 175–182, 2015.
[5] B. Xu, D. Zhang, S. Zhang, H. Li, and H. Lin, “Stock market trend prediction using recurrent convolutional neural networks,” in Proceedings of CCF International Conference on Natural Language Processing and Chinese Computing, pp. 166–177, Hohhot, China, August 2018.
[6] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[7] H. Hewamalage, C. Bergmeir, and K. Bandara, “Recurrent neural networks for time series forecasting: current status and future directions,” International Journal of Forecasting, vol. 37, no. 1, pp. 388–427, 2021.
[8] K. Chen, Y. Zhou, and F. Dai, “A LSTM-based method for stock returns prediction: a case study of China stock market,” in Proceedings of the 2015 IEEE international conference on big data (big data), pp. 2823-2824, IEEE, Santa Clara, CA, USA, October 2015.
[9] J. Jiahong Li, H. Junjie Wu, and J. Wu, “Sentiment-aware stock market prediction: a deep learning method,” in Proceedings of the 2017 international conference on service systems and service management, pp. 1–6, IEEE, Dalian, China, June 2017.
[10] X. Jiawei and T. Murata, “Stock market trend prediction with sentiment analysis based on LSTM neural network,” in Proceedings of the International Multi-Conference of Engineers and Computer Scientists (IMECS), pp. 13–15, Hong Kong, China, March 2019.
[11] Y. W. Hu, “Stock forecast based on optimized LSSVM model,” Computer Science, vol. 48, no. S1, pp. 151–157, 2021.
[12] K. Cho, B. Van Merriënboer, C. Gulcehre et al., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” 2014, arXiv preprint arXiv:1406.1078.
[13] G. Shen, Q. Tan, H. Zhang, P. Zeng, and J. Xu, “Deep learning with gated recurrent unit networks for financial sequence predictions,” Procedia computer science, vol. 131, pp. 895–903, 2018.
[14] M. O. Rahman, M. S. Hossain, T. S. Junaid, M. S. A. Forhad, and M. K. Hossen, “Predicting prices of stock market using gated recurrent units (GRUs) neural networks,” Int. J. Comput. Sci. Netw. Secur, vol. 19, no. 1, pp. 213–222, 2019.
[15] H. Wang, X.-M. Zhang, G. Tomiyoshi et al., “Association of serum levels of antibodies against MMP1, CBX1, and CBX5 with transient ischemic attack and cerebral infarction,” Oncotarget, vol. 9, no. 5, pp. 5600–5613, 2017.
[16] A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks, Springer, Berlin/Heidelberg, Germany, 2012.

Copyright

Copyright © 2021 Ya Gao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
