Abstract
This study aims to model the data patterns of the Saudi stock exchange (Tadawul) and improve their forecasting accuracy using daily stock price index data comprising 2026 observations from October 2011 to December 2019. The study employs a nonlinear spectral model, the maximal overlap discrete wavelet transform (MODWT), with five wavelet functions, namely, Haar, Daubechies (Db), least asymmetric (LA-8), best-localized (BL14), and Coiflet (C6), in conjunction with an adaptive network-based fuzzy inference system (ANFIS). We selected the oil price (Loil) and the repo rate (Repo) as input variables on the basis of correlation analysis, the Engle and Granger causality test, and multiple regression. The input variables were collected from the Saudi Authority for Statistics and the Saudi Central Bank, and the output variable was obtained from Tadawul. The performance of the proposed model (MODWT-LA8-ANFIS) is evaluated in terms of the mean error (ME), root mean square error (RMSE), and mean absolute percentage error (MAPE). We also compare the MODWT-LA8-ANFIS model with traditional models, namely, the autoregressive integrated moving average (ARIMA) model and the ANFIS model. The results show that MODWT-LA8-ANFIS outperforms the traditional models. Therefore, the proposed model, which decomposes the series with MODWT before forecasting, is well suited to forecasting stock market data.
1. Introduction
Stock price movements are assessed through volatility in the stock exchange market. Volatility describes the behavior of the market: it indicates whether the price of a stock fluctuates strongly over time (high volatility) or changes slowly over time (low volatility). Volatility is measured as the standard deviation of the stock price [1]. Stock market volatility is a measure of the riskiness of stocks and is relevant to both market policymakers and practitioners, particularly in emerging markets [2]. An effective quantitative approach is therefore needed to model stock market volatility in order to protect against unexpected price changes. Previous studies have shown that stock market volatility is time-varying and that its movements are not random. Consequently, financial econometricians and other practitioners have developed a number of time-varying volatility models [3–8].
Artificial neural networks (ANNs) are widely used in applications across various business and scientific disciplines [9–12], and many articles have employed ANNs to predict stock markets. For example, the authors in [13] predicted price fluctuations using the Haar wavelet and a Takagi–Sugeno–Kang (TSK) fuzzy rule-based system. The TSK fuzzy rule-based method forecasts stock prices using a number of technical indices. According to simulation results, the model successfully predicted stock price fluctuations in the Taiwan Stock Exchange with an accuracy of up to 99.1% [14]. The authors proposed a forecasting fusion model that combined the wavelet transform, as a data preparation tool, with fuzzy logic and a neural network. The proposed model was trained on a dataset covering the period from 2005 to 2010, and the results indicate that this hybrid model achieves better forecasting accuracy than its component models used separately. Similarly, the authors in [15] presented a fuzzy wavelet neural network (FWNN) for stock price prediction. Daily stock prices over the previous three years were used as a dataset of 1000 samples, of which 950 were used for training and the remaining 50 for testing. The simulation results demonstrated that the proposed FWNN system trained with differential evolution (DE) achieved better performance than the other models.
ANFIS combines fuzzy logic and ANNs [16, 17]. Its learning algorithm comprises a forward process and a backward process, and the forward process passes through the five layers described in [17–19]. ANFIS models have been used successfully in many fields, such as engineering, computer science, and chemistry, and many models have produced successful predictions when combined with MODWT. Note that MODWT can be applied with different wavelet functions; five common choices are Haar, Db, LA-8, BL14, and C6 [20]. A number of published studies have used ANFIS. The authors in [21] presented an ANFIS approach for the long-term prediction of electricity consumption, introducing ANFIS and AR models to forecast long-term electricity demand in several European countries. The authors in [22] applied the ANFIS model to forecast annual runoff at the Yamadu Hydrological Station in China; on the basis of relative percentage errors, the ANFIS model forecast more efficiently than the ANN model. The authors in [23] used ANFIS to forecast the future sales of an online shop from 80 days of sales of 200 products; the results show that ANFIS can partially improve the accuracy of time series prediction. The literature also reveals that few articles have focused on combining wavelets with fuzzy logic. In [24], weekly data from January 2012 to November 2014 were used with a fuzzy wavelet model to forecast the IDR/USD exchange rate; this model combines the fuzzy Mamdani model with discrete wavelet transforms (DWTs). The authors in [25] used a fuzzy wavelet neural control scheme for a micro-electro-mechanical system (MEMS). A novel time series forecasting model based on fuzzy cognitive maps and the empirical wavelet transform was proposed in [26].
The authors in [27] compared the performance of wavelet neural network (WNN) and ANFIS models on small datasets.
Stock market volatility is affected by macroeconomic variables such as the inflation rate, unemployment, the interest rate, gross domestic product, and oil prices. The repo rate (Repo) plays an important role in the macroeconomy: it is the monetary policy interest rate at which the central bank lends money to banks for the short term. The impact of Repo on the stock market was studied in [28]. The oil price refers to the closing price of a barrel of crude oil, and its effect on stock markets was studied in [29]. Accordingly, these financial covariates (Repo and Loil) are investigated as input variables in our study.
According to the literature review, no previous study over the last decade has focused on MODWT for modeling Tadawul and enhancing its prediction accuracy. Over the last ten years, a variety of comparative studies have used individual MODWT functions, or combinations of MODWT models, within various methodologies. However, there is room for further investigation comparing all MODWT functions, namely, Haar, Db, LA-8, BL14, and C6, combined with a fitted ANFIS model within a single context or financial market. This study undertakes this work for Tadawul, since some researchers in the literature have used only one MODWT function. The current research uses MODWT functions to analyze fluctuations in the Tadawul index, which reflects the average performance of firms listed on the Saudi exchange market. Additionally, the causes of stock market volatility and the modeling of variance behavior are examined to indicate the accuracy of expectations and the percentage of possible risks. Finally, by combining MODWT functions with the ANFIS model and evaluating the results with statistical criteria such as MSE, RMSE, MAE, and MAPE, the forecasting accuracy is enhanced and a new forecasting model is proposed.
This study is organized as follows. Materials and methods are explained in Section 2. The research design and methodology are discussed in Section 3. The empirical results are analyzed in Section 4. The conclusions are drawn in Section 5.
2. Materials and Methods
2.1. Dataset
The closing price data were obtained from the Saudi stock market (Tadawul), and the input variables from the Saudi Authority for Statistics and the Saudi Central Bank [30, 31]. The daily closing prices were collected from August 2011 to December 2019, giving 2026 observations [20, 32]. Table 1 shows the descriptive statistics of the dataset.
LSCS refers to the logarithm of the standard deviation of closing stock prices, which can be expressed as $\mathrm{LSCS} = \ln\left(\mathrm{SD}(P_t)\right)$, where $P_t$ is the closing stock price.
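The definition above leaves the window of the standard deviation unspecified. As a sketch, assuming a rolling window of `window` consecutive closing prices and the sample standard deviation (both assumptions, as is the helper name `lscs`), LSCS can be computed as follows:

```python
import math
import statistics


def lscs(prices, window):
    """Rolling LSCS: natural log of the sample standard deviation of closing prices.

    Assumption (not from the study): the standard deviation is taken over a
    rolling window of `window` consecutive closing prices.
    """
    if window < 2:
        raise ValueError("window must contain at least two prices")
    out = []
    for t in range(window - 1, len(prices)):
        sd = statistics.stdev(prices[t - window + 1 : t + 1])  # sample SD (n - 1 divisor)
        out.append(math.log(sd))  # raises ValueError if a window is constant (sd == 0)
    return out
```

For example, `lscs([1, 2, 3, 4, 5], 3)` returns one value per full window; each window has a sample standard deviation of 1, so every value is 0.0.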
LSCS has a mean of 6.75 and a standard deviation of 0.6923, and ranges from a minimum of 3.83 to a maximum of 7.22. Repo has a mean of 0.70 and a standard deviation of 0.28, with a minimum of 0.13 and a maximum of 4.55. Loil has a mean of 4.30 and a standard deviation of 0.35, with a minimum of 3.33 and a maximum of 4.84.
2.2. Wavelet Transform Formula
Wavelet transform (WT) is a mathematical technique for transforming original time series data into a time-scale domain. Because it captures localized changes in both time and scale, WT is an appealing option for nonstationary data, especially stock exchange market data. WT can be categorized into the continuous wavelet transform (CWT), the DWT, and the MODWT, which share the same underlying principles. The key difference between the DWT and MODWT is that the DWT requires the number of observations to be an integer multiple of $2^J$, where J is the number of decomposition levels, whereas MODWT can handle data of any size. In this study, we focus on MODWT because of this flexibility [32, 33].
Theoretically, WT is an extension of the Fourier transform (FT) [34], which represents a signal in terms of sine and cosine functions. A wavelet function $\psi$ must satisfy the following admissibility condition [20]:

$$C_\psi = \int_{0}^{\infty} \frac{|\Psi(f)|^{2}}{f}\,df < \infty, \tag{1}$$

where $\Psi(f)$, a function of frequency f, is the FT of $\psi(t)$. WT is used in a variety of applications, including signal processing and image analysis. It was developed to overcome a limitation of the FT, namely, its inability to localize a signal simultaneously in time (or space) and frequency. As shown in equations (2a) and (2b), the father wavelet $\phi$ represents the low-frequency (smooth) components, whereas the mother wavelet $\psi$ represents the high-frequency (detail) components, with $j = 1, \ldots, J$ in the J-level wavelet decomposition:

$$\phi_{J,k}(t) = 2^{-J/2}\,\phi\left(2^{-J}t - k\right), \tag{2a}$$

$$\psi_{j,k}(t) = 2^{-j/2}\,\psi\left(2^{-j}t - k\right), \tag{2b}$$

where J is the maximum scale supported by the number of data points. The father and mother wavelets satisfy the following conditions:

$$\int \phi(t)\,dt = 1, \tag{3}$$

$$\int \psi(t)\,dt = 0. \tag{4}$$

The general mathematical model is presented in the following equation:

$$f(t) = \sum_{k} s_{J,k}\,\phi_{J,k}(t) + \sum_{k} d_{J,k}\,\psi_{J,k}(t) + \sum_{k} d_{J-1,k}\,\psi_{J-1,k}(t) + \cdots + \sum_{k} d_{1,k}\,\psi_{1,k}(t). \tag{5}$$

In more detail, the coefficients are given by the following equations:

$$s_{J,k} = \int f(t)\,\phi_{J,k}(t)\,dt, \tag{6a}$$

$$d_{j,k} = \int f(t)\,\psi_{j,k}(t)\,dt, \quad j = 1, \ldots, J. \tag{6b}$$

In equations (6a) and (6b), $s_{J,k}$ and $d_{j,k}$ are the smooth and detail coefficients, respectively, and the WT is used to compute them. The detail coefficients capture the significant fluctuations of the original data, while the smooth coefficients contain its most significant features. In general, Haar, Db, LA-8, BL14, and C6 are common wavelet functions in WT [32]. Their key characteristics are as follows. Except for Haar, these functions are arbitrarily regular and have no explicit closed-form expression. They are real-valued and support an arbitrary number of vanishing moments, orthogonality, compact support, biorthogonal and orthogonal analysis, continuous and discrete transforms, fast algorithms, and exact reconstruction.
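To make the decomposition concrete, the following sketch implements a single-level MODWT with the Haar filter, the simplest of the five functions listed above, using circular boundary handling (the convention of Percival and Walden). It is an illustration under those assumptions, not the LA-8 decomposition used in the study, whose filter has eight coefficients. Because MODWT does not downsample, the input length need not be a power of two. The function names are hypothetical.

```python
def haar_modwt_level1(x):
    """Level-1 Haar MODWT with circular boundary (Percival-Walden convention).

    Detail (wavelet) coefficients:        W_t = (X_t - X_{t-1}) / 2
    Approximation (scaling) coefficients: V_t = (X_t + X_{t-1}) / 2
    For t = 0, x[t - 1] is x[-1], i.e. the series wraps around.
    """
    n = len(x)
    w = [(x[t] - x[t - 1]) / 2.0 for t in range(n)]
    v = [(x[t] + x[t - 1]) / 2.0 for t in range(n)]
    return w, v


def haar_mra_level1(w, v):
    """Time-domain detail D_1 and smooth S_1 such that X = D_1 + S_1."""
    n = len(w)
    d = [(w[t] - w[(t + 1) % n]) / 2.0 for t in range(n)]
    s = [(v[t] + v[(t + 1) % n]) / 2.0 for t in range(n)]
    return d, s
```

For a series of length 5, `haar_modwt_level1` returns five detail and five approximation coefficients whose squared sums add up to the energy of the input, and `haar_mra_level1` returns a detail and a smooth series that sum back to the original data. The smooth part plays the role of the approximation series used as the forecasting target in this study.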
2.3. ARIMA Model
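The autoregressive moving average (ARMA) model is one of the most important and widely used models in time series analysis. It combines an autoregressive (AR) model and a moving average (MA) model. A time series $\{\varepsilon_t\}$ is a white noise (WN) process, denoted $\{\varepsilon_t\} \sim \mathrm{WN}(0, \sigma^2)$, if its terms are uncorrelated with zero mean and variance $\sigma^2$; it is a Gaussian white noise process if and only if, for all t, $\varepsilon_t$ is i.i.d. $N(0, \sigma^2)$. A time series $\{X_t\}$ follows the ARMA(p, q) model given by equation (7) [20, 35, 36]:

$$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}, \tag{7}$$

where p and q are nonnegative integers, p is the order of the autoregressive (AR) part, q is the order of the moving average (MA) part, $\phi_1, \ldots, \phi_p$ and $\theta_1, \ldots, \theta_q$ are the model coefficients, and $\{\varepsilon_t\}$ is a white noise process. The ARIMA(p, d, q) model is an extension of the ordinary ARMA model and is given by [20]

$$\left(1 - \phi_1 B - \cdots - \phi_p B^p\right)(1 - B)^d X_t = \left(1 + \theta_1 B + \cdots + \theta_q B^q\right)\varepsilon_t, \tag{8}$$

where B is the backshift operator ($B X_t = X_{t-1}$), and p, d, and q are the orders of autoregression, integration (differencing), and moving average, respectively. When d = 0, the ARIMA model reduces to the ordinary ARMA model.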
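The link between equation (7) and its ARIMA extension can be illustrated by simulation. The sketch below, in plain Python with hypothetical helper names, generates an ARMA(p, q) path driven by Gaussian white noise, integrates it d times to obtain an ARIMA(p, d, q) path, and differences it back. It illustrates the model definition only and is not the estimation procedure used in the study.

```python
import random


def simulate_arma(phi, theta, n, sigma=1.0, seed=0):
    """ARMA(p, q) path from equation (7), rearranged as
    X_t = phi_1 X_{t-1} + ... + phi_p X_{t-p} + e_t + theta_1 e_{t-1} + ... + theta_q e_{t-q},
    with Gaussian white noise e_t ~ N(0, sigma^2) and zero pre-sample values."""
    rng = random.Random(seed)
    x, e = [], []
    for t in range(n):
        e_t = rng.gauss(0.0, sigma)
        ar = sum(phi[i] * x[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * e[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        x.append(ar + e_t + ma)
        e.append(e_t)
    return x


def integrate(w, d):
    """Invert (1 - B)^d by cumulative summation, starting each pass from zero."""
    y = list(w)
    for _ in range(d):
        total, out = 0.0, []
        for v in y:
            total += v
            out.append(total)
        y = out
    return y


def difference(y, d):
    """Apply (1 - B)^d; each pass drops the first value."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y


def simulate_arima(phi, d, theta, n, sigma=1.0, seed=0):
    """ARIMA(p, d, q) path: an ARMA(p, q) path integrated d times."""
    return integrate(simulate_arma(phi, theta, n, sigma, seed), d)
```

Differencing a simulated ARIMA(p, 1, q) path once recovers the underlying ARMA path (without its first value) up to floating-point rounding, and with d = 0 both functions return the same series, mirroring the statement that ARIMA reduces to ARMA when d = 0.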
2.4. ANFIS Model
ANFIS combines fuzzy logic with an ANN [37], whose learning algorithm is used for training. Its operation consists of a forward step and a backward step, which together make up the ANFIS learning algorithm. The forward step passes through five layers. To simplify the explanation, the fuzzy inference system under consideration is assumed to have two inputs and one output: the input x represents the oil price (Loil), the input y represents the repo rate (Repo), and the output z represents the logarithm of the standard deviation of closing stock prices (LSCS). For a first-order Sugeno fuzzy model, a typical rule base of fuzzy if-then rules can be expressed as follows: if x is $A_i$ and y is $B_i$, then $f_i = p_i x + q_i y + r_i$, where $p_i$, $q_i$, and $r_i$ are the linear output parameters. Figure 1 depicts the ANFIS architecture with two inputs and one output.

Layer 1. Every node i in this layer is a square (adaptive) node with a node function given by the following equations:

$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2, \tag{9a}$$

$$O_{1,i} = \mu_{B_{i-2}}(y), \quad i = 3, 4, \tag{9b}$$

where x and y are the inputs to node i, and $A_i$ and $B_{i-2}$ are the linguistic labels associated with these inputs. In other words, $O_{1,i}$ is the membership grade of x in $A_i$ or of y in $B_{i-2}$. Typically, $\mu_{A_i}(x)$ and $\mu_{B_i}(y)$ are chosen to be bell-shaped with a maximum value of 1 and a minimum value of 0, such as

$$\mu_{A_i}(x) = \frac{1}{1 + \left|\dfrac{x - c_i}{a_i}\right|^{2b_i}},$$

where $\{a_i, b_i, c_i\}$ is the parameter set. These parameters are referred to as premise parameters. In this study, the Gaussian function $\mu_{A_i}(x) = \exp\left(-\dfrac{(x - c_i)^2}{2\sigma_i^2}\right)$ is used as the membership function, and the fuzzification process transforms crisp values into linguistic values.

Layer 2. Each node in this layer is a circle (fixed) node labeled $\Pi$, which multiplies the incoming signals and outputs their product:

$$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2. \tag{10}$$

Each node output represents the firing strength of a rule. In this layer, the inference stage uses a t-norm operator (the AND operator).

Layer 3. Each node in this layer is a circle node labeled N. The i-th node computes the ratio of the i-th rule's firing strength to the sum of the firing strengths of all rules:

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2. \tag{11}$$

In short, this layer normalizes the rule strengths.

Layer 4. Each node in this layer is a square node with the node function

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i\,(p_i x + q_i y + r_i), \tag{12}$$

where $\bar{w}_i$ is the output of layer 3 and $\{p_i, q_i, r_i\}$ is the parameter set. The parameters in this layer are referred to as consequent parameters; in short, this layer computes the consequent part of each rule.

Layer 5. The single node in this layer is a circle node labeled $\Sigma$ that computes the overall output as the sum of all incoming signals:

$$O_{5} = z = \sum_{i} \bar{w}_i f_i = \frac{\sum_{i} w_i f_i}{\sum_{i} w_i}. \tag{13}$$

The backward step is a parameter estimation procedure covering the membership function parameters in the antecedent part and the linear equation coefficients in the consequent part. Since the Gaussian function is used as the membership function, two of its parameters, namely, the mean and the variance, are optimized. The least squares method is used to perform the parameter learning in this step.
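The five layers described above can be traced in a short sketch of a two-input, two-rule, first-order Sugeno ANFIS with Gaussian membership functions. It computes the forward pass and, with the premise parameters held fixed, estimates the consequent parameters $(p_i, q_i, r_i)$ by least squares, which is possible because the output is linear in them once the normalized firing strengths are known. The rule pairing, parameter values, and function names are hypothetical, and a complete ANFIS implementation would also tune the premise parameters.

```python
import math


def gauss_mf(v, c, s):
    """Gaussian membership function with centre c and width s (values in (0, 1])."""
    return math.exp(-((v - c) ** 2) / (2.0 * s ** 2))


def normalized_strengths(x, y, premise):
    """Layers 1-3: fuzzify x and y, combine with the product t-norm, then normalize.

    premise[i] = ((c_A, s_A), (c_B, s_B)) holds the Gaussian parameters of A_i and B_i.
    """
    w = [gauss_mf(x, ca, sa) * gauss_mf(y, cb, sb) for (ca, sa), (cb, sb) in premise]
    total = sum(w)
    return [wi / total for wi in w]


def anfis_output(x, y, premise, consequent):
    """Layers 4-5: weight each rule output f_i = p_i x + q_i y + r_i and sum."""
    wbar = normalized_strengths(x, y, premise)
    return sum(wb * (p * x + q * y + r) for wb, (p, q, r) in zip(wbar, consequent))


def solve(m, b):
    """Solve the square system m a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        sol[i] = (a[i][n] - sum(a[i][k] * sol[k] for k in range(i + 1, n))) / a[i][i]
    return sol


def fit_consequents(xs, ys, zs, premise):
    """Least-squares estimate of every (p_i, q_i, r_i) with the premise parameters fixed."""
    rows = []
    for x, y in zip(xs, ys):
        row = []
        for wb in normalized_strengths(x, y, premise):
            row += [wb * x, wb * y, wb]  # output is linear in p_i, q_i, r_i
        rows.append(row)
    m = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(m)]
    theta = solve(ata, atz)  # normal equations
    return [tuple(theta[k : k + 3]) for k in range(0, m, 3)]
```

Because the normalized strengths sum to one, two rules sharing the consequent (1, 2, 3) give the output 4.2 + 1.4 + 3 = 8.6 at (x, y) = (4.2, 0.7), up to rounding; with distinct consequents, the output blends the two linear rules according to how strongly each fires.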

2.5. Performance Measures
We use several accuracy criteria, namely, the mean absolute percentage error (MAPE), the mean percentage error (MPE), the mean absolute error (MAE), the mean error (ME), and the root mean squared error (RMSE) [38, 39], defined as follows:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right|, \tag{14}$$

$$\mathrm{MPE} = \frac{100\%}{n}\sum_{t=1}^{n}\frac{A_t - F_t}{A_t}, \tag{15}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|A_t - F_t\right|, \tag{16}$$

$$\mathrm{ME} = \frac{1}{n}\sum_{t=1}^{n}\left(A_t - F_t\right), \tag{17}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(A_t - F_t\right)^2}. \tag{18}$$

The MAPE, also referred to as the mean absolute percentage deviation (MAPD), is a statistical measure of the prediction accuracy of a forecasting method. It always expresses accuracy as a percentage and is given by equation (14), where $A_t$ is the actual value, $F_t$ is the forecasted value, and n is the sample size. In this equation, the absolute percentage error is summed over all forecasted time points and divided by the number of fitted points. In addition, the MPE is defined by equation (15), the MAE by equation (16), and the ME by equation (17). The RMSE, also known as the root mean square deviation (RMSD), is a widely used measure of the differences between predicted and observed values. It measures the typical magnitude of the error made by the model in predicting the outcome for an observation and is given by equation (18), where n denotes the number of observations.
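Equations (14)–(18) translate directly into code. The sketch below assumes that errors are defined as actual minus forecast, $A_t - F_t$, and reports MPE and MAPE in percent; the function name is hypothetical.

```python
import math


def forecast_errors(actual, forecast):
    """ME, MAE, MPE (%), MAPE (%), and RMSE, with errors defined as A_t - F_t.

    MPE and MAPE divide by the actual values, so no actual value may be zero.
    """
    n = len(actual)
    e = [a - f for a, f in zip(actual, forecast)]
    return {
        "ME": sum(e) / n,
        "MAE": sum(abs(x) for x in e) / n,
        "MPE": 100.0 / n * sum(x / a for x, a in zip(e, actual)),
        "MAPE": 100.0 / n * sum(abs(x / a) for x, a in zip(e, actual)),
        "RMSE": math.sqrt(sum(x * x for x in e) / n),
    }
```

For actual values (2, 4, 5) and forecasts (1, 5, 5), the positive and negative errors cancel in the ME, which is 0, whereas the MAE is 2/3 and the MAPE is 25%. This is why a near-zero ME, as reported in Table 5, should be read together with the absolute criteria.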
3. Research Design and Methodology
The aim of this study is to propose a new model for forecasting the closing price data of the Tadawul stock market over the period from 2011 to 2019. The proposed model couples the ANFIS model with MODWT-LA8. To this end, we employed five MODWT functions, namely, Haar, Db, LA-8, BL14, and C6, and used statistical criteria to evaluate the accuracy of the models. The original data were transformed into a time-scale domain using MODWT. The different phases of the MODWT forecasting mechanism are depicted in Figure 2. The wavelet transform is applied repeatedly as long as the data pattern remains strongly fluctuating. The objective of this preprocessing is to reduce error criteria such as the RMSE, compared before and after the transformation, by removing noise from the original data. Reducing the noise in the training patterns can also lower the risk of overfitting during training. Accordingly, we applied MODWT twice when preprocessing the training data. MODWT converts the data into two series, namely, a detail series and an approximation series. Since financial data fluctuate significantly, we employed these two series because they behave well on such data, which helps to forecast the transformed data more precisely. This favorable behavior results from the filtering effect of MODWT.

To develop the proposed model, we designed the following methodology. First, we collected the closing price data from Tadawul. Second, the closing price data were transformed using the logarithm of the standard deviation to obtain LSCS. Third, the LSCS data were decomposed using a MODWT function, which divides the data into two parts, namely, the detail coefficients (high-fluctuation data) and the approximation coefficients (low-fluctuation data). We employed five MODWT functions, namely, Haar, Db, LA-8, C6, and BL14. The approximation coefficients of each function contain the main features of the data and are used as the output in the forecasting model. Fourth, the approximation coefficients (LSCS) of each function are used together with the input variables (Repo and Loil) in ANFIS to build the proposed MODWT-ANFIS model. Finally, the best MODWT-ANFIS model was compared with the MODWT-ANFIS models based on the other functions and with traditional models (ARIMA and ANFIS).
To ensure a fair comparison, we first applied the proposed models to the first 80% of both the original and the transformed data and selected the best-performing model; this model was then evaluated, together with the other models, on the remaining 20% of the data. This out-of-sample comparison is used to confirm the performance of the proposed model.
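A chronological split keeps the evaluation out-of-sample. As a sketch, assuming the split is taken in time order without shuffling (the helper name is hypothetical), taking the first 80% of 2026 observations yields 1620 training and 406 test samples, which agrees with the 1620 samples mentioned in Section 4.

```python
def chronological_split(series, train_frac=0.8):
    """Split a time series into a leading training part and a trailing test part.

    The order of observations is preserved (no shuffling), so every test
    observation lies after every training observation in time.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be between 0 and 1")
    n_train = int(train_frac * len(series))  # floor: int(0.8 * 2026) == 1620
    return series[:n_train], series[n_train:]
```

Shuffling before splitting would let the model see observations from the test period during training, which would overstate forecasting accuracy for time series.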
3.1. Endogeneity Issues
This section describes how suitable input variables were selected using multicollinearity screening, the causality test, and multiple regression analysis.
3.1.1. Correlation
In this section, we selected the independent variables from a larger set of candidate variables, removing variables on the basis of statistical tests. First, we removed variables because of multicollinearity among the independent variables, as shown in Table 2. Perfect multicollinearity is an exact (nonstochastic) linear relationship between two or more independent variables, and its absence is referred to as no multicollinearity. We therefore removed input variables that were strongly correlated with other input variables. The correlation between the input and output variables is also shown in Table 2.
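A minimal sketch of the screening described above: inputs are kept greedily, and a candidate is dropped if its absolute Pearson correlation with an already-kept input reaches a threshold. The study does not report its threshold, so the value 0.8, the helper names, and the candidate name `LoilScaled` used in the check are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def screen_inputs(candidates, threshold=0.8):
    """Greedy multicollinearity screen over a dict {name: series} (insertion order).

    A candidate is kept only if |r| < threshold with every input kept so far.
    The 0.8 default is a hypothetical cut-off, not the study's value.
    """
    kept = {}
    for name, series in candidates.items():
        if all(abs(pearson(series, other)) < threshold for other in kept.values()):
            kept[name] = series
    return list(kept)
```

A candidate that is an exact linear function of an input already kept, such as a rescaled copy of Loil, has |r| = 1 and is always removed; the result depends on the order in which candidates are listed.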
3.1.2. Engle and Granger Causality Test
The Engle and Granger test uses cointegration to represent causal relationships. It first estimates a static regression and obtains its residuals (errors). An augmented Dickey–Fuller test, or a similar unit root test, is then applied to the residuals to detect a unit root; if the time series are cointegrated, the residuals are stationary [30, 31, 39, 40]. The corresponding error correction model can be written as $\Delta Y_t = \alpha + \beta\,\Delta X_t + \gamma\,\mathrm{ECT}_{t-1} + \varepsilon_t$, where $Y_t$ is the output variable, $X_t$ is the set of input variables, $\mathrm{ECT}_{t-1}$ is the lagged error correction term, and $\alpha$, $\beta$, and $\gamma$ are parameters. If $\gamma$ is negative and its t-statistic exceeds 1.96 in absolute value, the null hypothesis of the Engle and Granger test ($H_0$: there is no cointegration) is rejected. Equivalently, the decision rule is to reject the null hypothesis if the p-value is less than or equal to the significance level. Table 3 reports the Engle and Granger test for the output and input variables and shows that all p-values are less than 0.05. Accordingly, the null hypothesis is rejected for the input variables, and we conclude that there is sufficient evidence of cointegration between the input variables and the output variable at the 5% significance level. This result suggests that the output variable is influenced by the input variables.
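The residual-based stage of the Engle and Granger procedure can be sketched as follows: regress the output on an input by OLS, then run a Dickey–Fuller regression of the differenced residuals on their lagged values. The sketch uses a single input, no augmentation lags, and simulated data, and it returns only the t-statistic, which must be compared with Engle–Granger (MacKinnon) critical values rather than standard normal ones; it is not the error-correction t-test on ECT described above. The helper names are hypothetical.

```python
import math
import random


def simulate_cointegrated(n, beta, seed=0):
    """Hypothetical data: x is a Gaussian random walk, y = 1 + beta * x + iid noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(x[-1] + rng.gauss(0.0, 1.0))
    y = [1.0 + beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


def ols_line(x, y):
    """Intercept and slope of the OLS fit y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def engle_granger_stat(x, y):
    """Step 1: static OLS of y on x. Step 2: Dickey-Fuller regression
    e_t - e_{t-1} = rho * e_{t-1} + u_t on the residuals (no constant, no lags).
    Returns (slope from step 1, t-statistic of rho)."""
    a, b = ols_line(x, y)
    e = [yi - a - b * xi for xi, yi in zip(x, y)]
    lag = e[:-1]
    diff = [e[t] - e[t - 1] for t in range(1, len(e))]
    sll = sum(v * v for v in lag)
    rho = sum(lv * dv for lv, dv in zip(lag, diff)) / sll
    s2 = sum((dv - rho * lv) ** 2 for lv, dv in zip(lag, diff)) / (len(diff) - 1)
    return b, rho / math.sqrt(s2 / sll)
```

A strongly negative t-statistic indicates stationary residuals, that is, evidence of cointegration between the two series.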
3.1.3. The Results of Multiple Regressions
Multiple regression is an extension of simple linear regression that is used to predict the value of a variable from the values of two or more other variables. In this study, the variable to be predicted is the dependent variable (LSCS), whereas Repo and Loil are the independent variables used to predict it. The multiple regression analysis is shown in Table 4. Both the repo rate and Loil are significant at the 5% significance level. In addition, R-squared and adjusted R-squared are approximately 46%, which implies that the independent variables explain about 46% of the variation in the output variable. The F-statistic is significant at the 1% level, indicating that the linear regression model as a whole is statistically significant.
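The quantities reported from Table 4 (coefficients, R-squared, adjusted R-squared, and the F-statistic) follow from ordinary least squares. The sketch below fits OLS with an intercept by solving the normal equations; it omits standard errors and p-values, and the function name and check data are hypothetical.

```python
def ols_fit(X, y):
    """OLS with an intercept. X is a list of rows [x1, ..., xk].

    Returns (coefficients [b0, b1, ..., bk], R^2, adjusted R^2, F-statistic).
    """
    rows = [[1.0] + list(r) for r in X]
    n, m = len(rows), len(rows[0])
    k = m - 1  # number of regressors, excluding the intercept
    # Augmented normal equations [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(m):
        p = max(range(c, m), key=lambda i: abs(a[i][c]))
        a[c], a[p] = a[p], a[c]
        for i in range(m):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [u - f * v for u, v in zip(a[i], a[c])]
    beta = [a[i][m] / a[i][i] for i in range(m)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return beta, r2, adj_r2, f_stat
```

With k = 2 regressors (Repo and Loil) and about 2000 observations, the adjustment factor (n - 1)/(n - k - 1) is very close to 1, which is consistent with R-squared and adjusted R-squared both being reported as approximately 46%.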
There is a negative relationship between oil prices and volatility risk (LSCS), which measures the standard deviation of closing stock prices. This indicates that an increase in oil prices reduces the volatility risk in the stock market. In contrast, the repo rate has a positive relationship with volatility risk, implying that an increase in Repo increases the volatility risk.
4. Results and Discussion
The current study investigates closing price data from Tadawul, which was selected for several reasons. Emerging markets are characterized by pronounced financial market volatility. Owing to a lack of information, stochastic trading, and unprofessional financial analysis, the Saudi market has experienced considerable volatility. Furthermore, investors from outside the Gulf Cooperation Council (GCC) are not permitted to invest in Saudi stocks. Tadawul is the largest exchange market in the Middle East. As seen in Figure 3, the volatility data are decomposed using MODWT with the LA-8 function.

First, the closing price data were transformed using the logarithm of the standard deviation to obtain LSCS. Second, the LSCS data were decomposed using MODWT in the R statistical software. The MODWT mechanism divides the LSCS data into two parts, namely, the detail coefficients (high-fluctuation data) and the approximation coefficients (low-fluctuation data). The approximation coefficients of each function contain the main features of the data and are used as the output in the forecasting model. We employed five MODWT functions, namely, Haar, Daubechies (Db), least asymmetric (LA-8), Coiflet (C6), and best-localized (BL14). Among them, LA-8 gave the best results (see Table 5), and the corresponding MODWT decomposition is shown in Figure 3.
MODWT-based decomposition is an effective approach for revealing the variations, magnitudes, and phases of the data. Following the WT mechanism, the number of decomposition levels is determined from the length of the original signal. In Figure 3, the component after the original signal plots the approximation coefficients of the transformed data at the first level (TV1), and TW1 plots the first-level detail coefficients, which explain the fluctuations in the data. Note that the x-axis of Figure 3 covers the 1620 samples that make up 80% of the 2026 total samples.
Tadawul experienced numerous fluctuations from 2011 to 2019. The general index of the stock exchange market dropped to 6417.7 points in 2011 and rose to 8535 points in 2013. Market management changed the trading system from SAXESS to X-Stream INET and developed an interactive multiuser system (IFSAH) to improve the market's efficiency and effectiveness [41]. Stock price fluctuation is one of the issues confronting economies around the world. Tadawul is influenced by both domestic and international economies, so financial crises in other countries are transmitted to the domestic market; for example, Tadawul was hit hard by the global financial crisis in 2008 [42].
4.1. The Result of MODWT Function
Table 5 provides the results of the suggested models on the first 80% of the dataset, in which the original LSCS data are decomposed by the MODWT models. Based on this comparison, MODWT (LA-8) is identified as the best model, since it has the lowest ME, MAE, and MAPE values (0.0000053, 0.0032142, and 0.0644968, respectively). The MODWT (LA-8) model used LSCS as the output variable and Repo and Loil as the input variables to construct the ANFIS model.
4.2. The Result of Forecasting WT Models with ANFIS
To validate these findings, forecasting was conducted on the remaining 20% of the transformed and original data using the same models. The best model is MODWT (LA-8) with ANFIS, since it has the lowest ME, RMSE, MAE, and MPE-fit, as shown in Table 6. As in the training phase, LSCS is used as the output variable, whereas Repo and Loil are used as the input variables to construct the ANFIS and MODWT-ANFIS models.
5. Conclusion
In this study, we proposed a new model (MODWT-LA8-ANFIS) to forecast the closing price in the stock market. We selected the oil price and the repo rate as input variables on the basis of correlation, the Engle and Granger causality test, and multiple regression. We found a weak correlation between the input variables (r = 0.327), whereas the correlation between the oil price and the output variable (closing price) is strong (r = −0.673). Moreover, the Engle and Granger test indicates that the input variables have a causal relationship with the output variable, and the multiple regression test shows that the input variables are significant at the 5% level. The output variable was collected from Tadawul from October 2011 to December 2019, with 2026 observations, and the input variables were collected from the Saudi Authority for Statistics and the Saudi Central Bank. The MODWT mechanism splits a variable into detail coefficients and approximation coefficients. Five MODWT functions were used, namely, Haar, Daubechies (Db), least asymmetric (LA-8), best-localized (BL14), and Coiflet (C6). Accordingly, the output variable is split into detail coefficients (high-fluctuation data) and approximation coefficients (which contain the main features of the data). The approximation coefficients (MODWT-LA8) are used with the input variables to build the MODWT-LA8-ANFIS model, which was evaluated using the mean error (ME), root mean squared error (RMSE), and mean absolute percentage error (MAPE). Compared with the traditional models (ARIMA and ANFIS), the MODWT-LA8-ANFIS model is more accurate. Therefore, the proposed forecasting model may be generalizable to other international stock markets.
Furthermore, the model may help to optimize business processes that support a country's economic development.
Data Availability
The data used in this study are publicly available online from the sources cited in the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.