Abstract
Reservoir inflow prediction is a vital subject in the field of hydrology because it determines the flood event. The negative impact of the floods could be minimized greatly if the flood frequency is predicted accurately in advance. In the present study, a novel hybrid model, bootstrap quadratic response surface is developed to test daily streamflow prediction. The developed bootstrap quadratic response surface model is compared with multiple linear regression model, first-order response surface model, quadratic response surface model, wavelet first-order response surface model, wavelet quadratic response surface model, and bootstrap first-order response surface model. Time series data of monsoon season (1 July to 30 September) for the year 2010 of the Chenab river basin are analyzed. The studied models are tested by using performance indices: Nash–Sutcliffe coefficient of efficiency, mean absolute error, persistence index, and root mean square error. Results reveal that the proposed model, i.e., bootstrap quadratic response surface shows good performance and produces optimum results for daily reservoir inflow prediction than other models used in the study.
1. Introduction
The frequency of massive rainfall upsurges in recent time causing frequent flood damage due to latest regional effect and climate change [1]. The accelerated hydrological cycle enhances the intensity of precipitation events which boost variation in streamflow, and such events are responsible for frequent floods and droughts [2]. Researchers have applied many models to predict stream inflow including autoregressive (AR), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), regression-based models, neuro-fuzzy models, conceptual models, and more complex models based on artificial neural network (ANN) [3–7]. Soft computing models such as ANN, adaptive neuro-fuzzy inference system (ANFIS), and random forest are proposed by Seo et al. [8] for river stage modeling. The proposed models have been coupled with the variation mode decomposition (VMD) technique. They found that the VMD technique had improved performance as compared with the traditional simple models. Therefore, they pointed out that the VMD-based models are more consistent for river stream flow modeling. Kim and Lee [9] examined Bayesian multiple regression analysis for the analysis of regional low flow frequency. Kim and Han [10] introduced the nonlinear autoregressive model with exogenous inputs and self-organizing map (NARX-SOM) for flood prediction.
Wavelet analysis (WA) is a valuable method for the examination of variations, drifts, and periodicities in data. WA also has great attention in the field of hydrology [11–14]. Two-hybrid models such as wavelet-based artificial neural network (WANN) and wavelet-based adaptive neuro-fuzzy inference system (WANFIS) were introduced by Seo et al. [15] for flood forecasting. They concluded that WANN and WANFIS models perform almost similarly and produced reliable prediction results than traditional models. Shafaei and Kisi [16] explored the performance of the WANN model for the prediction of daily river streamflow. The WANN model depicts excellent performance than simple ANN and support vector machine (SVM) models for daily streamflow prediction. Wei [17] introduced a new algorithm called wavelet support vector machine (wavelet SVM) for the prediction of hourly water level. Using different combinations of wavelet technique to the SVM model, a new model developed is wavelet SVM. Gaussian SVM and wavelet SVM models are applied for the prediction purpose. Results divulged that the wavelet SVM model produces more accurate prediction results and provides practical solutions to the water level prediction. The wavelet bootstrap artificial neural network (WBANN) model is introduced for reliable hourly flood forecasting. WBANN and bootstrap artificial neural network (BANN) models depict much better results in comparison with traditional ANN and WANN models. For the peak water level, the WBANN model has excellent results. Overall performance of the WBANN model is good in comparison with that of ANN, BANN, and WANN models [18]. Bashir et al. [19] developed a bootstrap multiple linear regression (BMLR) model for reservoir inflow prediction. The developed model is compared with other models like wavelet multiple linear regression (WMLR), wavelet bootstrap multiple linear regression (WBMLR), and multiple linear regression (MLR) models. They concluded that the BMLR model produced better prediction results than WMLR, WBMLR, and MLR models.
Response Surface-Based Models. Quadratic response surface (QRS) and nonlinear response surface (NRS) are proposed by Yu et al. [20]. They showed that response surface-based models perform well as compared with MLR and ANN models. Keshtegar et al. [21] claimed that a response surface-based model with the fifth-order polynomial function is applied for streamflow prediction in the Aswan high dam, and this model yields satisfactory results. The previous literature available on river flood prediction depicts a lack of research on hybrid response surface (RS) models by using the bootstrap and wavelet technique. This study, therefore, examines the hybrid models for daily reservoir inflow prediction by the conjunction of wavelet and bootstrap techniques to response surface-based models. The prime objective of this research study is to develop a new hybrid model bootstrap quadratic response surface (BQRS) for daily reservoir inflow prediction and comparing the results of the developed model with the remaining studied models: MLR, first-order response surface (FORS), quadratic response surface (QRS), wavelet quadratic response surface (WQRS), wavelet first-order response surface (WFORS), and bootstrap first-order response surface (BFORS).
2. Materials and Methods
2.1. Response Surface Models
The relationship between the response variable and predictors is given as follows:where represents the response variable, is an unknown response function, error term is denoted by , and are the independent variables. The FORS model used to define the linear relationship between independent and dependent variables is given as follows:
The FORS model involves a single cross-product term. and are input variables, and represents response variable. represent the unknown parameters.
The QRS model mathematically is defined as
This model includes the additional two-second order terms as compared with the first-order model.
2.2. Wavelet Analysis
WA has gained popularity and practiced in the hydrologic field in recent years since the introduction of this technique in the early 1980s. WA decomposes the original time series data to remove noise from data and provide a reliable prediction of hydrological data by utilizing important information from different resolution levels. WA is accomplished by providing a joint representation of time series data in the time-frequency domain. To achieve the target, time series data are divided into wavelets, and these wavelets are scaled and translation versions of mother wavelets [22].
Continuous wavelet transform (CWT) and discrete wavelet transform (DWT) are two types of WA. CWT decomposes time series data on all possible scales, and massive amounts of data are generated on each possible scale which necessitates heavy computational work. Moreover, a large amount of data creates redundant information, where the DWT is based on dyadic calculation of data. So, DWT is simple to handle and requires less computational work than CWT [23]. Based on previous applications of WT in the hydrologic field, DWT proves superior results as compared with CWT [24].
For discrete-time series data denoted by , DWT is defined as
In equation (4), is a wavelet coefficient. The scale parameter is denoted by while the location parameter is indicated by and n represents the positive integer. signifies a finite time series data. N represents the total length of data, and mathematically it is equal to . The range of integer n is , and the range of integer m is . Therefore, at a specific interval of time, one wavelet is required to yield one coefficient on an enormous scale . The next scale will be , two wavelets operate in a complete interval of time, and two coefficients are produced. The whole procedure continues until m meets unity (m = 1).
At step m = 1, the scale parameter would be 21. This means 2m−1 or N/2 coefficients are required to explain the time series data at this scale. Thus, for discrete signal, the total number of wavelet coefficients generated is , and if we elaborate, it would be 1 + 2+4 + 8 + ………+2M−1 = N − 1 [25].
2.3. Bootstrap Method
Bootstrap is one of the resampling techniques that are part of nonparametric statistics. Bootstrap is a data-driven technique, and it is used to compute several realizations of data set from a distribution. A set of bootstrap samples are obtained by employing intensive resampling with replacement. The bootstrap technique provides a better understanding of variability and average of original time series data of unknown distribution and minimizing uncertainty [26].
Consider the empirical distribution of the observed sample. Empirical distribution constitutes a probability distribution that allocates 1/n probability to each sample value. In the bootstrap technique, we replace unknown population distribution with the known empirical distribution. The steps involved in the bootstrap procedure are as follows:(i)Prepare a bootstrap sample by generating a sample with replacement from the empirical distribution(ii)Calculate the value of the estimator which is drawn bootstrap sample(iii)Repeat the first two steps by K number of times
2.4. Multiple Linear Regression Model
The MLR model is a technique to be used for modeling the relationship between the response variable and the independent variables. The mathematical form of the MLR model in matrix form is given as follows:
2.5. Performance Indices
2.5.1. Root Mean Square Error
Root mean square error (RMSE) is represented as follows:
2.5.2. Mean Absolute Error
Mean absolute error (MAE) is the average of the difference between observed and predicted values. The range of MAE is from 0 to ∞.
The mathematical expression of MAE is as follows:
2.5.3. Nash–Sutcliffe Coefficient of Efficiency
Nash–Sutcliffe coefficient of efficiency (NSE) is used to evaluate the predictive power of the hydrological model. It is defined as
The range of NSE is to 1. Closer the value to 1 represents that the fit of the model is excellent.
2.5.4. Coefficient of Persistence
Coefficient of persistence (CP) is expressed as
3. Data and Statistical Analysis
The Chenab river basin is the second largest river in Pakistan. The total catchment area of the Chenab river basin is 67515 km2. The foundation of the Chenab river basin is at 30° to 77° east and 32° to 50° north, and it is set up in the upper Himalayas of district Lahaul and Spiti in Himachal Pradesh, India. The confluence of two streams named “Bhaga” and “Chandra” forms the Chenab river basin. River Chenab enters into the plains of Punjab, Pakistan. River Jhelum meets with the Chenab river basin at the Trimmu gauging station. Rivers Ravi and Satluj join river Chenab before the Punjnad gauging station. Marala, Khanki, and Qadrabad gauging stations of the Chenab river play a vital role in building the canal link system in Pakistan. In the territory of the Chenab river basin, flood is a severe issue during the monsoon spell. Heavy rainfall in the monsoon season especially in the upper river catchment basin sets the foundation for flood in river Chenab. Snowmelt supplies on average 40% of total water flow in July. The geographical location of the three gauging stations under study is represented in Figure 1.

Data used for this study constitute of Chenab river basin daily discharge data of three gauging stations during the period 2005–2010 (552 data points) monsoon season from 1 July to 30 September. Original data are separated into two data sets. To train developed models, daily discharge data of three gauging stations for the 2005–2009 monsoon season are applied while daily discharge data for the year 2010 monsoon season (1 July to 30 September) are taken for testing purposes. The performance of the developed models is evaluated through the testing data set.
3.1. Model Development
In analyzing the MLR, FORS, QRS, WFORS, WQRS, BFORS, and BQRS models for streamflow prediction, the key step is the selection of appropriate input variables. In this study, correlation analysis is used to select the input variables. The three gauging stations Marala, Khanki, and Qadrabad have a strong correlation among each other but all these variables have a weak correlation with Trimmu and Punjnad gauging stations as represented in Table 1. Therefore, Marala and Khanki gauging stations are chosen as regressors, and the Qadrabad gauging station is selected as the response variable. Qadrabad gauging station is our forecast site. When two or more variables in the regression model are highly correlated, then multicollinearity exists among predictors. Table 2 shows that both predictors Marala and Khanki have partial regression coefficients with opposite signs which depict that both predictors have the opposite relationship with each other but this association contradicts with original fact. Both predictors have a tolerance value of 0.215 that indicates the presence of multicollinearity among predictors. Variance inflation factors, VIFx1 and VIFx2, respectively, for both predictors are approximately 5. This is another evidence for the presence of the multicollinearity factor. The principal component analysis (PCA) was utilized to solve the problem of multicollinearity among predictors by taking principal component scores which are orthogonal to each other.
Using the Mallat DWT algorithm, data of every gauging station are decomposed into subtime series components called detail and approximation [27]. Each component contributes a different role in original time series data, and these subtime series illustrate different behaviors [28]. Many basic wavelet families can be utilized to transform the original signal. The selection of mother wavelet depends upon data to be analyzed. Within each family, wavelets are classified based on the number of vanishing moments. In hydrological modeling, widely used wavelet families are Coiflets, Haar, symlets, and Daubechies.
The iterative process decomposes the original time series signal into many resolution levels. This decomposition process is called a wavelet decomposition tree. The analysis process continues indefinitely due to iterative behavior. In practice, a suitable number of levels are selected for the decomposition of time series data by using the formula as follows [29]:
The level of decomposition is represented by L, and N is the length of the signal. In this study, the number of decomposition levels to be selected is three. Figure 2 represents the decomposition of time series data (Marala, Khanki, and Qadrabad gauging stations) on the third resolution level.

To test one day ahead prediction, thirteen wavelet FORS models are built up by using Haar, Daubechies, Symlets, and Coiflets wavelet functions with different vanishing moments and presented in Table 3. Sym15 wavelet function has improved results on all performance indices, and this wavelet function is selected to give inputs to FORS and QRS models. Figure 3 indicates the development of the models.

3.2. WFORS and WQRS Models
The hybrid models WFORS and WQRS are developed by the conjunction of the DWT technique with the traditional response surface models FORS and QRS. The following steps were followed for the development of WFORS and WQRS.
Original time series data are decomposed into wavelet components by using the DWT technique by choosing an optimum level of decomposition.
In the second step, effective wavelet components provide input data to the traditional FORS and QRS model.
3.3. BFORS and BQRS Models
BFORS and BQRS are hybrid models, and these models are developed by couple of bootstrap technique with FORS and QRS models.
4. Results and Discussion
Statistical values of MLR, FORS, QRS, WFORS, WQRS, BFORS, and BQRS models on different performance indices such as NSE, RMSE, MAE, and CP were investigated and are displayed in Table 4 for the Chenab river basin.
The main purpose of this article is to observe the applicability of two techniques: wavelet and bootstrap. These techniques combined with response surface-based models to predict river stream inflow. First, the original observed time series data are used as input to MLR, FORS, and QRS models. Second, the data are decomposed by using the DWT technique and by using appropriate decomposed subseries as input to FORS and QRS models to formulate WFORS and WQRS models, respectively. Third, bootstrap technique is applied, and results obtained are used as input to FORS and QRS models to produce two new models BFORS and BQRS. As can be shown from results that the DWT technique illustrates the significantly improved results of wavelet-based models like WFORS and WQRS, then MLR, FORS, and QRS models use raw data set, but when compared these models (WFORS and WQRS) with models involving bootstrap technique (BFORS, BQRS), the performance of wavelet-based models deteriorates. The proposed model BQRS produced excellent results in terms of RMSE = 0.0079 m3/s, NSE = 0.9927, CP = 0.9961, and MAE = 0.0021 m3/s for one day ahead prediction. For the second and third day ahead prediction, the BQRS model has superior performance than the remaining model discussed in the study. In comparison with all proposed models for the testing period, the BQRS model proves good predictive and reliable results for all three days ahead of prediction.
Figure 4 represents the 1 d to 3 d ahead prediction performance of QRS-based models for the Chenab river basin. All figures depict observed discharge versus predicted discharge by using a scatter diagram for the testing period data set. The BQRS model estimates are closer to the observed water discharge when compared with other models used in the study. It was observed that the BQRS model minimizes the error in the prediction of peak streamflow, and its values are closer to the predicted line. Similarly, Figures 5 and 6 show FORS-based models and MLR model, respectively, for 1 d to 3 d ahead prediction. It has been observed that overall the BQRS model has improved results.

(a)

(b)

(c)

(a)

(b)

(c)

(a)

(b)

(c)
5. Conclusions
This research article explores the predictability of two different types of hybrid models: wavelet-based response surface models and bootstrap-based response surface models. The observed time series data decomposed into subseries by applying DWT and effective components are selected to provide inputs to models. The wavelet technique decomposes signals into the multilevel components by adopting multiresolution analysis. The ability to predict time series data of wavelet conjunction response surface-based models is found sensitive towards the selection of wavelet function. Wavelet function sym15 is found better in comparison with other wavelet functions. Overall performance specifies that wavelet-based models produced improved results than the models that used original time series data: MLR, FORS, and QRS. The bootstrap technique is used to minimize uncertainty in time series data. It is observed that the default number of bootstrap resamples is not appropriate. Bootstrap-based prediction models should be enhanced carefully. A small number of bootstrap resamples are suitable for daily reservoir inflow prediction in the Chenab river basin. The bootstrap technique applies to the original time series data to formulate bootstrap-based models: BFORS and BQRS. This study illustrates that the BQRS model is a suitable method for modeling reservoir inflow prediction.
Data Availability
To replicate the findings of this work, the data are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.