Abstract
This study uses a recent nonparametric disaggregation model based on the k-nearest neighbor (k-NN) approach to resample monthly flows conditioned on annual flows at different sites. Both temporal and spatial approaches are followed while preserving the distributional statistics of the observed data. The model assumes that a series of aggregated annual streamflows is available at a key station and is to be disaggregated into a corresponding series of monthly streamflows at the key station (temporal disaggregation) as well as at tributary stations (spatial disaggregation). The model is applied to the annual streamflow data of selected stations in the Kızılırmak Basin, which experienced drought periods in 1970–1974 and 1994–1995. The aim of the study is to assess nonparametric approaches as generators of monthly flows, with emphasis on their ability to reproduce the statistics related to drought and storage analysis for the selected stations in Turkey. The results show that the spatial disaggregation approach reproduces the historical data better than the temporal approach at the tested sites and provides a variety of generated monthly flow sequences that can then be used to analyze the performance of a water resources planning system.
1. Introduction
Streamflow simulation is an important component in the analysis of water resources management, floods, droughts, and reservoir operation. Water resources planning and management depend mostly on observed streamflow data. The lack of streamflow data is an obstacle to suitable water resources management, especially in developing countries. Additionally, the lack of data at fine temporal resolution (e.g., daily) for detailed hydrological modeling represents another problem. Streamflow simulation was earlier performed using linear autoregressive moving average (ARMA) models for annual data and periodic autoregressive (PAR) models for seasonal data. These models rely on assumptions such as a specific probability distribution for the historical flows. Pumo et al. [1] developed a model to estimate monthly runoff from several variables, such as precipitation and temperature, while exploiting the autocorrelation with runoff in the previous month. The model reproduced the observed hydrological time series at both monthly and coarser time resolutions. Among the available stochastic generation models, the disaggregation method has been proposed to overcome the problem of data limitation. With this method, streamflow can be disaggregated to finer scales, either temporally or spatially. Whereas temporal disaggregation uses only the streamflow data set observed at the key station, spatial disaggregation also brings the data of peripheral stations into the simulation procedure.
There are two main disaggregation methods: parametric and nonparametric. Unlike nonparametric approaches, parametric approaches require assumptions of linearity and of a statistical distribution. The first parametric approach was proposed by [2] to disaggregate annual streamflows into seasonal streamflows using a linear autoregressive (AR) model for a single site. Although their approach is accepted as the pioneering disaggregation model, it has two disadvantages: (1) the moments being preserved are not consistent; (2) the number of parameters is large [3]. The basic approach was improved by adding a column matrix containing the seasonal values of the previous year and one more parameter matrix [4]. That model, however, required an excessive number of parameters to be estimated, so a condensed model was introduced for temporal disaggregation to reduce the number of required parameters drastically [5]. The autoregressive moving average (ARMA) and typical PAR(1) models were also used to modify and improve the basic model [6]. A periodic disaggregation method was proposed by [7] for disaggregating seasonal flows, generated from a periodic autoregressive (PAR) scheme of any order, into subseasonal flows. The model was also able to preserve the first and second moments of the 10-day (decade) flows generated from monthly flows. Several of the aforementioned approaches concentrate on the temporal disaggregation of annual to seasonal or seasonal to subseasonal flows and are fitted in a transformed space. However, the sum of the disaggregated flows may not preserve the historical statistics when the results are back-transformed to the original flow domain. A stepwise disaggregation approach was therefore suggested by [8] to solve the inconsistency problem of the additive disaggregation approaches. Some studies demonstrated that nonparametric disaggregation models can overcome the drawbacks of the parametric ones [9–11].
A nonparametric approach based on kernel density estimation was used by [9]. This model was then improved by [10] using the k-nearest neighbor (k-NN) method to resample monthly flows conditioned on an annual flow value in temporal disaggregation, or flows at multiple upstream sites conditioned on a downstream site in spatial disaggregation. This model was able to capture the statistical distributional properties of monthly flows at all sites. Meanwhile, the k-NN approach presented by [11] was effective in resampling daily flows at multiple sites from a single annual flow value. The model is flexible enough to disaggregate annual flows to any time scale and preserves the historical statistics very well. The nearest neighbor approach was also used in a nonparametric framework by [12] to disaggregate seasonal flow to daily flow in two steps: seasonal-to-monthly disaggregation followed by monthly-to-daily disaggregation. This overcomes the weakness of the stepwise procedure in the parametric approach, which is unable to preserve the historical correlation between the flows of the first day of a month and the last day of the previous month. The software package (stepwise) of model [8] was developed and extended by [13] for the multivariate stochastic emulation and disaggregation of monthly hydrological time series to daily series, preserving the characteristics of the annual, monthly, and daily historical data. With some nonparametric methods, values not seen in the historical data cannot be generated. To treat this problem, a modified k-NN bootstrap was suggested [14]. This approach was found to capture the features of the historical data better than both a periodic autoregressive parametric approach and a nonparametric index sequential method.
Additionally, another method was suggested to disaggregate generated annual streamflow data into monthly streamflow series using three approaches that differ in the criteria used to set the classes of fragments and to select the fragments [15]. To disaggregate monthly streamflows into daily flows, a method based on an autoregressive model was presented by [16], who obtained good results in reproducing the duration curves and hydrographs of the streamflow.
Given the past research described above, there is a clear need for a robust, simple, and parsimonious approach to space-time streamflow disaggregation that can capture arbitrary features exhibited by the data and can be applied easily to regulated and unregulated waterways. To this end, we adopted a nonparametric disaggregation model based on the k-nearest neighbor approach to resample monthly flows in both temporal and spatial disaggregation. The method is parsimonious, as only the parameter k (the number of nearest neighbors used in resampling) needs to be estimated. The model is used to generate monthly streamflow series from historical annual data at three stations in the Kızılırmak Basin, which, in turn, may be used to manage the water resources system in this region.
2. Methodology
The k-NN resampling approach was first used by [17] to develop a nonparametric disaggregation method. They managed to alleviate the drawbacks of the classical parametric methods and of the nonparametric methods based on kernel density estimation. However, because it is a resampling technique, the k-NN approach generates only values that appear in the historical record. Nowak et al. [11] indicated that the aforesaid models do not perform well for disaggregation to daily time scales. Their method is based on the conditional probability distribution function $f(X \mid Z)$, where $X$ is the matrix of flows that sum to $Z$, the vector of annual aggregate flows to be disaggregated, and $p$ is the vector of daily proportions, whose elements sum to unity. Their model can be used to disaggregate annual data to different time scales. The steps for the temporal disaggregation of annual historical data to monthly data at a single site are summarized as follows. The detailed procedure can be found in Nowak et al. [11].
(1) The historical monthly streamflow data of each year are converted to proportions of that year's total annual flow; for example, $p_{\text{Jan}}$ = (streamflow observed in January of a given year)/(total annual flow of that year). Repeating this for all months and all years gives the proportion matrix $P$ with dimensions $N \times 13$ (the first column holds the historical year and the other 12 columns hold the monthly proportions), where $N$ is the sample size. The entries of each row, excluding the first column, should sum to unity. The historical annual flows are likewise written in matrix form, $Z$, with one row per historical year.
(2) Let $z$ be the annual flow of the year to be disaggregated. The $k$ nearest neighbors of $z$ are selected from the historical annual flows ($Z$). The number of neighbors is calculated from the heuristic scheme $k = \sqrt{N}$, which is a well-known procedure for this purpose [18]. A weight is assigned to each of the $k$ nearest neighbors using the weight function
$$W(i) = \frac{1/i}{\sum_{j=1}^{k} 1/j}, \quad i = 1, \dots, k, \tag{1}$$
where $i$ is the rank of the neighbor ($i = 1$ for the closest). The weight function gives larger weights to the nearest neighbors and smaller weights to the farthest ones.
(3) One of the $k$ nearest neighbors (one of the historical years) is chosen at random according to the weights from Equation (1). The proportion vector of the chosen year ($p$) is multiplied by the annual streamflow ($z$) to obtain the monthly streamflow vector ($x$):
$$x = z\,p. \tag{2}$$
(4) Steps (2) and (3) are repeated to generate all ensembles of monthly streamflows.
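The four steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the rounding of the square root to an integer, and the use of absolute difference as the distance measure are assumptions made here for the example.

```python
import math
import random


def knn_weights(k):
    """Weights W(i) = (1/i) / sum_{j=1..k}(1/j) for neighbors ranked 1..k."""
    total = sum(1.0 / j for j in range(1, k + 1))
    return [(1.0 / i) / total for i in range(1, k + 1)]


def disaggregate_annual(z, annual_hist, monthly_hist, rng=random):
    """Split one annual flow z into 12 monthly flows by k-NN resampling.

    annual_hist:  list of N historical annual flows
    monthly_hist: list of N lists, each holding the 12 monthly flows whose
                  sum is the matching entry of annual_hist
    rng:          the random module or a random.Random instance
    """
    n = len(annual_hist)
    k = max(1, round(math.sqrt(n)))  # heuristic k = sqrt(N); rounding is assumed
    # Step 1: monthly proportions of each historical year (each row sums to 1)
    proportions = [[m / a for m in months]
                   for a, months in zip(annual_hist, monthly_hist)]
    # Step 2: indices of the k historical years whose annual flow is closest to z
    ranked = sorted(range(n), key=lambda i: abs(annual_hist[i] - z))[:k]
    # Step 3: pick one neighbor with probability W(i), then rescale by z
    chosen = rng.choices(ranked, weights=knn_weights(k), k=1)[0]
    return [z * p for p in proportions[chosen]]
```

Calling `disaggregate_annual` once per simulated year and repeating the whole series 100 times would correspond to step (4). Because each year is resampled independently, the generated monthly values always sum to the annual flow that was supplied.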
For spatial disaggregation, the procedure takes on another dimension. The proportion matrix is rearranged to cover all sites, where $s$ denotes the number of sites to be disaggregated. The proportion vector of a chosen year therefore becomes a matrix holding 12 monthly proportions for each of the $s$ sites.
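A possible sketch of the spatial case is given below. It assumes, as an illustration only, that the proportions at every site are taken relative to the annual flow at the key station in the same historical year, and that the data are stored as a nested list (year, site, month); neither the layout nor the function name comes from the original study.

```python
import math
import random


def disaggregate_spatial(z, annual_key, monthly_by_site, rng=random):
    """Resample monthly flows at several sites from one key-station annual flow.

    annual_key:      list of N historical annual flows at the key station
    monthly_by_site: list of N years; each year is a list of s sites;
                     each site is a list of 12 monthly flows
    Returns a list of s lists with 12 simulated monthly flows each.
    """
    n = len(annual_key)
    k = max(1, round(math.sqrt(n)))  # heuristic k = sqrt(N); rounding is assumed
    # Rank historical years by the distance of their key-station annual flow to z
    ranked = sorted(range(n), key=lambda i: abs(annual_key[i] - z))[:k]
    # Relative weights 1/rank; random.choices normalises them internally
    weights = [1.0 / rank for rank in range(1, k + 1)]
    year = rng.choices(ranked, weights=weights, k=1)[0]
    # Proportion matrix of the chosen year: each site's monthly flow divided by
    # the key-station annual flow of that year (assumed layout), rescaled by z
    return [[z * m / annual_key[year] for m in site]
            for site in monthly_by_site[year]]
```

Because all sites take their proportions from the same chosen year, the simulated flows keep the cross-site pattern observed in that year.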
3. Study Area
The data for the study were selected from three streamflow gauging stations located along the Kızılırmak River in Turkey, since they have the longest historical records. The Kızılırmak is the longest river in Turkey, with a length of 1355 km and a basin area of 122,277 km² [19]. The gauging stations are Salur (EIE 1528) and Yahsihan (EIE 1503), which are located on the main river, and Cadirhoyuk (EIE 1541), located on the largest tributary (Delice River) of the main river, as shown in Figure 1. Site information, including latitude, longitude, elevation, and observation period, is listed in Table 1. Historical monthly streamflow data were provided by the General Directorate of State Hydraulic Works (DSI). As shown in Figure 1, stations EIE 1503 and EIE 1528 are located downstream of the existing dams (Hirfanlı, Kesikköprü, and Kapulukaya). Since the observed streamflow data of these two stations have been influenced by the dams, their unaffected (natural) streamflows had to be obtained. Station EIE 1501, located upstream of the dams, and station EIE 1541, located on the tributary, were used as references to obtain the natural streamflow values through the correlation between the monthly observed data of these stations before the construction of the dams. Furthermore, data from the EIE 1517 streamflow gauging station were used to extend the observation period of station EIE 1541. The goodness-of-fit measures ($R^2$) for EIE 1503 with EIE 1501, EIE 1541 with EIE 1517, and EIE 1528 with EIE 1503 + EIE 1541 were 0.96, 0.86, and 0.88, respectively. These relationships are reasonably good for obtaining the natural data from the regression equations.
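As an illustration of how such a station-to-station relationship and its goodness of fit might be computed, the sketch below fits an ordinary least squares line with a single predictor. The choice of a simple linear form and the function name `fit_line` are assumptions for the example; the study does not state the form of its regression equations.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b, r_squared).

    x: monthly flows at the reference station (pre-dam period)
    y: monthly flows at the target station for the same months
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                         # slope
    a = my - b * mx                       # intercept
    r_squared = (sxy * sxy) / (sxx * syy)  # squared correlation coefficient
    return a, b, r_squared
```

Once `a` and `b` are fitted on the pre-dam period, `a + b * x` gives an estimate of the natural flow at the target station for later months.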

4. Results and Discussion
The statistics evaluated include the monthly mean, variance, skewness coefficient, minimum and maximum discharge, and lag-1 autocorrelation coefficient of flow for the three stations. The statistics were calculated from 100 simulated traces, each 37 years long (the same length as the annual observed streamflow record at each station), using both the temporal and the spatial nonparametric approaches for comparison. In addition, drought characteristics (maximum length and magnitude) and storage capacity were calculated from the simulated monthly values to evaluate the performance of the two approaches. These statistics are displayed as boxplots. The box indicates the interquartile range of the simulations, and the whiskers extend to the 5th and 95th percentiles. The horizontal line in the box represents the median of the simulated values, while the dots represent values outside the whisker range. The historical statistics are shown as triangles. The performance for a statistic is considered good when the historical value falls within the box.
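The boxplot criterion described above can be expressed in a few lines. The sketch below is illustrative only: the function names are made up for this example, and the percentiles use the default interpolation of Python's `statistics.quantiles`, which may differ slightly from the plotting software used in the study.

```python
import statistics


def boxplot_summary(simulated):
    """5th percentile, quartiles, and 95th percentile of one statistic
    computed over all simulated traces."""
    q1, median, q3 = statistics.quantiles(simulated, n=4)
    twentieths = statistics.quantiles(simulated, n=20)  # cut points at 5%, 10%, ...
    return {"p5": twentieths[0], "q1": q1, "median": median,
            "q3": q3, "p95": twentieths[-1]}


def within_box(historical, simulated):
    """True when the historical statistic lies inside the interquartile box."""
    s = boxplot_summary(simulated)
    return s["q1"] <= historical <= s["q3"]
```

For example, `within_box(hist_mean_jan, [mean of January flows in each of the 100 traces])` would flag whether the January mean is considered well reproduced; `hist_mean_jan` is a hypothetical variable name.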
The mean discharge from the 100 generated flow series is taken as the threshold for calculating the drought statistics. Each period of one or more consecutive months with flow below the mean is defined as a drought event. The length of an event is the duration of that period, while its magnitude is the deficit of the total volume carried during the event with respect to the mean volume over the same length (i.e., the product of the mean discharge and the drought length), as shown in Figure 2. In general, several drought events of different lengths are obtained from a time series, which helps in selecting the demand level and model size. The drought characteristics of the model are computed and presented as boxplots to be compared with those obtained from the historical series [3].
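The drought definition above translates directly into a run-length computation. The following sketch assumes the flows are given as a plain list of monthly values in consistent units; the function names are illustrative.

```python
def drought_events(flows, threshold):
    """Return (length, magnitude) for each run of consecutive months below
    the threshold. Magnitude is threshold * length minus the volume that
    actually flowed during the run."""
    events = []
    length, deficit = 0, 0.0
    for q in flows:
        if q < threshold:
            length += 1
            deficit += threshold - q
        elif length:
            events.append((length, deficit))  # run ended: store the event
            length, deficit = 0, 0.0
    if length:                                # run that reaches the end of the series
        events.append((length, deficit))
    return events


def max_drought(flows, threshold):
    """Maximum drought length and maximum drought magnitude of a series."""
    events = drought_events(flows, threshold)
    if not events:
        return 0, 0.0
    return max(e[0] for e in events), max(e[1] for e in events)
```

Note that the longest event and the event with the largest deficit need not be the same, so the two maxima are taken separately.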

The reservoir storage capacity is calculated by applying the sequent peak method [20]. Several release values, ranging between 10% and 90% of the mean monthly discharge, are taken as threshold releases. To evaluate the storage capacity, the mean storage ($\bar{S}$) and the standard deviation ($\sigma_S$) over the 100 generated time series are estimated to find the range limits of the storage capacities, which can be computed from
$$S_{\lim} = \bar{S} \pm c\,\sigma_S,$$
where $c$ can equal 1.0, 1.5, or 2.0; $c$ could also be taken as 1.96 for the 5% significance level [3, 21]. The storage capacities of the simulated data were compared with the historical ones.
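A compact sketch of the sequent peak recursion and of the storage range limits is shown below. It assumes a constant monthly demand, repeats the record twice so that a critical period wrapping around the end of the record is captured, and uses the sample standard deviation; these choices and the function names are assumptions for illustration, not details taken from the study.

```python
import statistics


def sequent_peak(inflows, demand, cycles=2):
    """Required storage for a constant demand.

    Recursion: K_t = max(0, K_{t-1} + demand - inflow_t); capacity = max K_t.
    Intended for demands below the mean inflow; otherwise the deficit keeps
    growing with every repeated cycle.
    """
    deficit, capacity = 0.0, 0.0
    for q in list(inflows) * cycles:
        deficit = max(0.0, deficit + demand - q)
        capacity = max(capacity, deficit)
    return capacity


def storage_limits(capacities, c=1.96):
    """Lower and upper limits, mean -/+ c * standard deviation, of the
    storage capacities obtained from the simulated traces."""
    mean = statistics.mean(capacities)
    sd = statistics.stdev(capacities)  # sample standard deviation (assumed)
    return mean - c * sd, mean + c * sd
```

Running `sequent_peak` on each of the 100 traces for a given demand level and passing the results to `storage_limits` gives one pair of limits per demand level.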
4.1. Temporal Disaggregation Results
The basic monthly statistics from the temporal disaggregation approach are presented together with the observed values in Figure 3 (station EIE 1503), Figure 4 (station EIE 1541), and Figure 5 (station EIE 1528). The figures show that the model reproduces the mean well at each station. The tight boxplots indicate that the 100 simulations agree closely with each other as well as with the historical data. In addition, the mean of the historical streamflow data was nearly the same as the median of the simulated data for each month and station. The boxplots of the variance and skewness coefficient also show that the model is efficient in reproducing the historical data, with the historical variance and skewness values falling within the box for each month. The extreme behavior of the simulated streamflow was inspected through the minimum and maximum values, since extremes are of primary importance in analyzing reservoir operations and river basin management policies. The minimum and maximum monthly values were reasonably well simulated at all stations. Backward lag-1 correlations are also well captured in most months, which indicates that the generated flows have realistic continuity. Only the correlation between the last month of a year and the first month of the next year (December-January) is not captured well, at all stations.
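The backward lag-1 correlation for a given month can be computed as sketched below, assuming the flows are stored as one 12-value list per calendar year; the function names are illustrative. The January case pairs each January with the December of the previous year, which is exactly the pair that spans two independently resampled years in the k-NN procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lag1_monthly_correlation(flows_by_year, month):
    """Correlation between `month` (0 = January ... 11 = December) and the
    preceding month, taken across all years."""
    previous, current = [], []
    for y, year in enumerate(flows_by_year):
        if month > 0:
            previous.append(year[month - 1])
            current.append(year[month])
        elif y > 0:  # January is paired with December of the previous year
            previous.append(flows_by_year[y - 1][11])
            current.append(year[0])
    return pearson(previous, current)
```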



The drought analysis was applied to the monthly inflow values obtained from the temporal approach. It covers the maximum length and the maximum magnitude of drought. According to Figure 6(a), the simulated maximum drought length is reproduced well at station EIE 1541, whereas it is overestimated at EIE 1528 and underestimated at EIE 1503. For the maximum magnitude of drought (Figure 6(b)), the boxplots show that the model is unable to reproduce the historical value efficiently at station EIE 1541 and that the historical statistic lies near the edge of the box at stations EIE 1503 and EIE 1528.

The drought statistics estimated above depend on the selected threshold (here, the average streamflow of the historical period), so the results are specific to this choice. A more effective procedure estimates the storage required for the selected streamflow series to meet various demand levels. This includes the effect of several linked droughts, so it represents critical droughts more accurately. The sequent peak algorithm [20] is used for this purpose. The storage capacity results presented in Figure 7 show that the storage capacity based on the historical data lies between the minimum and maximum limits calculated from the generated data at all stations. This indicates that the approach effectively preserves the historical storage capacities. The range between the minimum and maximum limits calculated from the simulations is wide at all stations, and station EIE 1528 shows the best results, as the historical storage capacity line falls almost in the middle of the two limits. Figure 8 shows the storage obtained by running the sequent peak algorithm at different demand levels for the historical flow (triangles) and for each simulated trace (boxplots). The approach yields a range of storage values, and the variability of the simulated results increases with storage. Consider the dashed line for station EIE 1503, which represents a storage capacity of 500 MCM. For a demand of 50% of the mean discharge, the observed flow sequence needs 445 MCM of storage capacity, while the interquartile range of the simulated storage capacities lies between 346.6 and 590 MCM. About 30% of the simulations require more than 500 MCM of storage, which corresponds to 70% reliability for a 500 MCM reservoir. This variation among the temporal approach simulations can offer a good estimate of system reliability.
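The reliability figure quoted above follows from counting the simulated traces whose required storage does not exceed the available capacity. A minimal sketch, with an illustrative function name and made-up storage values rather than the study's data:

```python
def storage_reliability(required_storages, capacity):
    """Fraction of simulated traces whose required storage is met by
    a reservoir of the given capacity."""
    met = sum(1 for s in required_storages if s <= capacity)
    return met / len(required_storages)


# Hypothetical required storages (MCM) from ten simulated traces
example = [300, 400, 450, 480, 490, 495, 499, 520, 600, 700]
reliability = storage_reliability(example, 500)  # 7 of 10 traces are satisfied
```

In the study the count would run over the 100 simulated traces at a fixed demand level.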


4.2. Spatial Disaggregation Results
This section presents the monthly flow results obtained from the spatial disaggregation approach over the study area. The advantage of the spatial disaggregation model is its ability to provide reliable streamflow data and realistic spatial structures at each time step, and it can be easily adapted to different regions. The performance of the model was evaluated using monthly statistics. Observed and disaggregated values are tabulated in Table 2 for selected months (September, October, November, January, and February). Table 2 shows that the model preserved the mean and variance at all stations and captured the skewness very well for all months, especially in the low-flow months (September and October). The minimum and maximum simulated values show that the approach captures these two statistics for most months. For the lag-1 correlation, Figure 9 shows that the spatial approach preserves the historical statistic well, especially in low-flow months, where the historical value is close to the median of the simulated values. Only the correlation between the last month of a year and the first month of the next year (December-January) is not captured well, at all stations. Overall, the performance of the model in reproducing the monthly streamflow is satisfactory, especially for the skewness.

To evaluate the performance of the approach in capturing the drought characteristics, the maximum drought length and magnitude were computed. The simulated maximum drought lengths show good preservation of the historical maximum drought length (Figure 10(a)). In addition, the simulated maximum drought length varies across traces at all stations, which means that the approach generated a wider variety of drought lengths than appears in the historical record. The maximum magnitudes of drought (Figure 10(b)) are captured within the interquartile range at all stations, though they tend to be overestimated at station EIE 1503 and underestimated at station EIE 1528. The storage capacity results displayed in Figure 11 show that the historical storage capacity lies virtually in the center between the maximum and minimum limits. The historical storage capacity is also very close to the limits for demands of 10–50% of the mean discharge. This indicates that the approach preserves the historical storage capacities efficiently.


For any selected storage capacity, the corresponding demand level can be found as shown in Figure 12. For example, the dashed line represents a storage capacity of 500 MCM at site EIE 1503. It can be seen that the spatial approach strongly overestimates storage for demands below 40% of the mean discharge and that about 90% of the simulated storage requirements exceed 500 MCM for a demand of 50%. This variation among the spatial approach simulations can give a good estimate of system reliability.

The storage capacity characteristics of the observed data and of the data generated by the temporal and spatial approaches at all stations are compared in Figure 13. The results indicate that the storage capacities are reproduced very well when the threshold is 60% of the mean discharge or less. Above this level, the difference between the historical and simulated data increases progressively. In addition, the best results are obtained with the spatial approach at all stations. This result supports the use of the spatial approach, since it reproduces the storage capacity better than the temporal approach.

5. Conclusions
This study discusses the efficiency of a nonparametric streamflow disaggregation model in generating monthly flow series. The model was applied at three stations on the Kızılırmak River in Turkey. It is based on the k-nearest neighbor (k-NN) approach, which resamples the proportion vectors of monthly flow data conditioned on annual data. Historical annual streamflows were used to examine the performance of the model in generating monthly flow data. For comparison, the model was employed in both temporal and spatial approaches. The results show that the model can reproduce the historical data in the space and time domains, especially the first two moments (mean and variance), and can also reproduce the continuity of the historical flows at all stations. Overall, the spatial approach was shown to perform best. The main advantage of the spatial approach is the ability to obtain values at each time step and the flexibility to obtain values at several stations from one station, while the major drawback of both approaches was the inability to preserve the continuity between the last month of a year and the first month of the following year. Additionally, the monthly analysis of maximum drought length and magnitude showed good preservation of the characteristics of the observed data. The drought analysis indicated that the spatial approach performs better than the temporal approach. In addition, the simulations generated a rich variety of dry sequences that will be of great benefit in managing water resources in the basin. The threshold used to determine the drought statistics is based on the average of the historical flow. This threshold is sensitive to the length of the historical record and can be adjusted as required for each basin. The demands and storages obtained from the sequent peak algorithm are therefore considered a good way to characterize droughts.
The storage capacity results obtained from the temporal and spatial approaches (Figures 7 and 11) are found to preserve the historical storage capacities. In conclusion, the nonparametric streamflow disaggregation model based on the k-NN approach was effective in producing monthly flow values, and the spatial approach is the preferred choice for future hydrological applications in this region of Turkey.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
Shatha H. D. Al-Zakar and Omar M. A. Mahmood Agha thank the University of Mosul, Department of Dams and Water Resources Engineering, Mosul, Iraq, for giving them the opportunity to pursue their Ph.D. degree studies at the University of Gaziantep.