Abstract
Drought is a common climatic extreme that frequently spreads across large spatial and temporal scales. It affects the living standards of people across the globe more than any other climate extreme. Therefore, the present study proposes a new technique, known as model-based clustering of categorical drought states sequences (MBCCDSS), for the monthly prediction of drought severity, so that decision-makers are informed in time to plan reliable actions that minimize the negative impacts of drought. The proposed technique is based on the expectation-maximization (EM) algorithm for finite mixtures with first-order Markov model components. The proposed approach is validated on six meteorological stations in the northern area of Pakistan. The study outcomes provide a basis for exploring and framing further assessments to mitigate drought impacts at the selected stations.
1. Introduction
Drought is a multifaceted and recurring event characterized by insufficient precipitation, which has significant effects on hydrological systems, agriculture, and society [1, 2]. Drought lasts for a long time and brings extreme meteorological consequences, damaging crop yields and other plant growth [3]. In recent decades, drought has dramatically impacted the environment and economies worldwide [4, 5]. Determining the onset and termination times of a drought remains problematic for drought management. The effects of drought accumulate slowly and may linger for an extended period. Although its visible effects are subtle at first, the impacts of drought become severe without proper action and persist long after the drought has ended [6–8].
According to drought occurrences and their characteristics, the well-known drought categories are meteorological, agricultural, hydrological, and socioeconomic [9, 10]. Among these, meteorological drought is a climatic event associated with a decrease in precipitation, whereas the other categories have more extensive human and social features [8, 11]. Moreover, meteorological drought can lead to the other three types of drought. Because of the intricacy and severity of drought, it is challenging to recognize and evaluate drought characteristics. Therefore, in recent decades, many drought indices have been developed to assess and monitor drought events. Reliable, high-quality knowledge about drought is essential for mitigation policies and preparedness in disaster-stricken regions globally, and knowledge about drought occurrence is crucial for early warning to lessen adverse effects. Several drought indices are available in the literature and have been used by decision-makers to mitigate the negative impacts of drought.
Several drought indices are commonly used. Palmer [12] proposed the Palmer Drought Severity Index (PDSI), which incorporates soil moisture, precipitation, and temperature in a water balance model. Gibbs and Maher [13] introduced the Decile Index (DI), Shafer and Dezman [14] proposed the Surface Water Supply Index (SWSI), and McKee et al. [15] introduced the Standardized Precipitation Index (SPI) as a meteorological index. Although the indices differ in subtle ways, the present analysis uses the SPI [15], which has frequently been used in drought monitoring policies and has been endorsed by the World Meteorological Organization [16, 17]. It gives a consistent interpretation across different climatic regimes and regions. Furthermore, as a probabilistic index, it has desirable characteristics for forecasting and risk analysis [18–20].
Moreover, multiple techniques have been developed in various studies to evaluate and predict drought occurrences [21–24]. However, drought is a complicated dynamic process, and much more fundamental work is needed to clarify the critical issues and to improve both the monitoring and the prediction of droughts. Hence, it is important to treat the drought process as a predictable dynamic system, which helps to reduce its critical effects [5, 23, 25, 26]. Therefore, the current study proposes a new technique, known as model-based clustering of categorical drought states sequences (MBCCDSS), which groups categorical drought state sequences to predict drought severity at the selected stations. The MBCCDSS may give decision-makers accurate and timely information for planning reliable actions to mitigate negative drought impacts.
2. Methods
2.1. Standardized Precipitation Index (SPI)
The SPI is commonly used for computing and recording drought occurrences [15]. It can be calculated for different time scales from monthly precipitation data. It provides a spatially reliable interpretation across several climates [20, 27, 28] (Guttman, 1998). Furthermore, the SPI is widely used across geographical and temporal settings; its simple calculation and ready availability make it the most familiar index worldwide. Usually, SPI-1 and SPI-3 are used for meteorological drought, and SPI-6 and SPI-9 for agricultural drought. Hydrological drought is usually characterized by SPI-12 and SPI-24 [29, 30]. The present study uses SPI-1 to quantify drought occurrences, with data ranging from January 1971 to December 2017.
2.2. Model-Based Clustering of Categorical Drought State Sequences (MBCCDSS)
The primary aim of clustering is to group data that carry similar information, so that objects in different groups differ from one another. Clustering is prevalent in statistics and computer science because of its great variety of applications. Numerous clustering techniques have been considered in the literature, including various hierarchical clustering algorithms [31, 32] and the well-known k-means [33] and k-medoids [34] algorithms. Model-based clustering groups the objects of the data under the assumption that each object in a cluster can be regarded as a sample from some probability distribution [35, 36]. When there are several data groups, several distributions are required, and finite mixture models are needed [37]. Model-based clustering performs well in separating objects into distinct groups [38]. Many challenging applications can be addressed by this technique, including mass spectrometry data [38, 39], text classification [40], and social networks [41]. Some work on model-based clustering has been done for time series [42] and regression time series [43]. Many applications can be handled more reliably by grouping categorical sequences [39, 41–43]; however, in drought analysis, this approach has not yet received much attention. In drought classification, the analysis of categorical sequences is important for obtaining consistent results. Therefore, the present study proposes MBCCDSS, which considers the transition pattern of the drought states and provides a basis for using model-based clustering to obtain more reliable results about drought occurrences.

The MBCCDSS is based on finite mixture modeling. The finite mixture can be written as

$$f(x; \Theta) = \sum_{k=1}^{K} \tau_k f_k(x; \theta_k), \tag{1}$$

where $K$ is the total number of component distributions $f_k$ with corresponding parameter vectors $\theta_k$, and $\tau_k$ are the mixing proportions, subject to $0 < \tau_k \le 1$ and $\sum_{k=1}^{K} \tau_k = 1$. $\Theta = (\tau_1, \ldots, \tau_{K-1}, \theta_1^{\top}, \ldots, \theta_K^{\top})^{\top}$ is the entire parameter vector that has to be estimated.

Moreover, the MBCCDSS models each data group by first-order Markov model components and uses several sequences of drought states. These sequences reflect the transition behavior of the drought states at the application site. The drought states (extremely dry (ED), severely dry (SD), normal dry (ND), median dry (MD), median wet (MW), severely wet (SW), and extremely wet (EW)) are classified according to [44]. Now, let $x_i = (x_{i1}, \ldots, x_{iT_i})$ be the $i$-th categorical drought state sequence of length $T_i$ following the first-order Markov model with $p$ unique states. Then, we can write

$$P(x_i) = P(x_{i1}) \prod_{t=2}^{T_i} P(x_{it} \mid x_{i,t-1}),$$

where $x_{it}$, $t = 1, \ldots, T_i$, takes values in $\{1, \ldots, p\}$ and is the drought state observed in the $t$-th position of $x_i$. Furthermore, we denote the initial state probability as $\alpha_j = P(x_{i1} = j)$ and the transition probability as $\gamma_{jj'} = P(x_{it} = j' \mid x_{i,t-1} = j)$, subject to the restrictions $\sum_{j=1}^{p} \alpha_j = 1$ and $\sum_{j'=1}^{p} \gamma_{jj'} = 1$ for $j = 1, \ldots, p$. Now it can be written as

$$P(x_i) = \prod_{j=1}^{p} \alpha_j^{I(x_{i1} = j)} \prod_{j=1}^{p} \prod_{j'=1}^{p} \gamma_{jj'}^{n_{ijj'}},$$

where $I(\cdot)$ is the indicator function and $n_{ijj'}$ is the frequency of transitions from state $j$ to state $j'$ within the $i$-th sequence. We assume that each categorical sequence originates from one of $K$ components, where $K$ is the total number of components. The number of components is selected by minimizing the Bayesian information criterion (BIC) [45]. Using the notation $\alpha_{kj}$ and $\gamma_{kjj'}$ for the initial state and transition probabilities of the $k$-th component, with $\Gamma_k$ the transition matrix having elements $\gamma_{kjj'}$, equation (1) can be written as

$$f(x_i; \Theta) = \sum_{k=1}^{K} \tau_k \prod_{j=1}^{p} \alpha_{kj}^{I(x_{i1} = j)} \prod_{j=1}^{p} \prod_{j'=1}^{p} \gamma_{kjj'}^{n_{ijj'}}, \tag{2}$$

where the information in the $i$-th sequence is summarized by the first state observed, $x_{i1}$, and the transition frequency matrix $N_i$ with elements $n_{ijj'}$. The pair $(x_{i1}, N_i)$ is a minimal sufficient statistic for estimating the parameters of the model given in equation (2).

The parameters are estimated by the expectation-maximization (EM) algorithm [46], which consists of two steps: expectation (E step) and maximization (M step). In the E step, the algorithm finds the conditional expectation of the complete-data log-likelihood function given the observed data; in the M step, this conditional expectation is maximized with respect to the parameters. In the E step, the posterior probabilities are calculated at the $l$-th iteration as

$$\pi_{ik}^{(l)} = \frac{\tau_k^{(l-1)} \prod_{j=1}^{p} \left(\alpha_{kj}^{(l-1)}\right)^{I(x_{i1} = j)} \prod_{j=1}^{p} \prod_{j'=1}^{p} \left(\gamma_{kjj'}^{(l-1)}\right)^{n_{ijj'}}}{\sum_{k'=1}^{K} \tau_{k'}^{(l-1)} \prod_{j=1}^{p} \left(\alpha_{k'j}^{(l-1)}\right)^{I(x_{i1} = j)} \prod_{j=1}^{p} \prod_{j'=1}^{p} \left(\gamma_{k'jj'}^{(l-1)}\right)^{n_{ijj'}}},$$

and the M step updates the parameter estimates by the following equations:

$$\tau_k^{(l)} = \frac{1}{n} \sum_{i=1}^{n} \pi_{ik}^{(l)}, \qquad \alpha_{kj}^{(l)} = \frac{\sum_{i=1}^{n} I(x_{i1} = j)\, \pi_{ik}^{(l)}}{\sum_{i=1}^{n} \pi_{ik}^{(l)}}, \qquad \gamma_{kjj'}^{(l)} = \frac{\sum_{i=1}^{n} n_{ijj'}\, \pi_{ik}^{(l)}}{\sum_{i=1}^{n} \sum_{h=1}^{p} n_{ijh}\, \pi_{ik}^{(l)}},$$

where $n$ is the number of sequences.
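The E and M steps described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the study: the function names, the random initialization, the stopping rule, and the small smoothing constant `eps` are assumptions made here, and drought states are assumed to be coded as integers 0, ..., p−1.

```python
import numpy as np

def transition_counts(seq, p):
    """Count j -> j' transitions in one sequence of integer states 0..p-1."""
    counts = np.zeros((p, p))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    return counts

def em_markov_mixture(seqs, p, K, n_iter=500, tol=1e-10, seed=0, eps=1e-10):
    """EM for a K-component mixture of first-order Markov chains.

    eps is a smoothing constant (an assumption of this sketch, not part of
    the source) that keeps estimated probabilities away from exactly zero.
    """
    rng = np.random.default_rng(seed)
    first = np.array([s[0] for s in seqs])                 # first states, (n,)
    N = np.stack([transition_counts(s, p) for s in seqs])  # counts, (n, p, p)
    F = np.eye(p)[first]                                   # one-hot first states, (n, p)

    tau = np.full(K, 1.0 / K)                              # mixing proportions
    alpha = rng.dirichlet(np.ones(p), size=K)              # initial-state probs, (K, p)
    gamma = rng.dirichlet(np.ones(p), size=(K, p))         # transition probs, (K, p, p)

    prev = -np.inf
    for _ in range(n_iter):
        # E step, in logs: tau_k * alpha_k[first state] * prod gamma_k^counts
        log_w = (np.log(tau)[None, :]
                 + F @ np.log(alpha).T
                 + np.einsum("ijl,kjl->ik", N, np.log(gamma)))
        m = log_w.max(axis=1, keepdims=True)               # for numerical stability
        w = np.exp(log_w - m)
        log_lik = float((m[:, 0] + np.log(w.sum(axis=1))).sum())
        post = w / w.sum(axis=1, keepdims=True)            # posterior probs, (n, K)

        # M step: posterior-weighted relative frequencies
        tau = post.mean(axis=0) + eps
        tau /= tau.sum()
        alpha = post.T @ F + eps
        alpha /= alpha.sum(axis=1, keepdims=True)
        gamma = np.einsum("ik,ijl->kjl", post, N) + eps
        gamma /= gamma.sum(axis=2, keepdims=True)

        if log_lik - prev < tol:                           # stop when the gain is negligible
            break
        prev = log_lik
    # log_lik is evaluated at the parameters used in the final E step
    return tau, alpha, gamma, post, log_lik

# Toy data, made up for illustration only (three states coded 0..2).
toy = [[0, 0, 0, 1, 1, 1, 2, 2], [0, 0, 1, 1, 1, 2, 2, 2],
       [0, 2, 1, 0, 2, 1, 0, 2], [2, 1, 0, 2, 1, 0, 2, 1]]
tau, alpha, gamma, post, log_lik = em_markov_mixture(toy, p=3, K=2)
print(np.round(post, 3))
```

Only the first state and the transition count matrix of each sequence enter the computation, which mirrors the sufficient-statistic argument in the text. Because EM can converge to a local maximum, running it from several random starts (different `seed` values) and keeping the solution with the highest log-likelihood is a common precaution.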
2.3. Prediction of Drought States
Using the set of transition probability matrices $\Gamma_1, \ldots, \Gamma_K$ and a probability distribution $w = (w_1, \ldots, w_K)$ linked with the mixture components, the $L$-step transition probability matrix can be found by

$$\Gamma^{(L)} = \sum_{k=1}^{K} w_k \Gamma_k^{L},$$

where $\Gamma_k^{L}$ denotes the matrix $\Gamma_k$ raised to the power $L$. For instance, $\Gamma_k^{2} = \Gamma_k \Gamma_k$. The choice of the distribution $w$ depends on the specific application. In the current study, the estimated mixing proportions (i.e., $\hat{\tau}_1, \hat{\tau}_2, \ldots, \hat{\tau}_K$) and the estimated posterior probabilities (i.e., $\hat{\pi}_{i1}, \hat{\pi}_{i2}, \ldots, \hat{\pi}_{iK}$) associated with a specific sequence are used to calculate the probability distribution of future drought state occurrences.
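As a small illustration of the weighted L-step transition matrix described above, the sketch below mixes the matrix powers of the component transition matrices. The function name and the two 2-state matrices are hypothetical and serve only as an example; in practice the weights would be the estimated mixing proportions or the posterior probabilities of a given sequence.

```python
import numpy as np

def l_step_transition(gammas, weights, L):
    """Weighted L-step transition matrix: sum_k weights[k] * gammas[k]^L."""
    gammas = np.asarray(gammas, dtype=float)    # component matrices, (K, p, p)
    weights = np.asarray(weights, dtype=float)  # component weights, (K,)
    powers = np.stack([np.linalg.matrix_power(g, L) for g in gammas])
    return np.einsum("k,kjl->jl", weights, powers)

# Hypothetical two-component, two-state example.
gammas = [[[0.9, 0.1], [0.2, 0.8]],
          [[0.5, 0.5], [0.5, 0.5]]]
weights = [0.5, 0.5]
pred = l_step_transition(gammas, weights, L=2)
print(pred)
```

Row j of the result is the predicted probability distribution of the state L steps ahead, given that the current state is j; each row sums to one as long as the weights sum to one.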
3. Application
The proposed technique is validated on six meteorological stations in the northern area of Pakistan (Figure 1). The region was selected for its structural importance and significant climatological characteristics [47]. Atmospheric conditions in the selected region have significant effects on other parts of the country. Moreover, several changes have been observed in the country because of fluctuating seasonal weather patterns in various regions. The highest temperatures have been observed across large parts of the country, and these parts have been strongly affected by global warming [48, 49]. Global warming affects not only Pakistan but the whole world; its impact on temperature and water leads to high temperatures and water deficiency. Future climate changes can be problematic, as they may substantially affect rural livelihoods and coping capacities. Furthermore, drought occurrences can damage several vital sectors of the country, including the economy, agriculture, and natural resources. Therefore, it is important to understand drought occurrences promptly by developing inclusive and efficient techniques. From this perspective, the present study proposes a new technique that improves the capability of monitoring drought occurrences in the selected area. These findings may enhance drought monitoring and mitigation policies.

3.1. Results
Inadequate precipitation and departures from the expected precipitation pattern cause drought events. The summary statistics of precipitation for the selected stations are presented in Table 1. The monthly precipitation at the Chilas station is presented in Figure 2, and the precipitation over the selected period for Gilgit is presented in Figure 3. These two stations are shown as examples; precipitation at the other selected stations can be presented in the same way. The theoretical versus empirical histograms of SPI-1 for the selected stations are presented in Figure 4. The histograms show discrepancies among stations, which may arise from the natural behavior of the data. In the recent past, many researchers have worked on modeling such discrepancies. Moreover, new standardization procedures based on nonparametric functions and mixture distribution functions have been proposed [50], but handling the discrepancy is still under investigation. The temporal behavior of SPI-1 at the various stations is shown in Figure 5. The selected stations show broadly similar behavior over the region for a specific drought index [44]. However, different distributions are selected at different stations (Table 2) to generate the categorical values. The BIC is used to select the appropriate distribution among the fitted distributions for each station. The three-parameter (3P) Weibull distribution has the minimum BIC among the fitted distributions for SPI-1 at Astore (−1036.513), Bunji (−1030.985), Gilgit (−1097.487), and Skardu (−735.125). For Gupis, the 4P Beta distribution has the minimum BIC (−788.076), and for Chilas, the minimum BIC among the specified distributions is −805.614. The Weibull distribution is commonly applied in hydrology and associated disciplines [51] and is a strong candidate for standardization.



Furthermore, the fact that different probability distributions are selected for different stations supports the use of finite mixture modeling. Therefore, MBCCDSS is proposed for the prediction of categorical drought states using a mixture of first-order Markov models. The use of Markov models reflects the dynamics of drought occurrences. The MBCCDSS assumes first-order Markov models in this analysis; however, higher-order Markov models can be included [38]. The MBCCDSS uses the categorical values corresponding to each drought state, with the states classified according to Niaz et al. [44]. Moreover, it assumes that each categorical sequence of the selected states originates from one of the components. The mixture model order is selected by minimizing the BIC [45], and the mixture model with two components ($K = 2$) is selected for the analysis. The performance of the model is assessed with and without initial state probabilities. It can be observed from Table 3 that, in the first case, the maximized log-likelihood (LogL) is −3467.271, while in the second case it is −3477.991. As expected, including the initial state probabilities yields a higher LogL because the model fits the data slightly better. The difference is rather marginal, and based on the BIC, the model with initial state probabilities (BIC = 7061.757) is preferred over the model without them (BIC = 7065.28). The BIC is used here for model selection because it is widely preferred over competing criteria in finite mixture modeling.
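The BIC comparison above can be checked with the usual definition BIC = −2 LogL + M log n, where M is the number of free parameters and n is the sample size. The sketch below rests on assumptions that are not stated in the text: the sample size is taken to be the number of sequences (n = 6), and the number of states entering the model is taken to be p = 6, so that a two-component model has (K − 1) + K(p − 1) + Kp(p − 1) = 71 free parameters with initial state probabilities and 61 without. Under these assumptions the formula reproduces the reported BIC values up to rounding.

```python
import math

def n_free_params(K, p, with_initial=True):
    """Free parameters of a K-component mixture of first-order Markov chains."""
    m = (K - 1) + K * p * (p - 1)      # mixing proportions + transition rows
    if with_initial:
        m += K * (p - 1)               # initial-state probabilities
    return m

def bic(log_lik, m, n):
    """Bayesian information criterion; smaller values are preferred."""
    return -2.0 * log_lik + m * math.log(n)

# Assumed values (not stated in the text): p = 6 states, n = 6 sequences.
K, p, n = 2, 6, 6
bic_with = bic(-3467.271, n_free_params(K, p, True), n)
bic_without = bic(-3477.991, n_free_params(K, p, False), n)
print(round(bic_with, 3), round(bic_without, 3))
```

The ten additional parameters of the model with initial state probabilities cost about 10 log 6 in penalty, which is slightly less than the gain of about 21.4 in −2 LogL, so the BIC favors that model by a small margin.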
Moreover, the mixing proportions and the posterior probabilities associated with a specific sequence are used to calculate the drought state predictions. Six sequences are used in MBCCDSS, each containing the monthly categorical observations of one station. For example, the states at the Astore station form sequence one (sequence-1), the states at the Bunji station form the second sequence (sequence-2), the states at the Gupis station form the third sequence (sequence-3), and so forth. The one-month-ahead predictions of the drought states are given in Table 4; MBCCDSS can also predict the probabilities of the selected states for other months. The results for sequence-1 show that the most likely state to visit in one month is ND; the probability associated with this prediction (0.6247) is higher than that of the other drought states. In sequence-2, the ND state also prevails over the other drought states. The probabilities of the selected states at one month for the other sequences can be read in the same way. NA values indicate that the specific state does not occur in any of the six categorical drought state sequences; therefore, MBCCDSS cannot predict a value for this drought state.
3.2. Discussion
The present study uses a mixture of first-order Markov models to develop a new technique for clustering categorical sequences of drought states. The proposed technique is applied to six meteorological stations in the northern areas. The MBCCDSS is computed from categorical sequences of drought states, which are derived from SPI-1 and used to predict drought severity (i.e., ED, SD, MD, ND, MW, SW, and EW) at the selected stations. The outcomes of the MBCCDSS describe drought occurrences more plainly and accurately and can be used to support mitigation strategies. Moreover, the probabilities obtained from MBCCDSS may be used to compare drought indices, obtain more precise results about drought occurrences for the various drought states, study drought propagation, and calculate thresholds for different drought intensities in the selected region. In MBCCDSS, the initial state and transition probabilities are considered constant; that is, the observations are assumed to be time-homogeneous. However, these probabilities could be modeled as functions of time, and including such temporal characteristics would improve the efficiency of MBCCDSS for drought monitoring. Finally, the results obtained in this study apply to the existing conditions of the application site; extrapolations based on the present analysis may not be suitable under future climate conditions.
4. Conclusion
Drought emerges slowly, and determining its occurrence is still an open problem. The consequences of drought accumulate gradually and may last for a long period. Drought directly distresses people's lives more than any other natural hazard and has harmful consequences for the society and economy of a country. Therefore, it is necessary to treat drought occurrences as a predictable dynamic system with memory, which helps to minimize the critical effects. A new technique, known as MBCCDSS, is proposed for the monthly prediction of drought severity using model-based clustering. The MBCCDSS employs an EM algorithm for finite mixtures with first-order Markov model components and provides future probabilities for each of the drought states at the selected stations. Moreover, the outcomes of the study may give decision-makers accurate and timely information for planning reliable policies to mitigate the adverse effects of drought.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
Acknowledgments
This work was supported by grants from the National Natural Science Foundation of China program (41801339). The authors are also thankful to the Deanship of Scientific Research at King Saud University, through research group no. RG-1437-027.