Abstract

This study focuses on modeling and predicting extreme rainfall based on data from the Southern Highlands region, the critical for rain-fed agriculture in Tanzania. Analyzing 31 years of annual maximum rainfall data spanning from 1990 to 2020, the Generalized Extreme Value (GEV) model proved to be the best for modeling extreme rainfall in all stations. Three estimation methods–L-moments, maximum likelihood estimation (MLE), and Bayesian Markov chain Monte Carlo (MCMC)–were employed to estimate GEV parameters and future return levels. The Bayesian MCMC approach demonstrated superior performance by incorporating noninformative priors to ensure that the prior information had minimal influence on the analysis, allowing the observed data to play a dominant role in shaping the posterior distribution. Furthermore, return levels for various future periods were estimated, providing guidance for flood protection measures and infrastructure design. Trend analysis using value, Kendall’s tau, and Sen’s slope indicated no statistically significant trends in rainfall patterns, although a weak positive trend in extreme rainfall events was observed, suggesting a gradual and modest increase over time. Overall, the study contributes valuable insights into extreme rainfall patterns and underscores the importance of L-moments in identifying the best fit distribution and Bayesian MCMC methodology for accurate parameter estimation and prediction, enabling effective measures and infrastructure planning in the region.

1. Introduction

Extreme rainfall events are no longer a distant possibility but a harsh reality that we must face [14]. These natural phenomena are becoming more frequent, more intense, and more devastating, affecting millions of people and ecosystems worldwide [3, 5]. Recent scientific evidence [1, 38] claims global warming to be the root cause of these extreme weather events. With rising temperatures and increasing levels of atmospheric carbon dioxide, the hydrological cycle is thrown off balance, leading to more frequent and severe floods and droughts across the globe [1, 4, 5, 9]. As per the findings of the Intergovernmental Panel on Climate Change (IPCC) Report [10], rural, underprivileged communities within developing nations face the highest vulnerability to the impacts of climate change. The repercussions of climate change involve heightened variability in extreme rainfall patterns, leading to fluctuations in river flows and an increased occurrence of droughts and floods [1114].

Like many other countries, Tanzania is vulnerable to these changes [15]. For example, according to the Tanzania Climate Status Statement of 2022 [16], on February 26, 2022, extreme rainfall and thunderstorms led to casualties and displacement in Sumbawanga, Nkasi, and Kalambo districts, Rukwa region. Additionally, in Ludewa district (Njombe region), extreme rainfall between February 22 and 23, 2022, caused the destruction of four bridges, disrupting road transport temporarily. Likewise, between April 26 and 28, 2022, extreme rainfall caused severe flooding in the Mbeya and Songwe regions of Southern Highlands Tanzania [16]. The flooding resulted in 5 fatalities, 21 injuries, 5 missing people, and the displacement of 3150 individuals. Numerous buildings, including houses, schools, and religious structures, were brutally damaged. Additionally, agricultural fields spanning 24,700 acres of farm fields, including paddy, maize, and banana farms were destroyed, and a significant number of livestock perished [16]. The flooding also led to the destruction of bridges and roads, impacting transportation in various areas [16]. Generally, extreme rainfall events have already taken a severe toll on the country’s agricultural sector, causing significant losses in maize and rice production in the Southern Highlands [15, 1719]. Tanzania and other countries that rely heavily on agriculture, the impact of extreme rainfall is already posing great challenges for their economies. Therefore, it is crucial for climatologists, meteorologists, economic planners, and other policymakers to have a better understanding of heavy rainfall patterns and their future behaviors for efficient and effective planning, decision-making, and mitigation purposes [17].

Extreme rainfall is characterized as the highest amount of rainfall experienced within a 24-hour period annually [20]. One way to analyze changes in extreme rainfall is to use statistical distribution models. These models are useful for studying extreme rainfall and can aid in flood mitigation and control [21, 22]. Numerous probability distributions can be employed to analyze extreme rainfall patterns. However, it is important to note that the data frequently exhibits non-normal distribution characteristics across multiple regions. While the Anderson-Darling and Kolmogorov-Smirnov tests have often been used for fitting probability distributions, they may not be sensitive enough to identify substantial deviations from an assumed distribution at a particular location [20]. To address this issue, researchers suggest utilizing L-moments as a reliable method to estimate higher statistical moments. The L-moments method allows for reasonable estimates to be obtained even with as small as 20 sample sizes and without making assumptions about the underlying distribution [20, 23]. This method also offers an excellent fit to the entire cumulative distribution function using only a limited number of parameters.

Several researchers [2428] have used L-moments to fit probability distributions of extreme rainfall series, providing a powerful alternative for fitting probability distributions of non-normally distributed data, which is particularly important when dealing with extreme rainfall data where significant deviations from an assumed distribution can occur at a specific location.

This study aims at finding the most suitable probability distribution model for the extreme rainfall series utilizing data from Southern Highlands, Tanzania, and to estimate the extreme rainfall amounts that might occur once in 5, 10, 20, 50, and 100 years to come. The L-moments help guide the selection of candidate distributions and provide insights into the shape of the data. It is used to perform distribution analysis to select the best distributions. The L-moments method is a popular approach for analyzing the statistical properties of hydrological variables, especially in regions with limited data [29]. This study applies five distinct probability distribution models, namely, the GEV, generalized pareto (GPD), generalized logistic (GLO), generalized normal (GNO), and the Pearson type three (PE3) distributions in order to evaluate the best fit model. These distributions are chosen due to their simplicity, superior performance, and widespread usage in frequency analysis of extreme events, as highlighted by [20, 23, 30]. It is widely acknowledged that numerous approaches have been extensively employed to estimate the parameters of potential probability distributions, such as the maximum likelihood method, method of moments, L-moments method, and least squares method [22, 3133]. Nonetheless, the Bayesian method has not seen widespread utilization. In this research, we utilize the maximum likelihood estimation, L-moments, and Bayesian method to determine the parameters for the chosen probability distribution, and we conduct an extreme value analysis of annual maximum daily rainfall to compare the efficacy of the frequentist approach with the Bayesian approach. Bayesian modeling allows for more supplementary information via prior knowledge, which can improve statistical inference on extremes, considering that extreme data are rare by nature [8].

Overall, it is crucial to model extreme rainfall to understand its potential impacts and inform policy decisions [15, 17, 34, 35]. By using appropriate modeling techniques, we can better understand the behavior of extreme rainfall patterns and how often they occur in the Southern Highlands of Tanzania and help mitigate their effects on the economy, agriculture, and society [36]. The rest of this work is structured as follows: The data, study area, and methodologies for model fitting are described in Section 2 along with the best fitting techniques. Lastly, the statistical analysis results of the most suitable model are presented in Section 3, whereas Discussion and Conclusions are presented in Section 4 and Section 5, respectively.

2. Methods

2.1. Study Area

Figure 1 shows the study area that covers four stations in the Southern Highlands of Tanzania, which is the administrative zone comprised of the four regions: Iringa, Rukwa, Mbeya, and Ruvuma. The Highlands zone is at latitude S and longitude E. The rainfall amount in the area varies between to . The rainy season occurs from November to April. During dry season, higher elevation regions face some light rain and mists from May to August. Most rainfall is attributed to thunderstorms forming over the Lake Nyasa (Lake Malawi), and the areas facing towards the lake tend to receive more rain. The highlands have lower temperatures ranging from C to C than the surrounding lowlands. The highest elevated regions face nighttime frosts regularly from June to August.

2.2. Data Description

Rainfall data (in millimeters) was collected from the Tanzania Meteorological Authority (TMA) from four different weather stations in the Southern Highlands over a period of 31 years from 1990 to 2020. Each station represents a different region in the highlands and has its own unique rainfall patterns and location. Because the data were analyzed station by station, a test for homogeneity was not relevant for this study.

As per information provided by the TMA in 2011 and 2019 [15, 17], rainfall that exceeds 50 mm within a 24-hour period is classified as intense (or extreme) and has the potential to result in flooding. Figure 2 presents the original daily rainfall data spanning from 1990-2020, depicting instances of extreme rainfall occurrences exceeding the threshold of 50 mm for each region as indicated by the values above the red line (i.e, rainfall 50 mm).

2.3. Probability Distributions

Extreme value theory has emerged as a crucial statistical discipline in the field of meteorology and hydrology, playing a significant role over the past six decades. This has led to its widespread adoption in numerous research studies. While many probability distributions can be applied to represent empirical rainfall extremes [22], this study focuses on five commonly used distributions: GPD, GLO, GNO, PE3, and GEV distributions. The aim is to identify the most suitable distribution that best fits the empirical annual rainfall extremes. Leveraging L-moments, we determine the best fit distribution and utilize it to analyze and predict extreme rainfall patterns in the Southern Highlands Region of Tanzania. By employing maximum likelihood, L-moments, and Bayesian methods to estimate the parameters , , and , we gained valuable insights into the behavior of extreme rainfall events. The best fit model facilitates the estimation of return levels for extreme rainfall, which is essential in designing robust infrastructure and effective flood protection measures.

2.3.1. Generalized Extreme Value (GEV)

The GEV distribution arise from a combination of three models, namely, the Gumbel, Frechet, and Weibull whose cumulative distribution function as given by [9, 20, 22] is as follows:defined on the set of values . The parameters of this family of models satisfy the conditions and . The distribution in question was independently discovered by both [37, 38]. Its three parameters, namely, location , scale , and shape are used to classify it into three types of extreme value distributions, with being the key differentiating factor. When and , the distribution falls under Frechet, and Weibull, respectively. When , it corresponds to the limit as of equation (1), leading to the Gumbel family of distributions.

2.3.2. Generalized Pareto (GPA)

The cumulative distribution function for GPA is given as [20]:whereby, and are scale, location, and shape parameters. The range of is given by

2.3.3. Generalized Logistic (GLO)

The cumulative distribution function for GLO given by [20, 39] is as follows:whereby, and are scale, location, and shape parameters. The range of is given by

2.3.4. Generalized Normal (GNO)

The probability density function (PDF) and the cumulative probability distribution (CDF) for the GNO distribution are respectively given bywhere , , and are the scale, shape, and location parameters, respectively. The range of is defined as follows:

2.3.5. Pearson Type Three (PE3)

PE3 has the following cumulative probability distribution [20]: are the location, scale, and shape parameters and denotes the Gamma function given by

2.4. L-Moments Method

L-moments are based on linear combinations of data or statistics that are sorted in ascending order [40, 41]. They are an improvement over normal moments for assessing probability distribution shape and estimating parameters, especially for small sample sizes. According to [42], L-moments can be computed linearly using Shifted Legendre Polynomials and the uniform distribution function as the foundation, providing a more robust estimate for a given amount of data. The first four L-moments of a probability distribution can be calculated using a sorted sample vector of length , ranked value , and expected value function as follows:

Based on the information presented above, it is evident that the first L-moment represents the mean value of a data set by calculating the expected value of all individual values. The second L-moment, on the other hand, measures the expected difference between the largest and smallest values in the data set and is equal to half of this expected value.

2.4.1. L-Moments Ratios

The second L-moment is always normalized by the mean value, resulting in a L-moment coefficient of variability (L-Cv) calculated as the ratio of the second L-moment to the first L-moment:

Higher L-moment ratios are obtained by scaling the corresponding higher L-moments using the second L-moment as a reference point [29]:

The L-Cv is similar to the standard CV and L-skewness measures the lack of symmetry in a distribution, while L-kurtosis measures the peakedness of the distribution. L-skewness and L-kurtosis are constrained to lie between and L-kurtosis is bounded by to be not less than . L-moments can estimate many probability distributions and are computed from probability-weighted moments. More information can be found in [43].

2.4.2. Probability Weighted Moments

According to [20, 39], probability-weighted moments (PWM) are calculated from sorted data values, where represents the sorted sample of size . The PWM of order is defined by the following equation:where is the expectation and is the cumulative distribution function of a random variable . The estimators of PWM can be obtained using the equation as given by [20, 39]:

The first four unbiased sample L-moments (estimators) are given by [20] as follows:

The sample L-moment ratios are given bywhere and are the natural estimators of and respectively, and the sample L-Cv is given by

2.4.3. L-Moment Ratio Diagram

The L-moment ratio diagram is a helpful tool for analyzing and testing the fit of probability distributions. This technique involves plotting L-Skewness versus L-Kurtosis and is commonly used to choose the most viable probability distribution for a given dataset. In the study by [20], two-parameter distributions were represented as points on the diagram, while three-parameter distributions were depicted as lines. The best fit distribution was determined by comparing the location of the data point to the line representing the candidate distribution and by measuring the distance between and , where and represent the skewness and kurtosis values of the observed data, and is the kurtosis value of the candidate distribution [20].

2.4.4. Block Maxima Technique

The extreme values of a set of independent observations, , were analyzed by splitting the data set into m blocks, each of length , such that is a large number. By taking the maximum value within individual blocks, a new set is generated, that can be modeled using the appropriate extreme value probability distribution. It is important to choose an appropriate block size, since if is too small, the limit model may not provide a good approximation, which can lead to biased parameter estimation and unreliable extrapolation. Conversely, when is extra large, the number of blocked maximum will be insufficient, causing large variances in estimation. Thus, it is essential to find a compromise between bias and variance.

2.5. L-Moments Estimation Method

The L-moments method estimates distribution parameters by comparing the sample L-moments with the corresponding population L-moments [39]. L-moments offer quantifications of location, dispersion, skewness, kurtosis, and other aspects of probability distributions or datasets.

2.5.1. L-Moments for Generalized Extreme Value Distribution

The GEV parameter estimates distribution has three parameters: (location), (scale), and (shape). The L-moments parameter estimates for GEV were calculated using the following expressions when [44]:

The symbol represents the gamma function, which is defined as follows:

The parameter is approximated using the expression , where is calculated as .

For more information about L-moments of various probability distribution models, refer to [44].

2.5.2. Maximum Likelihood Estimation Method

MLE aims to identify model parameters that optimize the likelihood function with respect to those parameters. However, MLE is most effective when sample sizes are large (as ). Despite this, some researchers still prefer to use MLE even with small sample sizes because it can provide accurate estimates for certain values of the shape parameter. [9] commented that MLE is reliable when the shape parameter is greater than , but its consistency decreases when the shape parameter is less than .

2.5.3. Computing Maximum Likelihood Parameter Estimates

According to [9], if , the log-likelihood function of the GEV with three parameters can be expressed as follows:given that for

In addition, assuming that is a random sample from the GPD with the extreme value above a threshold , the log-likelihood function of the GPD can be obtained using the following equation provided by [33, 45]:

Similary, MLE for other distributions were obtained but not presented in this work.

2.6. Bayesian Modeling Method

In the Bayesian approach, the model parameters may not be random, but they are treated as random by assigning probability distributions to describe the uncertainty of their values [21, 46]. According to [47], three critical steps are involved in the Bayesian approach to parameter estimation, which include specifying a probability model for the data, constructing the posterior distribution, and checking the model for its outputs before using them for inference.

2.6.1. Fundamental Principles

Let a dataset representing realizations of a random variable with a density belonging to the parametric family as detailed by [9]. Prior beliefs about the parameter can be expressed using the probability density function , independent of the observed data. According to [9], the likelihood for is given by:

The combination of prior knowledge and the probability distribution of the data is achieved through Bayes’ theorem, leading to the posterior distribution of as given by [9]where is the prior distribution for , is the likelihood function, and is the posterior distribution.

In extreme value analysis, the goal is often to predict the probability of extreme events in the future based on observed data. The predictive distribution, which describes the likelihood of different outcomes of a future experiment, can be obtained within the Bayesian framework. If represents a forthcoming observation with a probability density function denoted as , and corresponds to the posterior distribution of based on the observed data , the predictive distribution of given can be described as follows [9]:

Computing this integral may be difficult, but simulation-based techniques like Markov chain Monte Carlo (MCMC) can provide estimates of the posterior distribution through a simulated sample [9]. Bayesian procedures can be advantageous when a suitable prior distribution can be specified [9]. The main objective is to obtain a Markov chain that converges to the posterior distribution in Bayesian statistics. To accomplish this, there are several algorithms available for implementing MCMC simulations, but this study uses the Metropolis-Hastings algorithm.

2.7. Return Level Estimates

After fitting a probability distribution model to the data, the next step is to estimate the return levels for rainfall. The T-year return level, denoted as , is the level that is exceeded on average only once in T years. The cumulative probability of nonexceedance can be given by the following equation [9, 48]:where T is the return period. For instance, in order to obtain from the GEV model with return period T, we need to solve for in equation (1) which gives the following [9, 48]:

In addition, for the GPD model, the return level is defined as the extreme level that is exceeded on average once every observations [9]. This is given bywhere represents the probability that the value exceeds the threshold , i.e., . The quantity can also be defined as the -observations return level [9].

3. Results

3.1. Statistical Analysis

The statistical analysis presented in this section reveals some interesting insights about the yearly maximum daily rainfall in four stations. According to the results shown in Table 1, the Ruvuma region receives the highest amount of rainfall, indicating that this region may be more susceptible to flooding and other related hazards. In contrast, Rukwa, the station with the lowest rainfall amount, may experience some challenges related to water scarcity and drought particularly during the dry seasons when water sources dry up. These findings underscore the significance of considering local weather patterns and climatic conditions when planning for development and resource management in the region.

Using the block maxima method, Figure 3 displays the annual maxima daily rainfall data collected over a time period of 31 years from four stations in the Highlands Region. Upon closer inspection, the results indicate that there is no clear trend in the data, suggesting that these stations have experienced a relatively stable pattern of extreme rainfall events over the past few decades.

3.2. Data Assumptions and Trend Detection Analysis

Before engaging in extreme value modeling of rainfall data, a thorough examination of the data’s underlying assumptions was conducted. Potential trends in the series, capable of influencing the modeling process, were carefully assessed for normality, homogeneity, and stationarity. Tests such as Shapiro [49, 50] and augmented Dickey-Fuller (ADF) [51, 52] were employed for this purpose. Given the diverse distribution of stations with varying rainfall patterns and geographical locations, a station-by-station data analysis was conducted, rendering the homogeneity test inapplicable for this study. For more detailed procedures, refer to [11, 19, 53, 54] and [55]. Table 2 displays the results, indicating that the data fulfill the mentioned assumptions and can be used for further analysis. The study also analyzed the Mann-Kendall trend test and Sen’s slope estimator for annual maxima rainfall in all stations. Kendall’s tau, a measure of the strength and direction of the trend in the data, indicates a weak positive correlation between time and rainfall. All p values are greater than 0.05, suggesting insufficient evidence to reject the null hypothesis of no trend. Sen’s slope values are relatively small, indicating a gradual increase in rainfall over time. The trend analysis results, illustrated in Figure 4, suggest a weak positive trend in rainfall, aligning with the return levels obtained in Table 3. The findings are consistent with the study conducted by [56].

3.3. L-Moments Ratios

L-moments results, as shown in Table 4, provide valuable insights into the characteristics of extreme rainfall in these regions. The values represent the mean rainfall depth, with Ruvuma having the highest average rainfall depth of 74.3066. The values indicate the variability in rainfall depths, with Iringa having the lowest spread (scale) of 8.34 kurtosis and Skewness values reveal the heaviness of the tails, and the departure from symmetry, respectively. Iringa shows a slightly right-skewed distribution (0.3177) and a relatively high kurtosis (0.2267). The values reflect the departure from symmetry in the upper tail, with Iringa having a thicker upper tail (0.9665). The values represent the overall shape, and all stations exhibit similar values, indicating moderate peakedness. These findings enable the construction of L-moment ratio diagrams to identify the best-fit distribution for extreme rainfall in each region, facilitating a comprehensive understanding of the rainfall characteristics and aiding in effective hydrological modeling.

3.4. L-Moments Ratio Diagrams

The study examined the characteristics of extreme rainfall in the research area by analyzing two key parameters: the coefficient of variation denoted as and the mean represented by . To ensure the reliability of the results, the L-moments ratio diagram was employed to assess the relative distribution of extreme rainfall. This diagram played a crucial role in selecting the most appropriate distribution among GPA, GNO, GEV, GLO, and PE3 across the four stations. Based on the L-moments ratio diagram results of Figure 5, it was evident that the GEV distribution exhibited the best fit for extreme rainfall in all four stations. Consequently, parameter estimation for the GEV distribution was carried out using MLE, L-moments, and Bayesian MCMC.

3.5. GEV Parameter Estimates

To estimate the GEV distribution, we extracted the blocked maximum of daily annual maximum rainfall data from all four stations, using blocks of days to ensure that our samples were sufficiently large. We then applied maximum likelihood estimation to fit the model to the annual maximum. Also, the Bayesian MCMC was used to provide the Bayesian parameters of the annual maxima rainfall using noninformative priors. When selecting noninformative priors for GEV modeling, we minimized prior influence to allow the data to play a key role in shaping the posterior distribution [57]. For the location parameter , we deployed a Normal distribution that is broad and symmetric distribution centered at 0. The scale parameter , was modeled with a wide normal distribution to encompass the expected range of extreme values. Similarly, a normal distribution that permits a wide range of shapes, indicating minimal prior information about the tail behavior was used for the shape parameter . Following the methodology outlined by [2, 57, 58], the joint density of our parameters , , and was represented as .

Thus for the noninformative priors we employed the following: , , and . Here, denotes a normal distribution with a mean of 0 and a variance of 10,000, ensuring a high variance to affirm the absence of external information.

Table 5 shows the outcomes for MLE, L-moments, and Bayesian posterior parameter estimations for the GEV over all four stations. We plotted the diagnostic plots for the GEV model for all three methods as shown in Figures 68. The model diagnostic plots for rainfall at Iringa exhibit a satisfactory fit for the GEV distribution across all methods. Although we conducted similar diagnostic plots for the other stations, we have not included them in this study. Nevertheless, all important diagnostic plots including quantile plot, the probability plot, density plot, and the return level plot present compelling evidence that GEV distribution is an appropriate model that fits the block maxima data in all stations.

From Table 5, the MLE shows that the shape parameter , was estimated to be 0.249 for Iringa, 0.089 for Mbeya, −0.06 for Rukwa, and 0.0199 for Ruvuma. The positive values for Iringa and Mbeya indicate that these two regions have an upper bound, while the negative value for Rukwa indicates a lower bound. The value for Ruvuma is close to zero, indicating that the distribution for this region is approximately unbounded. Notably, the confidence interval for contains 0, indicating that a Gumbel distribution can also provide a better fit for the data. This finding is supported by the profile confidence interval for , which clearly shows that the value falls well within the interval for all stations. The values have small difference when L-moments and Bayesian MCMC were used. The MCMC estimation results revealed the existence of an upper finite endpoint to the distribution due to positive shape parameter. This estimation is in agreement with the MLE approach when used for Iringa station. The diagnostic plots for GEV estimation indicated in Figures 68 shows in all methods the empirical quantiles are linearly related with model quantiles. In both methods simulated data are right-distributed and affected by extreme values.

3.6. Trace Plots and Posterior Densities Analysis

Trace plots and posterior densities obtained from Bayesian MCMC analysis for the GEV distribution provide valuable insights into the parameters of extreme rainfall distribution in the regions. According to Figure 9, the trace plots show the convergence and mixing of the MCMC chains, indicating reliable parameter estimation. The absence of a trace plot for the shape parameter , indeed shows that the estimated shape parameter is close to zero. This indicates that the data may also be well-fitted by the Gumbel distribution, which is a special case of the GEV distribution with a shape parameter of zero. The reason behind this behavior is that the shape parameter determines the tail behavior of the distribution. When the estimated shape parameter is close to zero, it suggests that the tails of the data are not significantly heavy or light compared to the Gumbel distribution, which has a standard exponential tail. Therefore, the model is effectively converging towards the Gumbel distribution, and the trace plot for the shape parameter may not be displayed.

The posterior densities of the location parameter , and scale parameter provide information about the uncertainty associated with these parameters. Based on the trace plots and well-defined posterior densities, we can conclude that the MCMC algorithm has successfully explored the parameter space and converged to stable estimates for , , and . This indicates that the Bayesian MCMC method is effective in estimating the parameters of the GEV distribution for extreme rainfall in Iringa. Similary, fairly good trace plots and posterior densities for the GEV distribution parameters were obtained for other stations but not presented in this work.

3.7. Rainfall Return Levels for Different Return Periods

The return level estimates obtained using different methods provide important insights into the potential intensity of extreme rainfall events in Iringa. The 100-year return level estimated using maximum likelihood estimation (MLE), is projected to be of annual daily maximum rainfall, with a confidence interval ranging from to . This implies that, without intervention to mitigate climate change, Iringa could experience intense rainfall events of such magnitudes once every hundred years. The Bayesian MCMC method, which incorporates uncertainty quantification, provides a slightly lower estimate for the 100-year return level at , with a wider confidence interval ranging from to . This suggests that the Bayesian approach acknowledges the inherent uncertainties in the estimation process and presents a broader range of possible outcomes. Comparatively, the L-moments method estimates the 100-year return level at , with a narrower confidence interval ranging from to . These variations in estimates across methods highlight the importance of considering different approaches and their associated uncertainties in assessing extreme rainfall patterns. As the return period increases, both the estimated return levels and the width of the confidence intervals tend to increase, indicating a higher likelihood of more intense rainfall events occurring over longer periods. Table 3 presents the estimations of return levels for different return periods in Iringa. Similar return levels for different return periods were made for other stations but were not presented in this study. These findings provide valuable information for understanding the potential risks associated with extreme rainfall in the regions and can aid in decision-making for infrastructure planning and climate adaptation strategies.

3.8. Performance of MLE, L-Moments, and Bayesian Methods

Table 6 presents the performance comparison of Bayesian MCMC, MLE, and L-moments, in estimating the parameters of the GEV distribution. The evaluation is based on two metrics, MAE (mean absolute error) and RMSE (root mean square error). The results show that the Bayesian MCMC method achieved the lowest MAE (0.312) and RMSE (0.243) values compared to the other two methods. This indicates that the Bayesian MCMC method provides estimates that are closer to the true values of the GEV distribution parameters, resulting in smaller prediction errors on average. The MLE method obtained slightly higher MAE (0.327) and RMSE (0.268) values compared to the Bayesian MCMC method, indicating slightly larger prediction errors. On the other hand, the L-moments method yielded the highest MAE (0.342) but had a lower RMSE (0.253) compared to the MLE method. This suggests that the L-moments method has a higher average prediction error but exhibits less variability in the predictions.Based on these findings, we can conclude that the Bayesian MCMC method outperforms the MLE and L-moments methods in estimating the parameters of the GEV distribution for the given rainfall data. It provides more accurate and precise parameter estimates, resulting in better predictions of extreme rainfall events.

4. Discussion and Conclusion

The statistical analysis of yearly maximum daily rainfall at four stations revealed valuable insights into the rainfall patterns and characteristics in the Highlands Region. The results demonstrated significant differences in rainfall amounts among the stations, with Ruvuma receiving the highest amount and Rukwa experiencing the lowest. This suggests that the Ruvuma region may be more susceptible to flooding and related hazards, while Rukwa may face challenges related to water scarcity and drought during the dry seasons. The analysis of the annual maxima rainfall data using the Block Maxima method indicated a relatively stable pattern of extreme rainfall events over the past few decades, without a clear trend. These findings imply that the risk of flooding or other related hazards in these regions may remain constant over time. However, it is crucial to continue monitoring weather patterns and extreme events to ensure the implementation of appropriate measures for protecting people and infrastructure from the impacts of climate change.

The L-moments ratios provided further insights into the characteristics of extreme rainfall. The values of (mean rainfall depth) showed that Ruvuma had the highest average rainfall depth, while Rukwa had the lowest spread (scale) represented by . The kurtosis and skewness values indicated the heaviness of the tails and the departure from symmetry, respectively. Iringa exhibited a slightly right-skewed distribution and relatively high kurtosis. The values reflected the departure from symmetry in the upper tail, with Iringa having a thicker upper tail. Overall, the L-moments ratios suggested that the extreme rainfall in these regions had moderate peakedness and varying degrees of tail heaviness and asymmetry.

The L-moments ratio diagrams played a crucial role in identifying the best-fit distribution for annual extreme rainfall in each region. The analysis showed that the GEV distribution provided the best fit for extreme rainfall in all four stations, as indicated by Figure 5. The finding was further supported by the parameter estimation using maximum likelihood estimation (MLE), L-moments, and Bayesian Markov Chain Monte Carlo (MCMC) methods. This finding corroborates previous research by [17, 59] highlighting the effectiveness of the GEV distribution in characterizing extreme events.

The GEV parameter estimates revealed important characteristics of the rainfall distributions in the study area. The shape parameter values indicated whether the distributions had upper or lower bounds or were unbounded. Iringa and Mbeya exhibited positive shape parameters, suggesting an upper bound, while Rukwa had a negative shape parameter, indicating a lower bound. Ruvuma had a shape parameter close to zero, implying an approximately unbounded distribution. The confidence intervals for contained the value of 0, indicating that a Gumbel distribution could also provide a reasonable fit for the data. This finding is supported by the absence of a trace plot representing the shape parameter , indicating that the estimated shape parameter is near zero. This suggests that the data could be well-suited for fitting the Gumbel distribution, which is a special case of the GEV distribution with a shape parameter of zero. This behavior stems from the fact that the shape parameter influences the tail characteristics of the distribution. When the estimated shape parameter is close to zero, it implies that the data’s tails are not significantly heavier or lighter compared to the Gumbel distribution, which has a standard exponential tail.

Additionally, the estimation of return levels based on the GEV distribution provided valuable information on the magnitude and frequency of extreme rainfall events in the Highlands Region. The results of this study provide valuable insights into extreme rainfall patterns and their impacts in the Southern Highlands Region of Tanzania. However, it is crucial to acknowledge that each region may possess unique characteristics that influence the occurrence and consequences of extreme rainfall. As such, caution should be exercised when generalizing these findings to other locations. Nonetheless, the methodology and analysis employed in this study can be universally applied as a framework for deducing the best-fit distribution that characterizes extreme rainfall behaviors. By utilizing a rigorous statistical approach and considering relevant climatic factors, this study has contributed to understanding the underlying patterns of extreme rainfall events in Southern Highlands Region of Tanzania. The findings can serve as a basis for similar studies in different regions, adapting the methodology to the specific local context.

Furthermore, the identification of potential links between extreme rainfall and climate change in Tanzania highlights the need for further research and analysis. To achieve a more comprehensive understanding of the universal aspects of extreme rainfall patterns and their implications for climate change adaptation worldwide, it is essential to undertake broader studies encompassing multiple regions and considering a longer time span. Such studies could explore the influence of global climate change phenomena on regional extreme rainfall events, facilitating the development of more effective adaptation and mitigation strategies.

Data Availability

The corresponding author will provide data used in this research upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

The authors acknowledge the Tanzania Meteorological Authority (TMA) for providing the data used in this study. This research was financially facilitated by Mkwawa University College of Education (MUCE) through Higher Education for Economic Transformation (HEET) Project.