Abstract

Rainfall intensity prediction is vital in designing hydraulic structures and flood and erosion control structures. In this work, meteorological data were obtained from the National Aeronautics and Space Administration’s (NASA) website. The aim was to develop a model for estimating maximum rainfall intensities and to investigate the effects of selected meteorological factors on the model. The meteorological factors considered were the annual averages of relative humidity, specific humidity, temperature range at 2 m, maximum temperature, and minimum temperature. The exponentiated standardized half logistic distribution (ESLD) was used to model the effects of the factors and return periods on 35 years’ (1984–2018) annual maximum monthly rainfall intensities for Port Harcourt metropolis, Nigeria. The model parameters were estimated using the maximum likelihood estimation method. The ESLD was compared with five standard distributions, and three criteria were used to determine the best-performing distribution. These indicated that the ESLD performed better than the other five distributions. Only the return period had a significant effect on the model for rainfall intensity prediction at the 0.05 level of significance, while the effects of the meteorological factors were insignificant.

1. Introduction

Meteorological conditions affect three aspects of precipitation: (1) its intensity, (2) its amount, and (3) its duration. Parameters such as minimum and maximum temperatures, specific humidity, and relative humidity may affect climatic conditions. Since the 1960s, public attention has focused on the rapid pace of industrial and agricultural expansion and on the effects of climate change, changes in precipitation, and human activities on water resources management [1]. Precipitation and temperature are therefore regarded as highly relevant factors in investigations of climate change and changes in the hydrologic cycle [2].

Extreme value studies are critical for engineering applications in determining extreme occurrences or loads [3]. Examples of such events include the annual maximum wind speed, the annual maximum daily streamflow, the annual maximum rainfall and runoff, and the annual maximum seismic motions [3]. Many fields, including engineering, hydrology, water resources, water quality and quantity modelling, flood monitoring, drought forecasting, and soil erosion estimation, benefit from the analysis of rainfall data [4]. An in-depth understanding of the spatial and temporal distribution pattern of rainfall is essential for practical water resource evaluation, planning, and management, as well as for agricultural planning and production [5, 6]. In addition, determining the probability and frequency of precipitation can assist policymakers in predicting floods, warning people about floods, and investigating the consequences of floods (Młyński et al., 2019). Rainfall is a random occurrence that carries a certain amount of uncertainty. Estimating rainfall is therefore challenging because of its temporal and spatial unpredictability; yet, forecasting or reanalysis of rainfall is achievable through the use of historical meteorological data [5, 7]. This justifies the use of probabilistic analyses. In this context, probabilistic analyses include probability plotting positions, which estimate the frequencies and likelihoods with which rainfall intensities occur [8], and probability distributions, which are used to forecast or reanalyze rainfall intensities [7, 8].

Frequency analysis is done by analytically fitting a selected probability distribution to data. It may also involve graphical plotting for distribution evaluation and outlier detection [9, 10]. Several distributions exist, and more are still being developed. However, choosing the most appropriate distribution for hydrological data has remained a subject of research in hydrology [9]. Plotting positions are estimates of the probabilities of the cumulative distribution function [11]. When probability distributions are fitted to ranked data, several methods exist for positioning the ranked data on the cumulative distribution axis, depending on the distribution being considered [3]. Most probability distributions can be adapted to random hydrologic variables; however, choosing the best distribution for hydrologic analysis is a herculean task [12].

Several researchers have used probability distributions to analyze long-term precipitation data for numerous areas of the world. Beskow et al. [13] compared the kappa and generalized extreme value (GEV) multiparameter distributions with the Gumbel and log-normal two-parameter distributions using the Filliben, Anderson-Darling, Kolmogorov-Smirnov, and chi-square tests for extreme rainfall in the Rio Grande do Sul state, Brazil. The researchers posited that the kappa and GEV distributions gave the best performance. This implied that multiparameter probability distributions perform better than two-parameter distributions for precipitation data in the Brazilian state. Moccia et al. [14] investigated the best fit probability distribution for two different climates in Italy by comparing light-tailed and heavy-tailed distributions. The Kolmogorov-Smirnov and ratio mean square error tests were used to assess the performances of the distributions. The study concluded that heavy-tailed distributions describe empirical data better than light-tailed distributions. Amin et al. [15] also analyzed annual maximum 24-hour rainfall for different rainfall gauging stations in the northern region of Pakistan using four probability distributions: the log Pearson type III, log-normal, normal, and Gumbel maximum distributions. Based on the goodness of fit, the normal and log Pearson type III distributions performed best for different rainfall gauging stations.

Despite these numerous studies, there is still a dearth of research on the relationships between annual maximum rainfall and other meteorological factors such as average annual relative humidity, average specific humidity, average maximum temperature, average minimum temperature, average temperature range, and average temperature (both at 2 m above the ground surface). In this research, the exponentiated standardized half logistic distribution is presented as an alternative distribution for modelling rainfall intensities. Furthermore, the contributions of some meteorological factors to maximum rainfall intensities were assessed. In conjunction with the frequency of annual maximum monthly rainfall of the Port Harcourt (PH) metropolis, these factors were subjected to six (6) probability distributions: the exponentiated standardized half logistic distribution was compared with five distributions previously tested in the literature. One of the appeals of the exponentiated standardized half logistic distribution is its flexibility in fitting datasets, which comes from the exponent (power) parameter. This is the shape parameter, and it allows the density of the distribution to take shapes close to the shape of the density of the dataset under study.

The best fit distribution was selected using the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the corrected Akaike information criterion (CAIC). This study was therefore aimed at developing a model for estimating maximum rainfall intensities and at validating the effects of some meteorological factors on 35 years’ (1984–2018) annual maximum rainfall for PH metropolis using the best fit probability distribution. Once the resultant effects of these factors are ascertained, annual maximum rainfall can be predicted more accurately from a combination of factors. The study will also aid flood prediction, estimation, and warning in South Southern Nigeria and Port Harcourt, and it will be useful for water resources management, planning, and evaluation, as well as for water planning for agricultural purposes.

1.1. The Exponentiated Standardized Half Logistic Distribution

The half logistic distribution, sometimes called the folded logistic distribution, is derived from the logistic distribution by truncating it at zero, so that only non-negative values remain. The half logistic distribution is gradually gaining attention in research through generalizations and applications to real-life situations. Examples of generalizations of the half logistic distribution include Awodutire et al. [16], Awodutire et al. [17], Olapade [18], Bello et al. [19], Jose and Manoharan [20], Usman et al. [21], and Cordeiro et al. [22], among others. Few works have applied these generalized forms of the half logistic distribution to real-life situations. Awodutire et al. [23] applied the work of Olapade [18] to breast cancer survival in Nigeria.

An example of a generalized half logistic distribution is the exponentiated standardized half logistic distribution (ESLD), a submodel of the distribution of Cordeiro et al. [22]. The distribution of Cordeiro et al. [22] reduces to the ESLD when its shape parameter is equal to 1. The ESLD has the probability density function (PDF) and cumulative distribution function (CDF) shown in equations (1) and (2), respectively, and it has a single shape parameter. From equations (1) and (2), the survival and hazard functions are given as equations (3) and (4), respectively.
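As a sketch of the four functions referred to above, assume that the ESLD is obtained by raising the standardized half logistic CDF, $(1-e^{-x})/(1+e^{-x})$, to a power $\alpha > 0$. The symbol $\alpha$ and this exact parameterization are assumptions made here for illustration and may differ from the notation of Cordeiro et al. [22]. Under that assumption, the PDF, CDF, survival function, and hazard function would read, in that order:

```latex
\begin{align*}
f(x;\alpha) &= \frac{2\alpha\, e^{-x}\,\bigl(1-e^{-x}\bigr)^{\alpha-1}}{\bigl(1+e^{-x}\bigr)^{\alpha+1}},
  \qquad x>0,\ \alpha>0, \\[4pt]
F(x;\alpha) &= \left(\frac{1-e^{-x}}{1+e^{-x}}\right)^{\alpha}, \\[4pt]
S(x;\alpha) &= 1-F(x;\alpha)
  = 1-\left(\frac{1-e^{-x}}{1+e^{-x}}\right)^{\alpha}, \\[4pt]
h(x;\alpha) &= \frac{f(x;\alpha)}{S(x;\alpha)}
  = \frac{2\alpha\, e^{-x}\,\bigl(1-e^{-x}\bigr)^{\alpha-1}}
         {\bigl(1+e^{-x}\bigr)\Bigl[\bigl(1+e^{-x}\bigr)^{\alpha}-\bigl(1-e^{-x}\bigr)^{\alpha}\Bigr]}.
\end{align*}
```

Setting $\alpha = 1$ in this sketch recovers the standardized half logistic distribution itself.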

Figures 1 and 2 show the plots of the PDF and hazard with different parameter values. Estimation of the parameter of the distribution with/without covariates is presented in Appendices A and B.
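The functions plotted in Figures 1 and 2, and the estimation without covariates, can be illustrated with a short Python sketch. It assumes that the ESLD CDF is the standardized half logistic CDF raised to a power `alpha` (the name `alpha` and all function names below are hypothetical, not taken from the paper). Under that assumption the maximum likelihood estimate of the shape parameter has a closed form, which the last function uses.

```python
import math


def esld_cdf(x, alpha):
    """CDF under the assumed form F(x) = [(1 - e^-x) / (1 + e^-x)]^alpha, x > 0."""
    if x <= 0:
        return 0.0
    # (1 - e^-x) / (1 + e^-x) is equal to tanh(x / 2).
    return math.tanh(x / 2.0) ** alpha


def esld_pdf(x, alpha):
    """PDF obtained by differentiating the assumed CDF."""
    if x <= 0:
        return 0.0
    e = math.exp(-x)
    return 2.0 * alpha * e * (1.0 - e) ** (alpha - 1.0) / (1.0 + e) ** (alpha + 1.0)


def esld_survival(x, alpha):
    """Survival function S(x) = 1 - F(x)."""
    return 1.0 - esld_cdf(x, alpha)


def esld_hazard(x, alpha):
    """Hazard function h(x) = f(x) / S(x); breaks down numerically once S(x) rounds to 0."""
    return esld_pdf(x, alpha) / esld_survival(x, alpha)


def esld_quantile(u, alpha):
    """Inverse of the assumed CDF for 0 < u < 1."""
    return 2.0 * math.atanh(u ** (1.0 / alpha))


def esld_mle_alpha(sample):
    """Closed-form MLE of alpha: -n / sum(log tanh(x_i / 2)), for positive x_i."""
    log_g = [math.log(math.tanh(x / 2.0)) for x in sample]
    return -len(sample) / sum(log_g)


if __name__ == "__main__":
    # Synthetic check: evenly spaced quantiles of an ESLD with alpha = 2.5.
    true_alpha = 2.5
    n = 1000
    sample = [esld_quantile((i - 0.5) / n, true_alpha) for i in range(1, n + 1)]
    print("estimated alpha:", esld_mle_alpha(sample))
```

Because the standardized form has no scale parameter, data would have to be expressed on a suitable positive scale before such a fit is meaningful; the scaling used in the study is not stated here.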

In many real-life applications, there is a need to predict or assess the contributions of some explanatory variables (covariates) to life events. A regression model is therefore required that fits lifetime data well and yields consistent and efficient estimates of the quantities of interest. This regression model relates the lifetime event to the set of explanatory variables (covariates) through a set of coefficients, with an error term that follows the standardized half logistic distribution.

The ESLD regression model (in the presence of covariates) was estimated using the maximum likelihood estimation method (shown in the Appendix).

1.2. Study Area

PH, the capital of Rivers State, is located in Nigeria’s South Southern geopolitical zone at latitude 4.81°N and longitude 7.04°E. The city has a population of about 2 million people. PH experiences an average rainfall of about 2418 mm annually. The months of June to October usually record the highest rainfall amounts, which range between 277 and 790 mm. PH has an average daily maximum temperature, average daily relative humidity, and specific humidity of 29°C, 87%, and 18 g/kg, respectively. The heavy annual rainfall makes the city susceptible to annual flooding, waterlogging of otherwise arable land, and severe damage to city infrastructure such as roads and drainage channels. PH is surrounded by neighbouring towns such as Onne, Ogoni, Buguma, Oduoha, and Emuoha. The map of PH is shown in Figure 3.

2. Materials and Methods

2.1. Frequency Analysis

Time series of rainfall records were obtained for PH metropolis. The data were daily rainfall depths in millimetres spanning 35 years (1984–2018). Daily meteorological data, namely relative humidity, specific humidity, maximum temperature, minimum temperature, temperature range, and temperature (both at 2 m above the ground surface), were also obtained. The daily rainfall was summed for each month, and the month with the highest total was chosen to represent each year. This process yielded 35 annual maxima (one month per year). The maximum monthly rainfall data were used as the variable for the estimation of extreme values. Maximum monthly rainfall commonly occurred between June and October. The frequencies or return periods were determined from six (6) plotting position methods. The Hazen, California, Weibull, Chegodayev, Blom, and Gringorten plotting positions were tested, and their respective details are shown in Table 1. The best plotting position was then used to analyze rainfall intensities from return periods. Annual averages of the other meteorological data considered were estimated from their monthly sums.
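The procedure described above (monthly totals, one maximum per year, then return periods from plotting positions) can be sketched in Python as follows. The plotting position formulas are the commonly quoted textbook forms and are an assumption here, since Table 1 is not reproduced in this text; the function names and the `(year, month, rainfall_mm)` record layout are likewise hypothetical.

```python
from collections import defaultdict

# Commonly quoted exceedance-probability formulas for rank i (1 = largest) of n values.
# These are assumed textbook forms, not values copied from Table 1.
PLOTTING_POSITIONS = {
    "Hazen": lambda i, n: (i - 0.5) / n,
    "California": lambda i, n: i / n,
    "Weibull": lambda i, n: i / (n + 1),
    "Chegodayev": lambda i, n: (i - 0.3) / (n + 0.4),
    "Blom": lambda i, n: (i - 0.375) / (n + 0.25),
    "Gringorten": lambda i, n: (i - 0.44) / (n + 0.12),
}


def annual_maximum_monthly_totals(daily_records):
    """Sum daily rainfall per month and keep the wettest month of each year.

    daily_records: iterable of (year, month, rainfall_mm) tuples.
    Returns {year: (month, monthly_total_mm)}.
    """
    monthly = defaultdict(float)
    for year, month, rain in daily_records:
        monthly[(year, month)] += rain
    annual = {}
    for (year, month), total in monthly.items():
        if year not in annual or total > annual[year][1]:
            annual[year] = (month, total)
    return annual


def return_periods(maxima, method="Weibull"):
    """Rank the maxima in descending order and pair each with T = 1 / P."""
    n = len(maxima)
    position = PLOTTING_POSITIONS[method]
    ranked = sorted(maxima, reverse=True)
    return [(value, 1.0 / position(rank, n)) for rank, value in enumerate(ranked, start=1)]


if __name__ == "__main__":
    # Tiny synthetic record, for illustration only.
    records = [(1984, 6, 10.0), (1984, 6, 5.0), (1984, 7, 12.0), (1985, 8, 3.0)]
    maxima = annual_maximum_monthly_totals(records)
    print(maxima)
    print(return_periods([total for _, total in maxima.values()], method="Hazen"))
```

With 35 annual maxima, the six methods differ mainly in the return period assigned to the largest few values, which is where the choice of plotting position matters most.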

2.2. Probability Distributions

The exponentiated standardized half logistic distribution (ESLD) was applied to maximum rainfall intensities. ESLD was compared with five other distributions, which include exponential (E), Weibull (W), Pearson type III (P), generalized extreme value (GEV), and Rayleigh (R) distributions. The PDFs of these distributions alongside their respective CDFs are presented in Table 2.

2.3. Regression Analysis

Regression analysis examines the relationship between a dependent variable and one or more independent variables. It is an indispensable statistical resource that is widely used in all scientific disciplines. It is employed especially in business and economics, where it is used to analyze the causal connection between two or more variables. A hypothetical model is created to test relationships, and its parameters are estimated to provide an estimated regression equation. In this study, a linear regression model with a logarithm term is fitted to the meteorological factors to test for a statistically significant relationship. The regression model expresses the maximum rainfall intensity as the sum of an intercept, the covariates of the meteorological factors multiplied by their coefficients, and a random error.

In this research, the ESLD regression model was considered. Estimation of the parameters of the regression model was done using the maximum likelihood estimation method. Details of this are in the Appendix.

Three criteria were considered for the comparison of the models: AIC, BIC, and CAIC. Their respective mathematical expressions are given as equations (7) through (9); they depend on the number of observations, the number of model parameters, and the maximized value of the likelihood function of the model. The model with the lowest AIC, BIC, and CAIC is the model with the best fit.
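A minimal Python sketch of the three criteria is given below. It uses the standard definitions of AIC and BIC and assumes that CAIC denotes the small-sample corrected AIC (often written AICc); that reading is an assumption, since equations (7) through (9) are not reproduced in this text. The log-likelihood of -90.73 and the parameter count of 8 in the usage example are hypothetical inputs, back-calculated so that the outputs land near the values reported for the exponential model in the results.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


def caic(log_likelihood, k, n):
    """Corrected AIC (assumed AICc form): AIC + 2k(k + 1) / (n - k - 1)."""
    return aic(log_likelihood, k) + 2 * k * (k + 1) / (n - k - 1)


if __name__ == "__main__":
    # Hypothetical inputs: 35 observations, 8 estimated parameters, ln L = -90.73.
    log_l, k, n = -90.73, 8, 35
    print(round(aic(log_l, k), 2), round(bic(log_l, k, n), 2), round(caic(log_l, k, n), 2))
```

All three criteria penalize additional parameters, with BIC and the corrected AIC penalizing them more heavily than AIC when the sample is as small as 35 observations.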

3. Results and Discussion

The rainy season in PH usually begins in March and ends in October, while the dry season runs from November to the following February. However, there are sometimes pockets of light rainfall of minimal depth during the dry season. Over the 35-year period considered (1984–2018), the rainy season contributed an average of 89.5% of the total rainfall, while the remaining 10.5% fell during the dry season. The dry season in PH often has low humidity with minimal rainfall. The dry periods often last for about 100 to 120 days (about 4 months).

The highest annual maximum monthly rainfall depth for PH metropolis (784.9 mm) was recorded in August 2006, and the lowest annual maximum (227 mm) in September 2014, as shown in Table 3. Furthermore, Table 3 shows that the first quartile, median, mean, and third quartile of the 35 annual maxima are 392.1 mm, 470 mm, 498 mm, and 626.2 mm, respectively. The differences in rainfall depth can be due to geographical location, elevation, environmental factors [25], and the uncertainty and randomness of rainfall events, which are naturally controlled processes. The mean annual rainfall for the years 1984 and 2018 is 182 mm and 362 mm, respectively. The maximum and minimum mean annual rainfall depths are 292 mm and 117 mm, for 2007 and 2014, respectively. Likewise, the maximum and minimum annual standard deviations are 259 mm and 81.88 mm, for 2007 and 2014, respectively. The coefficient of variation (CV) ranged from 59% in 2003 to 122% in 2005, indicating high rainfall variability [5]. The skewness coefficient (CS), which measures the skewness of the data distribution, ranges from -0.31 (1990) to 2.51 (2005). Three years have negative CS, while 32 years have positive CS, which indicates a right-skewed data distribution. The histogram of the 35 annual maxima used for this study is shown in Figure 4. Over the study period, the mean rainfall is 201.48 mm, the standard deviation is 162.33 mm, the CV is 81%, and the CS is 0.97.
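The summary statistics quoted above can be computed as in the following Python sketch. It assumes the sample standard deviation (divisor n - 1) for the CV and the simple moment coefficient of skewness for the CS; the paper does not state which estimators were used, so both choices and the function name are assumptions.

```python
import math


def describe(values):
    """Return the mean, sample standard deviation, CV (%), and moment skewness."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    sd = math.sqrt(sum(d ** 2 for d in deviations) / (n - 1))
    m2 = sum(d ** 2 for d in deviations) / n  # second central moment
    m3 = sum(d ** 3 for d in deviations) / n  # third central moment
    return {
        "mean": mean,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,
        "skewness": m3 / m2 ** 1.5,
    }


if __name__ == "__main__":
    # Synthetic monthly depths in mm, for illustration only.
    print(describe([12.0, 40.0, 95.0, 180.0, 260.0, 410.0]))
```

As a consistency check on the reported figures, a standard deviation of 162.33 mm divided by a mean of 201.48 mm gives a CV of about 81%, in agreement with the value stated in the text.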

Figures 5–10 show the plots of the 35 annual maximum monthly rainfall values against the return periods estimated from the six (6) probability plotting position methods considered in this study. Linear relationships between rainfall intensity and return period gave determination coefficients ranging from 0.82 to 0.85 for all plotting positions. The quadratic relationships, however, gave higher determination coefficients, ranging from 0.975 to 0.984, and thus described the relation better than the linear models. All the plotting positions therefore fitted the rainfall records closely, with the Hazen and California plotting positions fitting the data best, each with a determination coefficient of 0.984. Thus, peak rainfall can be predicted, forecast, or reanalyzed for a known return period from equation (10). This finding establishes the relationship between rainfall intensities and return periods for the study area.
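The comparison of linear and quadratic relationships by their determination coefficients can be sketched as below. The sketch assumes ordinary least-squares polynomial fits of rainfall intensity on return period using NumPy, which may differ from the fitting procedure actually used in the study; the data in the example are synthetic and the function name is hypothetical.

```python
import numpy as np


def fit_polynomial(return_period, intensity, degree):
    """Least-squares polynomial fit; returns (coefficients, R^2).

    Coefficients are ordered from the highest power to the constant term.
    """
    x = np.asarray(return_period, dtype=float)
    y = np.asarray(intensity, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return coeffs, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic, concave intensity-return period pairs for illustration only.
    t = np.array([1.0, 1.5, 2.0, 3.0, 5.0, 10.0, 20.0, 35.0])
    i = 300.0 + 25.0 * t - 0.4 * t ** 2
    _, r2_linear = fit_polynomial(t, i, 1)
    _, r2_quadratic = fit_polynomial(t, i, 2)
    print("linear R^2:", r2_linear, "quadratic R^2:", r2_quadratic)
```

A quadratic fit can never have a lower determination coefficient than a linear fit on the same data, so the comparison is informative mainly through the size of the improvement.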

To ascertain the effects of other meteorological factors on peak annual rainfall, the factors, which include relative humidity, specific humidity, maximum temperature, minimum temperature, temperature range, and temperature (both at 2 m above the ground surface), were analyzed using six (6) different probability distributions. The maximum likelihood estimates of the parameters of the ESLD and the five other probability distributions are shown in Table 4. These estimates are used to verify (or otherwise) the effects of the factors on rainfall intensities. Figure 11 shows the fitted plots of the different distributions used to model rainfall intensities and the other meteorological factors. The curve for the ESLD covers the area of the plots best, which indicates that the ESLD fits the data better than the commonly used distributions.

Furthermore, the AIC, BIC, and CAIC were used to compare the models. The results in Table 4 show that E, W, P, R, and GEV performed worse than the ESLD. E had the worst performance, with AIC (197.46), BIC (209.90), and CAIC (203). The ESLD is ranked best among the distributions because it has the lowest AIC, BIC, and CAIC. The AIC, BIC, and CAIC for the ESLD are 137.29, 150.73, and 143.83, respectively. GEV closely followed the ESLD with AIC (137.71), BIC (153.27), and CAIC (146.88).

Having established the ESLD as a suitable alternative distribution for fitting rainfall intensities and other meteorological datasets, the ESLD regression model was used to assess the contributions of the meteorological factors to rainfall intensities. The ESLD regression model was chosen because the ESLD had the lowest comparison criteria values for maximum rainfall intensities. The results of the analysis are shown in Table 5. None of the meteorological variables had a significant effect on rainfall intensity at the 0.05 level of significance.

Several studies attest to the performance of GEV compared with other distributions for most watersheds. Coronado-Hernández et al. [26] reported GEV as the best fit probability distribution for frequency analysis of maximum daily rainfall for a series of return periods for selected precipitation recording stations in Colombia and also reported GEV to be the best fit for most of the rain-gauge stations considered in Bangladesh. Likewise, Kumar et al. [27] gave credence to GEV as the best-performing distribution for most of the rainfall periods considered for Haryana, India. Furthermore, Młyński et al. [12] recommended the GEV distribution for predicting maximum daily rainfall with a specific probability of exceedance for catchments in the upper Vistula basin of Poland. Beskow et al. [13] opined that the kappa and GEV multiparameter distributions outperformed two-parameter distributions in modelling extreme rainfall events for basins in southern Brazil. This research, however, presents the ESLD as a better distribution for modelling extreme rainfall events in PH. None of the meteorological factors considered has a significant effect on the prediction of rainfall from probability distribution models.

4. Conclusion

The ESLD was used, alongside five other probability distributions, to model the effects of frequency and other meteorological factors on 35 years’ monthly rainfall maxima for PH metropolis, Rivers State, Nigeria. The five other distributions were E, W, P, R, and GEV. The study also compared six probability plotting position methods and found that the Chegodayev plotting position method performed best (0.967). The maximum likelihood method was used to estimate the distributions’ parameters, and comparisons were made using three criteria, with the best fit distribution selected on the basis of the lowest criteria values. The exponential distribution had the poorest performance based on the criteria values. The ESLD performed better than all the compared distributions because it had the lowest criteria values. The ESLD was closely followed by GEV, a widely used distribution known in the literature for its good performance across varying basins. The ESLD and GEV were followed by the Weibull (W) distribution for the prediction of maximum rainfall intensity in PH. The results presented in this study can be useful for rainfall modelling in PH. The findings will aid flood management and control measures, as well as soil and agricultural drainage designs, to curb the menace of waterlogging on arable lands and residential areas in Port Harcourt metropolis, Rivers State, Nigeria.

Appendix

A. Maximum Likelihood Estimation without Covariates

Let the observed values be a random sample from the exponentiated standardized half logistic distribution with its shape parameter; then, the log-likelihood function of the exponentiated standardized half logistic distribution is obtained as

Differentiating the log-likelihood with respect to the shape parameter gives

Setting the expression in equation (A.2) to zero to obtain the estimate of the shape parameter gives
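The three steps of this appendix can be sketched as follows, assuming a sample $x_1, \dots, x_n$ and the ESLD density $f(x;\alpha) = 2\alpha e^{-x}(1-e^{-x})^{\alpha-1}/(1+e^{-x})^{\alpha+1}$ for $x > 0$. The symbols $x_i$, $n$, and $\alpha$ and this parameterization are assumptions made for illustration, so the expressions may differ in form from the original equations (A.1) to (A.3).

```latex
\begin{align*}
\ell(\alpha) &= n\ln(2\alpha) - \sum_{i=1}^{n} x_i
  + (\alpha-1)\sum_{i=1}^{n}\ln\bigl(1-e^{-x_i}\bigr)
  - (\alpha+1)\sum_{i=1}^{n}\ln\bigl(1+e^{-x_i}\bigr), \\[4pt]
\frac{\partial \ell}{\partial \alpha} &= \frac{n}{\alpha}
  + \sum_{i=1}^{n}\ln\bigl(1-e^{-x_i}\bigr)
  - \sum_{i=1}^{n}\ln\bigl(1+e^{-x_i}\bigr), \\[4pt]
\hat{\alpha} &= -\,\frac{n}{\displaystyle\sum_{i=1}^{n}
  \ln\!\left(\frac{1-e^{-x_i}}{1+e^{-x_i}}\right)}.
\end{align*}
```

Each logarithm in the denominator of $\hat{\alpha}$ is negative for $x_i > 0$, so the estimate is positive, and the second derivative $-n/\alpha^{2}$ is negative, which confirms that the stationary point is a maximum under this assumed form.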

B. Maximum Likelihood Estimation with Covariates

Let the observed values, together with their covariates and the corresponding coefficients, be a random sample from the exponentiated standardized half logistic distribution with its shape parameter; then, the log-likelihood function of the exponentiated standardized half logistic distribution is obtained as

where = . The expression in equation (B.1) has no closed-form solution for the coefficients. Therefore, computer programs are used to estimate their values.

Data Availability

The data is available using the following link: https://drive.google.com/file/d/1RKeYC1h0TuJRa-FpceOgVVkzS2Bh1zOn/view?usp=sharing.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This manuscript is funded by the Digiteknologian TKI-ymparisto project A74338 (ERDF, Regional Council of Pohjois-Savo).