Abstract

In this study, a new one-parameter Log-XLindley distribution is proposed to analyze proportion data. Some of its statistical and reliability properties, including moments with associated measures, the hazard rate function, the reversed hazard rate, stress-strength reliability, and the mean residual life function, are obtained in closed form, which allows researchers to model data with little CPU time. It is found that the density function of the introduced distribution can serve as a statistical tool for modeling asymmetric data. Moreover, the failure rate function can be utilized to model different types of failures, including increasing, bathtub, and J-shaped. The model parameter is estimated using various estimation approaches in order to identify the estimator that models real data most accurately. A Monte-Carlo simulation study for different sample sizes is performed to assess the performance of the estimators based on several statistical criteria. Finally, two distinctive data sets from the SC16 and P3 algorithms for estimating unit capacity factors are analyzed to illustrate the flexibility of the new model.

1. Introduction

Over the last decade, researchers have shown strong interest in developing new extended distributions by adding shape parameters to baseline distributions. The primary goal of such work is to improve the modeling abilities of distributions and provide new opportunities to capture various features of data sets. Distributions with unbounded support have received most of this attention. However, in many real-life circumstances, such as percentages and proportions, observations can only take values within a limited range [1].

Among the unit distributions, the beta distribution is the most well-known distribution. It is frequently utilized in different fields of research, such as economics, biology, and medical sciences. The major flaw of the beta distribution is that its cumulative distribution function (CDF) cannot be written in a closed explicit form. Thus, the researchers proposed and studied various unit distributions. Among the most useful unit distributions, there are the Johnson distribution [2], Topp–Leone distribution [3], unit-gamma distribution [4], Kumaraswamy distribution [5], log-Lindley distribution [6], unit-logistic distribution [7], unit-Birnbaum–Saunders distribution [8], log-xgamma distribution [9], unit-Lindley distribution [10], unit-Gompertz distribution [11], unit-inverse Gaussian distribution [12], unit-Burr III distribution [13], log-weighted exponential distribution [14], unit-Weibull distribution [15], unit-Modified Burr III distribution [16], unit-Rayleigh distribution [17], Frechet power function distribution [18], and unit-Burr XIII distribution [19].

The purpose of this work is to introduce a new distribution for modeling data sets on the unit interval. To achieve this aim, the XLindley distribution is used to construct a new model. The proposed distribution is entitled the Log-XLindley distribution and has one positive shape parameter. The Log-XLindley distribution offers many benefits over well-known unit interval distributions, including the beta and Kumaraswamy distributions; it is preferable because of its simple structure and the flexibility of its hazard rate. The statistical properties, including moments, skewness, kurtosis, stress-strength reliability, and mean residual life, may be derived in explicit forms.

The paper is organized as follows: In Section 2, we propose the Log-XLindley distribution. The statistical characteristics are derived in Section 3. In Section 4, we derive the reliability properties of the proposed distribution. Several estimation techniques are used to estimate the model parameter in Section 5. In Section 6, a Monte-Carlo simulation analysis is carried out to assess the finite-sample performance of the parameter estimation techniques. Two real data sets, “SC16 and P3 algorithms: estimating unit capacity factors,” supported on the interval (0, 1), are analyzed to show the Log-XLindley distribution’s flexibility in Section 7. Finally, some concluding remarks on the proposed model are reported in Section 8.

2. The Log-XLindley Distribution

A random variable X is said to have the XLindley distribution with shape parameter \theta > 0 if its density function can be expressed as

f(x; \theta) = \frac{\theta^{2}(2+\theta+x)}{(1+\theta)^{2}}\, e^{-\theta x}, \qquad x > 0. (1)

The Log-XLindley distribution is derived from the XLindley distribution via the logarithmic transformation Y = e^{-X}, where X follows the XLindley distribution. The probability density function (PDF) with support (0, 1) is given by

f(y; \theta) = \frac{\theta^{2}(2+\theta-\log y)}{(1+\theta)^{2}}\, y^{\theta-1}, \qquad 0 < y < 1, (2)

where \theta > 0 is the shape parameter. The corresponding CDF to (2) can be formulated as

F(y; \theta) = \left(1 - \frac{\theta \log y}{(1+\theta)^{2}}\right) y^{\theta}, \qquad 0 < y < 1. (3)

For the Log-XLindley distribution, the limiting behavior of the PDF at the lower and upper limits of the support is

\lim_{y \to 0^{+}} f(y; \theta) = \begin{cases} \infty, & \theta \le 1, \\ 0, & \theta > 1, \end{cases} \qquad \lim_{y \to 1^{-}} f(y; \theta) = \frac{\theta^{2}(2+\theta)}{(1+\theta)^{2}}.

Figure 1 illustrates some plots of the Log-XLindley density for selected values of the shape parameter \theta.

It is noted that the density function can be used as a probabilistic tool to discuss and analyze asymmetric data. Moreover, the shape of the density function can be either decreasing or unimodal, so the proposed model can be used effectively in modeling different types of data sets in various fields.
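The following is a minimal R sketch of the density, CDF, and a random-number generator for the Log-XLindley distribution, written under the assumption that Y = e^{-X} with X following the XLindley distribution; the function names and the symbol theta are illustrative rather than the paper's notation.

dlogxl <- function(y, theta) {
  ifelse(y > 0 & y < 1,
         theta^2 * (2 + theta - log(y)) * y^(theta - 1) / (1 + theta)^2,
         0)
}
plogxl <- function(y, theta) {
  ifelse(y <= 0, 0,
         ifelse(y >= 1, 1,
                (1 - theta * log(y) / (1 + theta)^2) * y^theta))
}
rlogxl <- function(n, theta) {
  # the XLindley law is a two-component mixture of Exp(theta) and Gamma(2, theta)
  p1 <- theta * (2 + theta) / (1 + theta)^2
  x <- ifelse(runif(n) < p1,
              rexp(n, rate = theta),
              rgamma(n, shape = 2, rate = theta))
  exp(-x)  # logarithmic transformation to the unit interval
}
integrate(dlogxl, lower = 0, upper = 1, theta = 1.5)  # sanity check: should be 1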

3. Statistical Properties

3.1. Moments and Incomplete Moments with Associated Measures

If the random variable Y has the Log-XLindley distribution with parameter \theta, then the rth ordinary moment (OM) can be expressed in an explicit form as follows:

\mu_{r}' = E(Y^{r}) = \int_{0}^{1} y^{r} f(y; \theta)\, dy.

After simple computations, the OM can be expressed as

\mu_{r}' = \frac{\theta^{2}\left[(2+\theta)(\theta+r)+1\right]}{(1+\theta)^{2}(\theta+r)^{2}}.

The first four moments around the origin can be obtained by substituting r = 1, 2, 3, and 4, respectively, in the above expression.

Thus, the variance and index of dispersion (IoD) of the proposed model can be formulated as \mathrm{Var}(Y) = \mu_{2}' - (\mu_{1}')^{2} and \mathrm{IoD} = \mathrm{Var}(Y)/\mu_{1}', respectively. Based on the IoD measure, the introduced model can be used to model data with different degrees of dispersion. Moreover, the coefficients of skewness and kurtosis can be obtained from well-known relations via the OM property. If the random variable Y has the Log-XLindley distribution with parameter \theta, then the rth incomplete moment (ICM) can be expressed as

m_{r}(t) = \int_{0}^{t} y^{r} f(y; \theta)\, dy.

Using equation (2), we get

m_{r}(t) = \frac{\theta^{2}}{(1+\theta)^{2}} \int_{0}^{t} y^{\theta+r-1}\, (2+\theta-\log y)\, dy.

After simple algebra, the ICM can be formulated as

m_{r}(t) = \frac{\theta^{2}\, t^{\theta+r}\left[(\theta+r)(2+\theta-\log t)+1\right]}{(1+\theta)^{2}(\theta+r)^{2}}.

The ICM can be used to measure inequality, including income quintiles, the Lorenz curve, Pietra, and Gini measures of inequality, among others.
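As an illustration, the moment-based measures above can be checked numerically in R; the sketch below reuses the dlogxl density defined in Section 2, and the value theta = 1.5 is an arbitrary illustrative choice.

raw_moment <- function(r, theta) {
  integrate(function(y) y^r * dlogxl(y, theta), lower = 0, upper = 1)$value
}
theta <- 1.5                      # arbitrary illustrative value
m1 <- raw_moment(1, theta); m2 <- raw_moment(2, theta)
m3 <- raw_moment(3, theta); m4 <- raw_moment(4, theta)
v   <- m2 - m1^2                  # variance
iod <- v / m1                     # index of dispersion
skw <- (m3 - 3 * m1 * m2 + 2 * m1^3) / v^1.5                 # coefficient of skewness
krt <- (m4 - 4 * m1 * m3 + 6 * m1^2 * m2 - 3 * m1^4) / v^2   # kurtosis
inc_moment <- function(r, t, theta) {                        # r-th incomplete moment up to t
  integrate(function(y) y^r * dlogxl(y, theta), lower = 0, upper = t)$value
}
c(mean = m1, variance = v, IoD = iod, skewness = skw, kurtosis = krt,
  inc1_at_half = inc_moment(1, 0.5, theta))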

4. Reliability Measures

4.1. Hazard (Reversed) Rate, Cumulative Hazard Function, and Mills Ratio

If the random variable Y has the Log-XLindley distribution, then the survival function (SF) and its hazard rate can be written, respectively, as follows:

S(y; \theta) = 1 - \left(1 - \frac{\theta \log y}{(1+\theta)^{2}}\right) y^{\theta},

h(y; \theta) = \frac{f(y; \theta)}{S(y; \theta)} = \frac{\theta^{2}(2+\theta-\log y)\, y^{\theta-1}}{(1+\theta)^{2} - \left[(1+\theta)^{2} - \theta \log y\right] y^{\theta}}.

Mathematically, the hazard rate function (HRF) of the proposed model can be bathtub-shaped, J-shaped, or increasing, depending on the value of the shape parameter. Figure 2 shows some plots of the HRF for selected values of the parameter \theta.

Regarding Figure 2, it is noted that the HRF can take different shapes, including bathtub, J-shaped, and increasing. Thus, the HRF of the proposed model can be used to discuss various types of data in different fields. The reversed hazard rate is

r(y; \theta) = \frac{f(y; \theta)}{F(y; \theta)} = \frac{\theta^{2}(2+\theta-\log y)}{y\left[(1+\theta)^{2} - \theta \log y\right]}.

The cumulative hazard function and the Mills ratio can be expressed, respectively, as

H(y; \theta) = -\log S(y; \theta), \qquad m(y; \theta) = \frac{S(y; \theta)}{f(y; \theta)}.
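For completeness, these reliability measures can be written directly from their definitions in R; the sketch reuses the dlogxl and plogxl functions from Section 2, and the parameter values in the plot are arbitrary illustrative choices.

hrf   <- function(y, theta) dlogxl(y, theta) / (1 - plogxl(y, theta))   # hazard rate
rhrf  <- function(y, theta) dlogxl(y, theta) / plogxl(y, theta)         # reversed hazard rate
chf   <- function(y, theta) -log(1 - plogxl(y, theta))                  # cumulative hazard
mills <- function(y, theta) (1 - plogxl(y, theta)) / dlogxl(y, theta)   # Mills ratio
curve(hrf(x, 0.5), from = 0.01, to = 0.99, xlab = "y", ylab = "h(y)")   # shape exploration
curve(hrf(x, 1.5), add = TRUE, lty = 2)
curve(hrf(x, 3.0), add = TRUE, lty = 3)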

4.2. Stress-Strength Reliability (SSR)

Theorem 1. If the random variables X and Y have the Log-XLindley distribution with parameters \theta_{1} and \theta_{2}, respectively, then the SSR, R = P(Y < X), can be formulated in an explicit form, as given in (19).

Proof. The SSR can be derived from the relation R = P(Y < X) = \int_{0}^{1} F_{Y}(x; \theta_{2})\, f_{X}(x; \theta_{1})\, dx. Using (2) and (3), the integrand is written in terms of the Log-XLindley PDF and CDF, and after integration and simplification, the SSR reduces to the expression in (19).
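A quick numerical check of Theorem 1 is possible by integrating the product of the assumed CDF and PDF in R; the parameter values below are illustrative only, and the functions are those sketched in Section 2.

ssr <- function(theta_x, theta_y) {
  # R = P(Y < X) = integral of F_Y(t) * f_X(t) over (0, 1)
  integrate(function(t) plogxl(t, theta_y) * dlogxl(t, theta_x),
            lower = 0, upper = 1)$value
}
ssr(2.0, 0.8)   # illustrative parameter values
ssr(1.0, 1.0)   # equal parameters should return 0.5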

4.3. Mean Residual Life (MRL)

If the random variable Y has the Log-XLindley distribution with parameter \theta, then the MRL can be expressed in an explicit form as follows:

\mathrm{MRL}(t) = E(Y - t \mid Y > t) = \frac{1}{S(t; \theta)} \int_{t}^{1} S(y; \theta)\, dy.

Using (14) and after simple modification, the MRL can be obtained in closed form.

Mathematically, the MRL of the new model can be unimodal, inverse J-shaped, or decreasing, depending on the value of the shape parameter.
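Similarly, the MRL can be evaluated from its definition by numerical integration; this sketch again assumes the plogxl CDF from Section 2, with illustrative arguments.

mrl <- function(t, theta) {
  s <- function(y) 1 - plogxl(y, theta)   # survival function
  integrate(s, lower = t, upper = 1)$value / s(t)
}
sapply(c(0.1, 0.3, 0.5, 0.7, 0.9), mrl, theta = 1.5)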

5. Estimation Techniques

5.1. Maximum Likelihood Estimator

Let y_{1}, y_{2}, \ldots, y_{n} be a random sample from the Log-XLindley distribution; then, the log-likelihood function is given by

\ell(\theta) = 2n \log \theta - 2n \log(1+\theta) + \sum_{i=1}^{n} \log(2+\theta-\log y_{i}) + (\theta-1) \sum_{i=1}^{n} \log y_{i}. (25)

Taking the derivative of (25) with respect to the parameter \theta, the following equation is obtained:

\frac{\partial \ell(\theta)}{\partial \theta} = \frac{2n}{\theta} - \frac{2n}{1+\theta} + \sum_{i=1}^{n} \frac{1}{2+\theta-\log y_{i}} + \sum_{i=1}^{n} \log y_{i}.

The maximum likelihood estimator of \theta is obtained by solving the nonlinear equation \partial \ell(\theta)/\partial \theta = 0. This equation cannot be solved analytically, so a numerical approach such as the Newton–Raphson method is used.
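Since the likelihood involves a single parameter, a one-dimensional optimizer is sufficient in practice. The R sketch below simulates a sample with the rlogxl generator from Section 2 (the sample size, seed, and true value 1.5 are illustrative) and maximizes the log-likelihood with optimize.

negloglik <- function(theta, y) -sum(log(dlogxl(y, theta)))
set.seed(1)
y <- rlogxl(100, theta = 1.5)                        # simulated sample
fit <- optimize(negloglik, interval = c(0.01, 50), y = y)
fit$minimum                                          # maximum likelihood estimate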

5.2. Least Squares (LS) and Weighted Least Squares (WLS) Estimators

Let y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)} be an ordered sample of size n from the Log-XLindley distribution. Then, the LS estimator (LSE) of the Log-XLindley parameter can be derived by minimizing

\mathrm{LS}(\theta) = \sum_{i=1}^{n} \left[ F(y_{(i)}; \theta) - \frac{i}{n+1} \right]^{2}

with respect to the parameter \theta. The WLS estimator (WLSE) of \theta can be determined by minimizing

\mathrm{WLS}(\theta) = \sum_{i=1}^{n} \frac{(n+1)^{2}(n+2)}{i(n-i+1)} \left[ F(y_{(i)}; \theta) - \frac{i}{n+1} \right]^{2}

with respect to \theta.
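The two objective functions can be coded directly from their standard forms; the sketch below reuses the plogxl CDF from Section 2 and the simulated sample y from the maximum likelihood sketch above.

ls_obj <- function(theta, y) {
  ys <- sort(y); n <- length(ys); i <- seq_len(n)
  sum((plogxl(ys, theta) - i / (n + 1))^2)
}
wls_obj <- function(theta, y) {
  ys <- sort(y); n <- length(ys); i <- seq_len(n)
  w <- (n + 1)^2 * (n + 2) / (i * (n - i + 1))
  sum(w * (plogxl(ys, theta) - i / (n + 1))^2)
}
lse  <- optimize(ls_obj,  interval = c(0.01, 50), y = y)$minimum
wlse <- optimize(wls_obj, interval = c(0.01, 50), y = y)$minimum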

5.3. Anderson–Darling (AD) and Right Tail Anderson–Darling (RAD) Estimators

The AD estimator (ADE) is a minimum distance-based estimator. It can be obtained by minimizing

\mathrm{AD}(\theta) = -n - \frac{1}{n} \sum_{i=1}^{n} (2i-1)\left[ \log F(y_{(i)}; \theta) + \log S(y_{(n+1-i)}; \theta) \right]

with respect to the parameter \theta, whereas the RAD estimator (RADE) of the model parameter can be derived by minimizing

\mathrm{RAD}(\theta) = \frac{n}{2} - 2 \sum_{i=1}^{n} F(y_{(i)}; \theta) - \frac{1}{n} \sum_{i=1}^{n} (2i-1) \log S(y_{(n+1-i)}; \theta)

with respect to the parameter \theta.
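The AD and RAD objective functions in their usual forms can be implemented in the same way, again based on the assumed plogxl CDF and the simulated sample y.

ad_obj <- function(theta, y) {
  ys <- sort(y); n <- length(ys); i <- seq_len(n)
  Fi <- plogxl(ys, theta)
  -n - mean((2 * i - 1) * (log(Fi) + log(1 - rev(Fi))))
}
rad_obj <- function(theta, y) {
  ys <- sort(y); n <- length(ys); i <- seq_len(n)
  Fi <- plogxl(ys, theta)
  n / 2 - 2 * sum(Fi) - mean((2 * i - 1) * log(1 - rev(Fi)))
}
ade  <- optimize(ad_obj,  interval = c(0.01, 50), y = y)$minimum
rade <- optimize(rad_obj, interval = c(0.01, 50), y = y)$minimum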

5.4. Cramer–Von Mises Estimator (CVME)

The CVME is a minimum distance-based estimator. The CVME of the Log-XLindley parameter can be obtained by minimizing

\mathrm{CVM}(\theta) = \frac{1}{12n} + \sum_{i=1}^{n} \left[ F(y_{(i)}; \theta) - \frac{2i-1}{2n} \right]^{2}

with respect to the parameter \theta.
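The Cramer–von Mises objective function is minimized in the same manner as the other minimum-distance criteria, using the assumed plogxl CDF and the sample y from the earlier sketch.

cvm_obj <- function(theta, y) {
  ys <- sort(y); n <- length(ys); i <- seq_len(n)
  1 / (12 * n) + sum((plogxl(ys, theta) - (2 * i - 1) / (2 * n))^2)
}
cvme <- optimize(cvm_obj, interval = c(0.01, 50), y = y)$minimum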

5.5. Maximum Product of Spacing Estimator (MPSE)

For i = 1, 2, \ldots, n+1, let D_{i}(\theta) = F(y_{(i)}; \theta) - F(y_{(i-1)}; \theta) denote the uniform spacings of a random sample from the Log-XLindley distribution, where F(y_{(0)}; \theta) = 0, F(y_{(n+1)}; \theta) = 1, and \sum_{i=1}^{n+1} D_{i}(\theta) = 1. The MPSE of the parameter \theta can be derived by maximizing the geometric mean of the spacings,

G(\theta) = \left[ \prod_{i=1}^{n+1} D_{i}(\theta) \right]^{1/(n+1)},

with respect to the parameter \theta.
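In practice, the MPSE is computed by maximizing the mean logarithm of the spacings, which is equivalent to maximizing their geometric mean. The guard against zero spacings in the sketch below is a practical safeguard, not part of the formal definition; the plogxl CDF and the sample y come from the earlier sketches.

mps_obj <- function(theta, y) {
  Fi <- c(0, plogxl(sort(y), theta), 1)
  d  <- pmax(diff(Fi), .Machine$double.eps)   # guard against zero spacings from ties
  -mean(log(d))                               # negative mean log-spacing, to be minimized
}
mpse <- optimize(mps_obj, interval = c(0.01, 50), y = y)$minimum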

6. Monte-Carlo Simulation

To assess the performance of the estimators listed in the previous section, we conducted a comprehensive simulation study. We used the Log-XLindley distribution to generate samples of different sizes under several values of the parameter \theta and then calculated the average values (AVEs) of the estimators together with their mean square errors (MSEs), average absolute biases (ABBs), and mean relative errors (MREs). The ABBs, MREs, and MSEs are given by

\mathrm{ABB} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{\theta}_{i} - \theta \right|, \qquad \mathrm{MRE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| \hat{\theta}_{i} - \theta \right|}{\theta}, \qquad \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{\theta}_{i} - \theta \right)^{2},

where N is the number of simulation replications and \hat{\theta}_{i} is the estimate from the ith replication.

We ran the simulation 5000 times to derive these metrics for all estimation approaches. The findings in Tables 1–5 were obtained using the optim function in R with the CG method. The results show that, as the sample size increased, the AVEs became closer to the true value of \theta. Furthermore, as the sample size increases, the ABBs, MREs, and MSEs of all estimators decrease. This indicates that the estimation techniques above work quite well in estimating the model parameter.
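A reduced version of this experiment can be reproduced with the R sketch below; it uses maximum likelihood only, 500 replications, and a true value of 1.5, all of which are illustrative settings rather than those behind Tables 1–5, and it relies on the rlogxl and dlogxl functions sketched in Section 2.

simulate_mle <- function(n, theta, nrep = 500) {
  est <- replicate(nrep, {
    y <- rlogxl(n, theta)
    optimize(function(t) -sum(log(dlogxl(y, t))), interval = c(0.01, 50))$minimum
  })
  c(n = n,
    AVE = mean(est),
    ABB = mean(abs(est - theta)),
    MSE = mean((est - theta)^2),
    MRE = mean(abs(est - theta) / theta))
}
t(sapply(c(25, 50, 100, 200), simulate_mle, theta = 1.5))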

7. Data Analysis: Data of SC16 and P3 Algorithms

In this section, we consider two data sets to show the applicability and flexibility of the introduced distribution relative to well-known distributions. Here, we compare the Log-XLindley model with some competitive models, namely the Kumaraswamy (Kw) and beta (B) distributions, whose PDFs can be formulated, respectively, as follows:

f_{\mathrm{Kw}}(x; a, b) = a b\, x^{a-1} (1 - x^{a})^{b-1}, \qquad 0 < x < 1,

f_{\mathrm{B}}(x; a, b) = \frac{x^{a-1} (1-x)^{b-1}}{B(a, b)}, \qquad 0 < x < 1,

where a, b > 0 are shape parameters and B(\cdot, \cdot) is the beta function.

Some criteria, such as the Kolmogorov–Smirnov (KS) test with its p value, have been used to identify the best model among all the tested distributions. The following data are from [20, 21], which compare two different algorithms, called SC16 and P3, for estimating unit capacity factors. The observations resulting from the algorithm SC16 are 0.853, 0.759, 0.866, 0.809, 0.717, 0.544, 0.492, 0.403, 0.344, 0.213, 0.116, 0.116, 0.092, 0.070, 0.059, 0.048, 0.036, 0.029, 0.021, 0.014, 0.011, 0.008, and 0.006. However, the observations resulting from the second algorithm, named P3, are 0.853, 0.759, 0.874, 0.800, 0.716, 0.557, 0.503, 0.399, 0.334, 0.207, 0.118, 0.118, 0.097, 0.078, 0.067, 0.056, 0.044, 0.036, 0.026, 0.019, 0.014, and 0.010. Some descriptive measures of these data sets are presented in Table 6.
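A minimal sketch of the fitting procedure for the SC16 data is shown below; it estimates the parameter by maximum likelihood with the assumed dlogxl density from Section 2 and then applies the KS test through the assumed plogxl CDF. The data vector is copied from the text above.

sc16 <- c(0.853, 0.759, 0.866, 0.809, 0.717, 0.544, 0.492, 0.403, 0.344, 0.213,
          0.116, 0.116, 0.092, 0.070, 0.059, 0.048, 0.036, 0.029, 0.021, 0.014,
          0.011, 0.008, 0.006)
fit <- optimize(function(t) -sum(log(dlogxl(sc16, t))), interval = c(0.01, 50))
theta_hat <- fit$minimum
ks.test(sc16, function(q) plogxl(q, theta_hat))   # KS statistic and p value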

Regarding Table 6, both data sets are asymmetric (positively skewed) and platykurtic. Moreover, the data sets are under-dispersed, since the variance is smaller than the mean. Information about the failure rate can be helpful in selecting an appropriate model, and a graphical tool known as the total time on test (TTT) plot [22] can be used for this purpose. If the TTT plot follows the straight diagonal, the hazard is constant; the TTT plot is convex for a decreasing hazard and concave for an increasing hazard, and a bathtub-shaped hazard is indicated when the curve is first convex and then concave. The TTT plots for data sets I and II are shown in Figure 3; the hazard curves of both data sets are bathtub-shaped. Figure 4 shows the boxplots for data sets I and II, respectively. Therefore, the Log-XLindley distribution can be a good choice to model these data sets.
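The empirical scaled TTT transform used in Figure 3 can be computed as follows; the sketch reuses the sc16 vector defined above.

ttt_plot <- function(x) {
  xs <- sort(x); n <- length(xs)
  Ti <- (cumsum(xs) + (n - seq_len(n)) * xs) / sum(xs)   # scaled TTT transform
  plot(seq_len(n) / n, Ti, type = "b", xlab = "i/n", ylab = "T(i/n)")
  abline(0, 1, lty = 2)   # the diagonal corresponds to a constant hazard
}
ttt_plot(sc16)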

The MLEs of the considered models along with their standard errors are given in Tables 7 and 8 for data sets I and II, respectively, with goodness-of-fit measures.

Regarding Tables 7 and 8, the proposed model is the best among all tested distributions. Figures 5 and 6 support our empirical results, which have been listed in Tables 7 and 8. The profile log-likelihood plots for both data sets are presented in Figure 7.

Since one of the major aims of this paper is to get the best estimators for the data sets I and II, several estimation techniques have been applied for this purpose. Tables 9 and 10 list the various estimators for data sets I and II based on different estimation approaches.

It is noted that all methods work quite well for analyzing the SC16 and P3 algorithm data, but the LSE and CVME are the best techniques for the SC16 data, whereas the LSE is the best for the P3 data.

8. Conclusion

In this paper, a flexible one-parameter Log-XLindley distribution has been proposed to analyze proportion and asymmetric data. Some distributional properties have been derived in explicit forms. It was found that the hazard rate function can be applied to model different types of failures, including increasing, bathtub, and J-shaped. The model parameter has been estimated using various estimation approaches in order to identify the best estimator for the data. A Monte-Carlo simulation study for different sample sizes has been performed to assess the performance of the estimators based on several statistical criteria. Finally, two distinctive data sets from the SC16 and P3 algorithms have been analyzed to illustrate the flexibility of the new model, and the proposed distribution showed remarkable superiority over the competitive models.

Data Availability

Data are included in the manuscript.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Supplementary Materials

The R code used in the study can be accessed from the attached file. (Supplementary Materials)