Abstract

In real-life situations, censoring arises because data are incomplete. This article examines inference for the right-censored beta type I generalized half logistic distribution. Some statistical properties of the distribution are derived, and the distribution is studied under censoring both in the presence and in the absence of covariates. The model parameters are estimated by the method of maximum likelihood. A simulation study was carried out to assess the performance of the parameter estimates in terms of efficiency and consistency. In a real-life application, the model was applied to COVID-19 data and the necessary inferences were drawn.

1. Introduction

In real-life experiments, censoring is commonly encountered. It occurs when a subject under study is lost to follow-up, drops out, or survives beyond the end of the study. In medical research, for example, survival times are determined from the recorded dates of first and last contact. Censoring can be right, left, or interval censoring. Let T be the time of occurrence of some event and c a given value. The observation is right censored if T is only known to be greater than c. Left censoring is most likely to occur when observation of a sample begins at a time when some of the individuals may have already experienced the event. The observation is interval censored if T is only known to fall between two values of c.
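As a concrete illustration of right censoring, the following minimal Python sketch records, for each subject, the observed time min(T, c) and an event indicator. The event times, the cut-off c, and the function name are illustrative assumptions and are not taken from the study data.

```python
def right_censor(event_times, c):
    """Return (observed_time, event_indicator) pairs under right censoring at c.

    The indicator is 1 if the event was observed (T <= c) and 0 if the
    subject was still event-free at time c, so that only T > c is known.
    """
    return [(min(t, c), 1 if t <= c else 0) for t in event_times]


# Hypothetical event times in days and an assumed cut-off of 20 days.
times = [3.0, 12.5, 27.0, 20.0, 41.2]
observed = right_censor(times, c=20.0)
```

Each pair carries everything the censored likelihood needs: the time actually recorded and whether it is an event time or only a lower bound.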

Much work has been done on assessing probability distributions under right censoring; recent examples include [16]. In this research, the beta type I generalized half logistic distribution is studied under a right-censoring mechanism.

One member of the logistic family of distributions is the half logistic distribution. The half logistic distribution, also known as the folded logistic distribution, is derived from the logistic distribution by truncating it at the point x = 0. The half logistic distribution has the probability density function (pdf)

$$f(x) = \frac{2e^{-x}}{(1 + e^{-x})^{2}}, \quad x > 0, \tag{1}$$

and its cumulative distribution function (cdf) is as follows:

$$F(x) = \frac{1 - e^{-x}}{1 + e^{-x}}, \quad x > 0. \tag{2}$$
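The standard half logistic density and distribution function can be checked numerically. The sketch below, with illustrative function names, codes the standard forms f(x) = 2e^{-x}/(1 + e^{-x})^2 and F(x) = (1 - e^{-x})/(1 + e^{-x}) for x > 0 and compares a central finite difference of the cdf with the pdf.

```python
import math


def half_logistic_pdf(x):
    """Standard half logistic density; zero for negative x."""
    if x < 0:
        return 0.0
    e = math.exp(-x)
    return 2.0 * e / (1.0 + e) ** 2


def half_logistic_cdf(x):
    """Standard half logistic distribution function; zero for negative x."""
    if x < 0:
        return 0.0
    e = math.exp(-x)
    return (1.0 - e) / (1.0 + e)


def numerical_derivative(func, x, h=1e-5):
    """Central finite-difference approximation to func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


x0 = 1.3  # arbitrary evaluation point
slope = numerical_derivative(half_logistic_cdf, x0)
gap = abs(slope - half_logistic_pdf(x0))  # should be close to zero
```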

The author of [7] stated some theorems that characterize the half logistic distribution. The author of [8] derived the probability density function of the type I generalized half logistic distribution by adding a shape parameter to the half logistic distribution. He obtained the moments, median, cumulative distribution function, mode, 100p-percentage point, and order statistics of the distribution, and estimated its parameters using the maximum likelihood method. Therefore,

Introducing the scale parameters , equation (3) becomes

The type I generalized half logistic distribution of [8] is gradually gaining attention in research studies. It has been further generalized by adding parameters, and these generalized distributions have been extensively studied and applied to real-life problems under both complete and censored observations. The author of [9] obtained the four-parameter type I generalized half logistic distribution, together with its cumulative distribution function (CDF), survival function, hazard function, moments, 100p-percentage point, and mode. The author of [10] introduced the distribution of [8] to survival analysis; properties of the survival model, such as the survival and hazard functions, were studied and applied to data on breast cancer patients. The author of [11] derived a parametric survival model for breast cancer patient survival data using the survival model obtained by [10]. The author of [12] further generalized the four-parameter type I generalized half logistic distribution of [9] by adding a shifting parameter to obtain the five-parameter generalized half logistic distribution. A further study of some properties of this distribution and of the estimation of its parameters under complete observation was carried out by [13]. The author of [14] extended the distribution of [8] by adding a parameter to obtain the Lehmann type II generalized half logistic distribution. Because the hazard function of the type I generalized half logistic distribution is nondecreasing, the author of [15] derived the beta type I generalized half logistic distribution (BTIGHLD), which is not limited to nondecreasing hazards but can accommodate other forms of hazard function. Its properties were studied, its parameters were estimated under complete observations, and the distribution was applied to two real-life data sets. The BTIGHLD has the probability density function as follows:

with CDF as follows:

Equation (6) therefore results in the following:

where B(k; a, b) is the incomplete beta function.

The survival function and hazard function are as follows:

and

respectively, where

for , and .
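The relationship between the density, survival function, and hazard function can be sketched numerically. The code below is a sketch under stated assumptions, not the authors' implementation: it assumes a unit-scale type I generalized half logistic baseline with cdf G(x) = 1 - (2e^{-x}/(1 + e^{-x}))^q, assumes the usual beta-G construction f(x) = g(x) G(x)^{a-1} (1 - G(x))^{b-1} / B(a, b), and obtains the survival function by Simpson integration of the density rather than from an incomplete beta routine. All function names and the integration bound are hypothetical.

```python
import math


def baseline_sf(x, q):
    """Assumed baseline survival 1 - G(x) = (2e^{-x} / (1 + e^{-x}))^q, x > 0."""
    e = math.exp(-x)
    return (2.0 * e / (1.0 + e)) ** q


def baseline_pdf(x, q):
    """Assumed baseline density g(x) = q 2^q e^{-qx} / (1 + e^{-x})^(q + 1)."""
    e = math.exp(-x)
    return q * 2.0 ** q * math.exp(-q * x) / (1.0 + e) ** (q + 1.0)


def beta_fn(a, b):
    """Complete beta function B(a, b) computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def btighld_pdf(x, a, b, q):
    """Assumed beta-G density, intended for moderate x > 0."""
    sf = baseline_sf(x, q)
    return (baseline_pdf(x, q) * (1.0 - sf) ** (a - 1.0)
            * sf ** (b - 1.0) / beta_fn(a, b))


def simpson(func, lo, hi, n=6000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0


def btighld_sf(x, a, b, q, upper=60.0):
    """Survival function as the integral of the density from x to `upper`."""
    return simpson(lambda t: btighld_pdf(t, a, b, q), x, upper)


def btighld_hazard(x, a, b, q):
    """Hazard function h(x) = f(x) / S(x)."""
    return btighld_pdf(x, a, b, q) / btighld_sf(x, a, b, q)


# Parameter values (a, b, q) borrowed from Group I of the simulation study.
h_at_1 = btighld_hazard(1.0, 2.5, 0.5, 3.0)
```

Integrating the density in x rather than in the beta variable avoids the endpoint singularity of the beta integrand when b < 1; the fixed upper bound is an assumption that works only when the tail beyond it is negligible.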

In the literature, several studies have examined other generalizations of the half logistic distribution under incomplete data. Some of these include [416]. The author of [16] obtained an extended generalized half logistic distribution and studied different methods of estimating its parameters based on censored and complete data. In this paper, we examine some statistical properties of the beta type I generalized half logistic distribution and study inference for it under right-censored observations. The work further applies the model to the survival times of COVID-19 patients in order to assess some factors that contribute to their survival. A preprint of this work has been previously published in [17].

In the next section, we present some properties of the BTIGHLD. In Section 3, we estimate the parameters of the distribution under censored observations without covariates, and Section 4 deals with estimation with covariates. Section 5 presents a simulation study that assesses the performance of the parameter estimates. In Section 6, the model is applied to COVID-19 data.

2. Statistical Properties of BTIGHLD

In this section, some properties of the beta type I generalized half logistic distribution are studied further. The properties considered include the moments, moment generating function, incomplete moments, and order statistics. Explicit expressions for statistical measures are easy to handle in computer software, which can deal with analytic expressions of formidable size and complexity, and they can be more efficient than computing the measures directly by numerical integration. We therefore first derive a mixture representation of the PDF of the BTIGHLD in Exp-G form, which enables us to study the statistical properties of the distribution.

2.1. Mixture Representation

By using the power series,

Therefore, the p.d.f of BTIGHLD becomes as follows:

Therefore, the expression in (12) can be expressed as a mixture of the exponentiated-type I generalized half logistic distribution (Lehmann type II generalized half logistic distribution) as follows:

where

and is the Exp-G PDF form of the distribution. Integrating the expression in (11) gives the mixture of the Exp-G CDF as follows:

where

Further simplifying the expression in (10) using the binomial expansion,

Therefore, expression (10) becomes as follows:

which finally gives

Equation (20) can be written as follows:

where

and is the p.d.f of the type I generalized half logistic distribution with power parameter q(j + 1).

2.2. Moments and Moment Generating Function

Given that X is a random variable with the BTIGHLD, its moment is given as follows:

where is the moment of the type I generalized half logistic distribution with power parameter q(j + 1). From the expression for the moments, we obtain the second moment, skewness, and kurtosis.

For the moment generating function, say, ,

where is the moment generating function of the type I generalized half logistic distribution with power parameter q(j + 1).

Tables 1 and 2 report the moments of the BTIGHLD for different parameter values. The tables show positive values for the mean, the second moment, and the variance, which is consistent with the use of the distribution for real-life data, which are mostly positive.

2.3. Order Statistics

Let be the order statistics from a distribution with probability density function f(x) and cumulative distribution function F(x). Then the PDF of , which is the order statistic, is given as follows:

where B(·, ·) is the beta function.
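The general order-statistic density can be sketched in a few lines. The code below uses the standard form f_{i:n}(x) = f(x) F(x)^{i-1} (1 - F(x))^{n-i} / B(i, n - i + 1) and, purely as a stand-in that has a known closed form, applies it to a unit exponential distribution; the function names are illustrative and the exponential model is not part of the paper.

```python
import math


def beta_fn(a, b):
    """Complete beta function B(a, b) computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def order_stat_pdf(x, i, n, pdf, cdf):
    """Density of the i-th order statistic of n i.i.d. draws with given pdf, cdf."""
    F = cdf(x)
    return pdf(x) * F ** (i - 1) * (1.0 - F) ** (n - i) / beta_fn(i, n - i + 1)


def exp_pdf(x):
    """Unit exponential density (stand-in for f)."""
    return math.exp(-x)


def exp_cdf(x):
    """Unit exponential distribution function (stand-in for F)."""
    return 1.0 - math.exp(-x)


# The minimum of n unit exponentials is exponential with rate n,
# so the two quantities below should agree.
n = 5
x0 = 0.4
from_formula = order_stat_pdf(x0, 1, n, exp_pdf, exp_cdf)
closed_form = n * math.exp(-n * x0)
```

Passing the density and distribution function as callables means the same helper could be reused with any f and F, including a numerical implementation of the BTIGHLD.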

Introducing (11) and (12) as f(x) and F(x), respectively, and using an equation given on page 17 of Gradshteyn and Ryzhik for a power series raised to a positive integer n, we have the order statistics of the BTIGHLD as follows:

where

Therefore, we can deduce that the p.d.f of the BTIGHLD order statistics is a mixture of Exp-G p.d.fs. Hence, the properties of follow from the properties of a + k.

3. Estimation of BTIGHLD under Right-Censored Observation without Covariates

Let be a random sample of size n from a particular probability distribution. The likelihood function under censored observation is as follows:

where represents the joint probability of observing the uncensored survival times and represents the joint probability of the censored observations.

Therefore, given a sample X_1, X_2, X_3, …, X_n of size n from the BTIGHLD, the likelihood function under censored observation without covariates is given as follows:

where

From (30), taking the logarithm of both sides, we have

which gives

Differentiating l with respect to the parameters, we have

Expressions for the derivatives of the incomplete beta function with respect to the model parameters can be found in the Appendix. For interval estimation and hypothesis tests on the model parameters , we obtain a 4 × 4 unit information matrix, whose elements are the second derivatives of the log-likelihood function.
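The structure of a right-censored log-likelihood, in which uncensored observations contribute the log density and censored observations contribute the log survival function, can be sketched as follows. This is an illustration under assumptions: an exponential model is used as a stand-in because its censored maximum likelihood estimate has a closed form, and the data and function names are hypothetical.

```python
import math


def censored_loglik(times, events, log_pdf, log_sf):
    """Right-censored log-likelihood.

    Observed events (indicator 1) contribute log f(t); right-censored
    observations (indicator 0) contribute log S(t).
    """
    return sum(
        log_pdf(t) if d == 1 else log_sf(t)
        for t, d in zip(times, events)
    )


def exp_loglik(rate, times, events):
    """The same log-likelihood for an exponential(rate) stand-in model."""
    return censored_loglik(
        times,
        events,
        log_pdf=lambda t: math.log(rate) - rate * t,
        log_sf=lambda t: -rate * t,
    )


# Hypothetical data: 1 = event observed, 0 = right censored.
times = [2.0, 5.0, 3.5, 8.0, 8.0]
events = [1, 1, 1, 0, 0]

# Closed-form MLE in the exponential case: events / total time at risk.
rate_hat = sum(events) / sum(times)
best = exp_loglik(rate_hat, times, events)
```

For a model such as the BTIGHLD, where no closed-form estimate is available, the same function would be passed to a numerical optimizer with the model's own log density and log survival function.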

If the usual regularity conditions are fulfilled for the BTIGHLD, the asymptotic distribution of is the distribution of . This result is used to construct approximate confidence intervals and confidence regions for the parameters. Asymptotic normality is also useful for assessing the goodness of fit of the BTIGHLD and for comparing it with its sub-models using Wald or likelihood ratio statistics. If is a model parameter of the BTIGHLD, the confidence interval (C.I.) at significance level is given by the following:

where is the diagonal element of for and is the quantile of the standard normal distribution.
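A Wald-type interval of this kind can be computed as in the sketch below. It assumes the standard form, estimate plus or minus the standard normal quantile times the square root of the corresponding diagonal element of the inverse information matrix; the numerical inputs and the function name are hypothetical and do not come from the paper.

```python
from statistics import NormalDist


def wald_interval(estimate, variance, alpha=0.05):
    """Two-sided (1 - alpha) Wald interval for a single parameter.

    `variance` is the estimated variance of the estimator, for example a
    diagonal element of the inverse observed information matrix.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * variance ** 0.5
    return estimate - half_width, estimate + half_width


# Hypothetical estimate and variance, for illustration only.
lower, upper = wald_interval(estimate=2.5, variance=0.04)
```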

4. Estimation of BTIGHLD under Censored Observation with Covariates

In this section, we examine the estimation of the BTIGHLD under right censoring with covariates. The likelihood function is given as follows:

which is equivalent to

where and . Also, and are the baseline probability density function and baseline survival function, respectively (Appendix).

This now gives

where

From (38), taking the logarithm of both sides, we have

Differentiating with respect to the parameters, we have

For interval estimation and hypothesis tests on the parameters , we obtain a (4 + j) × (4 + j) unit information matrix, whose elements are obtained by taking the second derivatives with respect to the parameters.

Under the usual regularity conditions, the asymptotic distribution of is the distribution of , which can be used to construct approximate confidence intervals and confidence regions for the parameters and for the hazard and survival functions. Asymptotic normality is also useful for assessing the goodness of fit of the BTIGHLD and for comparing it with its sub-models using Wald or likelihood ratio statistics. If is a model parameter of the BTIGHLD, the confidence interval (C.I.) at significance level is given by the following:

where is the diagonal element of for and is the quantile of the standard normal distribution.

5. Simulation Study

In this section, a simulation is carried out to assess the performance of the maximum likelihood estimates of the BTIGHLD parameters. The scale parameter is fixed at 1 for all simulations. For given values of (a, b, q), namely Group I (2.5, 0.5, 3.0) and Group II (0.35, 5.0, 1.8), we simulated data distributed according to the BTIGHLD. To assess the performance of the estimates, the average estimate (AE) and mean squared error (MSE) of over the r samples are given, respectively, by the following:

The simulation was conducted for n = 50, 100, 200, and 500 and censoring rates of 20% and 80%.
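The two performance measures can be computed as in the following sketch, which assumes the usual definitions: the AE is the mean of the r replicate estimates and the MSE is the mean squared deviation of those estimates from the true parameter value. The replicate estimates shown are made-up numbers, not simulation output from the paper.

```python
def average_estimate(estimates):
    """Mean of the replicate estimates (AE)."""
    return sum(estimates) / len(estimates)


def mean_squared_error(estimates, true_value):
    """Mean squared deviation of the replicate estimates from the truth (MSE)."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


# Hypothetical replicate estimates of a parameter whose true value is 2.5.
replicates = [2.4, 2.6, 2.7, 2.3]
ae = average_estimate(replicates)
mse = mean_squared_error(replicates, true_value=2.5)
```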

The simulation procedure is as follows:
(1) Generate a random sample of size n = 50, 100, 200, 500 from the beta distribution .
(2) From (1), the observations following the BTIGHLD are as follows:
(3) Generate a simple random sample of censoring times from a BTIGHLD and adjust the parameters of the BTIGHLD to obtain the desired censoring rates.
(4) The right-censored data are obtained as the minimum of the censoring time and the survival time, that is,
(5) The observed data set is .
(6) Replicate the values from (5) r times for the different sample sizes to be considered. For this simulation, we take r = 1,000.
(7) Based on the data set from (6), obtain the maximum likelihood estimates of the parameters.
(8) The likelihood function of the model is maximized with respect to the parameters to obtain .
(9) If is an MLE of , such that l = 1, 2, 3, 4 (i.e. , , , ), based on sample sizes k, .
All simulation results are summarized in Tables 3 and 4.
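Steps (1) to (5) of the procedure can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the baseline is taken to be the unit-scale type I generalized half logistic cdf G(x) = 1 - (2e^{-x}/(1 + e^{-x}))^q, a BTIGHLD variate is obtained by inverting G at a beta draw, and the censoring times are drawn from a BTIGHLD with hypothetical parameters that have not been tuned to the 20% or 80% rates used in the study.

```python
import math
import random


def sample_btighld(a, b, q, rng):
    """Draw one variate by inverting the assumed baseline cdf at a beta draw.

    If V ~ Beta(a, b) then W = 1 - V ~ Beta(b, a); W is drawn directly so
    that values of V very close to 1 do not lose precision.
    """
    w = 0.0
    while w <= 0.0:
        w = rng.betavariate(b, a)
    u = w ** (1.0 / q)  # u = 2e^{-x} / (1 + e^{-x}) solves 1 - G(x) = w
    return math.log((2.0 - u) / u)


def simulate_censored(n, params, censor_params, seed=0):
    """Return n pairs (observed time, event indicator) under right censoring."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = sample_btighld(*params, rng)         # survival time
        c = sample_btighld(*censor_params, rng)  # censoring time
        data.append((min(x, c), 1 if x <= c else 0))
    return data


# Group I parameters for the survival times; the censoring parameters are
# hypothetical and set equal to them only for illustration.
sample = simulate_censored(200, params=(2.5, 0.5, 3.0),
                           censor_params=(2.5, 0.5, 3.0))
censor_rate = 1.0 - sum(d for _, d in sample) / len(sample)
```

In practice the censoring parameters would be adjusted, for example by trial and error, until `censor_rate` is close to the target proportion before moving on to the replication and estimation steps.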

Tables 3 and 4 show that the average estimates decrease as the sample size increases for all parameters under both censoring proportions, and that the MSE decreases as the sample size increases. This indicates the efficiency and consistency of the parameter estimates.

6. Application of the BTIGHLD to COVID-19 Data

In this section, we consider data drawn from information released by the Mexican Ministry of Health (Secretaría de Salud, SS) through the Epidemiological Surveillance System for Viral Respiratory Diseases on COVID-19. Some of the confirmed cases of COVID-19 registered from February 21st to 18th February, 2021, were used in the present analysis. The database included all positive, negative, and suspected cases of COVID-19 registered by 475 Viral Respiratory Disease Monitoring Units (Unidades Monitoras de Enfermedad Respiratoria viral; USMER by its Spanish acronym) and by the medical units that attended the cases. The data can be found using the link [18]. For this research, the variables considered for every subject include sex, age, immunosuppression, pneumonia, and asthma. The survival time (in days) is taken as the time to death, which is the difference between the date of admission and the date of last contact. The survival time was right censored at 20 days. The Kaplan–Meier method was used to plot survival curves, and these graphs served to check the proportional hazards assumption. The BTIGHLD, as a parametric model, was fitted to the data with the covariates under consideration. The result was compared with that of the Cox proportional hazards regression model using the Akaike Information Criterion (AIC) and the Corrected AIC (CAIC). The significance level for the contributions of the covariates was set at 0.05. The data were analyzed with the Statistical Package for the Social Sciences (SPSS) version 20.0 and R 4.0.4.
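The information criteria used for model comparison can be sketched as follows. The AIC is computed in its standard form, 2k - 2 log L. For the corrected version, the code assumes the common small-sample correction AIC + 2k(k + 1)/(n - k - 1), which may differ from the exact definition the authors used; the log-likelihood value, parameter count, and sample size are hypothetical.

```python
def aic(loglik, k):
    """Akaike Information Criterion for a model with k estimated parameters."""
    return 2.0 * k - 2.0 * loglik


def corrected_aic(loglik, k, n):
    """AIC with the common small-sample correction (assumed form), n > k + 1."""
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)


# Hypothetical fitted model: log-likelihood -100 with 4 parameters, n = 50.
plain = aic(-100.0, k=4)
corrected = corrected_aic(-100.0, k=4, n=50)
```

Under either criterion, the model with the smaller value is preferred.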

Figure 1 shows the histogram of the survival times of the patients, with the fit of the BTIGHLD. It reveals that the data are right-skewed, a situation for which [14] showed that the BTIGHLD can give a good fit.

Table 5 gives the descriptive statistics of the survival time.

The distribution of the age of the patients at the time of reporting is shown in Figure 2. Most patients were between 55 and 60 years old.

Furthermore, Figure 3 reveals the KM plot of the survival data.
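The Kaplan–Meier estimate behind such a plot is the product-limit estimator, which can be sketched as below. The implementation and the small data set are illustrative and are not the code or data used for the figures in the paper.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    Returns a list of (t, S(t)) pairs, one per distinct time at which at
    least one event occurs. `events` holds 1 for an observed event and 0
    for a right-censored observation.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Hypothetical survival times in days: 1 = death observed, 0 = censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Censored subjects never lower the curve directly; they only reduce the number at risk for later event times.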

For variables with dichotomous responses, we plotted KM curves to compare their survival rates. For pneumonia, Figure 4 shows the probability of survival is higher in those who do not have pneumonia than those who have it.

Also, Figure 5 shows that those who do not have asthma have a higher chance of survival than those who have asthma.

Figure 6 shows that those who have immunosuppression and those who do not have an almost equal chance of survival.

Figure 7 shows that the probability of survival for males is higher than that for females.

The survival times were analyzed using the BTIGHLD to assess the contribution of the prognostic factors to the survival of the COVID-19 patients. The result in Table 6 shows that pneumonia, age, asthma, and immunosuppression are significant, while gender is not significant at the 0.05 level.

In this study, about two thirds of respondents had pneumonia as a comorbidity, followed by immunosuppression. This is not surprising, as the symptoms of COVID-19 are very similar to those of pneumonia. There could even be a transposition of symptoms, with more patients presenting with pneumonia-like symptoms. This finding is similar to that of another study by Hernandez–Vasquez, in which pneumonia was one of the presenting complaints of COVID-19 patients.

As also found in this study, asthma comorbidity with COVID-19 is rare, which is comparable to another study conducted in Saudi Arabia where only 1.7 of respondents had asthma. However, that study concluded that patients with pneumonia-like symptoms could develop life-threatening COVID-19. Furthermore, pneumonia, asthma, and immunosuppression were found to have a statistically significant influence on the survival times of patients from the day of admission to the day of last contact. This agrees with another study done in Mexico by [19], which found a statistically significant incremental gradient between COVID-19 and pneumonia as well as death among respondents with comorbidities such as pneumonia and asthma. Immunosuppression was also found in the same study to increase the likelihood of dying from COVID-19. This is not surprising, as comorbidity and immunosuppression weaken the immune system, making it difficult to mount immune responses against diseases that should not otherwise have been able to overcome the patient. It has been postulated that most comorbidities with COVID-19 are associated with ACE-2 receptor expression, which leads to a copious release of proprotein convertase that ultimately increases the entry of the virus into the cells, as found by the author of [20].

7. Conclusion

In this work, some properties of the BTIGHLD, including the moments, moment generating function, and order statistics, have been studied further. The BTIGHLD was also studied under right-censored observations, both in the presence and in the absence of covariates. Using the maximum likelihood estimation method, the properties of the estimates of the model were examined through simulation studies with various parameter values under different censoring rates. The results showed that the estimates performed well, with the average estimates and the mean squared error decreasing as the sample size increased. We further applied this model to COVID-19 data [8, 18, 21, 22].

Appendix

The accelerated failure time (AFT) model is of the following form:

where is said to follow a particular distribution, are the estimated coefficients of the covariates , and is the intercept of the model. Let .

From the definition of the survival function with covariates,

The above equation is the baseline survival function.

Now,

From the relationship between the probability density function and the survival function,

From the relationship between the hazard function, the probability density function, and the survival function,

Let and , thus,

Let

Then,

From the Wolfram statistics,

where

Therefore,

This gives

Data Availability

The data are available using this link: https://drive.google.com/file/d/1l5uEhRYUyvQqTYcy-_-f2mB2y7GRiS_R/view?usp=sharing.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Digiteknologian TKI-ymparisto project A74338 (ERDF, Regional Council of Pohjois-Savo).