Abstract

In recent years, the introduction of new families of distributions has received great attention because of the limitations of the classical univariate distributions. This study introduces a novel family of distributions called the new type 1 alpha power family of distributions. Based on the novel family, a special model called the new type 1 alpha power Weibull model is studied in depth. The new model is very flexible and can model real data with increasing, decreasing, parabola-down, and bathtub failure rate patterns. Its applicability is studied by applying it to health sector data on the time-to-recovery of breast cancer patients, and its performance is compared to seven well-known models. Based on the model comparison, it is the best model to fit the health-related data with no exceptional features. Furthermore, popular models for data with exceptional features such as correlation, overdispersion, and zero-inflation in aggregate are explored with applications to epileptic seizure data. Sometimes, these features are beyond the reach of probability distribution models. Hence, this study fits eight candidate models separately to these data and compares them using standard techniques. Accordingly, the zero-inflated Poisson-normal-gamma model, which includes random effects in the linear predictor to handle the three features simultaneously, outperforms the others and is the best model to fit health-related data with these features.

1. Introduction

Statistical modeling and prediction of real-life events are vital issues in the training and implementation of health care and in the health sector in general [1]. Although classical and modified statistical models have been applied to data in health applications, they do not provide the best fit when the data show nonmonotonic failure rates. This calls for generalized or extended versions of these classical models and has motivated many researchers to propose new flexible extensions of distributions by adding one or more parameters to the baseline distribution.

Correspondingly, our study introduces a more flexible family of distributions, called the new type 1 alpha power family of distributions, by adding a new parameter to the exponential-type family of distributions. It is more suitable for skewed data with nonmonotonic failure rates and contributes new results to distribution theory.

Several new developments in distribution theory have been proposed in the literature. Roozegar and Nadarajah [2] studied the quadratic hazard rate power series distribution. The same authors [3] also proposed the power series skew normal class of distributions, taking power series as the key ingredient. Very recently, Chesneau et al. [4] introduced a new extended family of distributions as an alternative to the Marshall–Olkin family of distributions. They used five different estimation methods to show how well the new family can serve as an alternative to the existing one, and they used the subcases for regression purposes. Elbatal et al. [5] also suggested a new class of generalized distributions, which they named the alpha power Weibull G (APW-G) family.

Shehata et al. [6] introduced a flexible family of distributions for the asymmetric left-skewed bimodal real-life data with special attention to the flexibility patterns of the probability density and hazard functions and they called it a novel two-parameter G family of distributions. They used a copula method to characterize the new family for the special model, the new extension of the exponential distribution.

Some researchers have used techniques such as transformation, extension, and compounding to introduce new families of distributions. Chesneau et al. [7] proposed a new family of probability distributions based on a cosine-sine transformation, obtained by compounding a baseline distribution with the cosine and sine functions. Ahmad et al. [8] applied the same approach by adding a parameter to introduce a new class of probability distributions. They named the newly suggested model the extended alpha power transformed family of distributions and studied the extended alpha power transformed Weibull distribution as a special model. Among the many other scholars who followed this approach are Ahmad et al. [8], who proposed a new exponentiated TX class of distributions; Hussein et al. [9], who introduced a new flexible modified alpha power (MAP) family of distributions by adding two parameters to the baseline model; and El-Sherpieny et al. [10], who suggested a new generating family of distributions.

In line with the introduction of new families of probability distributions, models for some special features in the data also need to be explored. Repeatedly measured count outcomes are characterized by three special features: the dependence among measurements on the same subject due to the clustering effect (correlation), extra variability in the counts (overdispersion), and a special case of overdispersion called zero-inflation [11–15].

One of the gaps that motivates this study is the following. Mekonnen et al. [16] used epilepsy data, allowing for both correlation and zero-inflation, and analyzed the data by using the Poisson (P), negative-binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative-binomial (ZINB) models. Their study has two shortcomings: first, it did not raise or discuss the issue of overdispersion although the data are overdispersed, and second, the mentioned count models alone cannot fully handle the dependence in the data. Handling the three features in the same model is not a straightforward task. Because of this, many authors in the literature dealt with only one or two of these features [12, 16]. Unlike other studies, this study models the three features simultaneously.

Moreover, it is not sensible to analyze repeatedly measured count data such as epileptic seizure counts by using distribution models unless these models can easily be expressed in the exponential form. Hence, this study addresses this gap.

The rest of the paper is organized in two parts. The first part introduces a new family of distributions, called the new type 1 alpha power family of distributions, and applies it to health data; it is presented in detail in Sections 2–6. The second part discusses the models for correlation, overdispersion, and zero-inflation in the data (see Section 7). Specifically, Section 2 introduces the new type 1 alpha power family of distributions, Section 3 discusses special cases, and Section 4 discusses the basic statistical properties of the proposed family. Section 5 presents the estimation of the model parameters of the new subfamily of distributions. Section 6 illustrates the application of the new family to the new data set. Section 7 discusses the models for correlation, overdispersion, and zero-inflation, and lastly, Section 8 summarizes the study with concluding remarks.

2. A New Type 1 Alpha Power Family of Distributions

Alzaatreh et al. [17] introduced a popular T-X approach which updates the distributional flexibility of the existing models and it has become a well-known approach in the literature. They proposed the probability density function (PDF) of it as follows:where the function fulfills some specified conditions; for detail, see Alzaatreh et al. [17]. In equation (1), is a baseline cumulative distribution function (CDF) with parameter vector and is the PDF of the parent model with parameter vector . Corresponding to , the PDF is given by the following equation:

Recently, Ahmad et al. [18] applied the T-X approach and proposed an interesting member, which they call the weighted T-X Weibull distribution family (WT-XW). They introduced this family by using in equation (1) with , where is the PDF of the two-parameter Weibull model with the parameters . The CDF of the WT-XW family is given by the following equation:and the corresponding PDF is given by the following equation:

The PDF and CDF of exponential type are given by the following equations:andrespectively.

Note that, must satisfy the following conditions.(1) is a non-negative, differentiable, and increasing function of z.(2) 0 and .

The classical exponential types, Rayleigh, Weibull, and other extended lifetime distributions belong to the class defined in equation (6); see Liao et al. [19]. For further detailed information about similar statistical distributions, see Mehboob Zaidi et al. [20], Afify et al. [21], Reyad et al. [22], and Al-Babtain et al. [23].

Herein after, we introduced an additional parameter in equation (6), which replaces the exponent term to propose a very flexible family whose CDF and PDF are given by the following equations:andrespectively, where throughout the paper.

If , the CDF in equation (7) becomes similar to equation (6). The function fulfills the conditions which are given in 1 and 2. It is straightforward that . To make it more clear, letand

Therefore, based on the results in equations (9) and (10), the function written in equation (7) is a proper CDF. The expression given in equation (7) is also useful for generating new statistical models belonging to the T-X family of distributions.

Next, we proposed a new family called a new type 1 alpha power (NT1AP) family by using in equation (7). The CDF and PDF of the NT1AP family are given by the following equations:andrespectively.

In addition, the corresponding survival function and the hazard function are given by the following equation:respectively.

3. A Special Subfamily

In this section, we discuss a special member of the NT1AP family called a new type 1 alpha power Weibull (NT1AP-W) model. It is introduced by inserting the CDF and PDF of the two-parameter Weibull into equations (11) and (12), necessarily. Thus, let a random variable Z have the Weibull distribution, then its CDF and PDF, respectively, are given by the following equation:and the resulting CDF and PDF of the NT1AP-W are given by the following equations:andrespectively, where .

Based on equations (15) and (16), the corresponding and are given by the following equations:respectively.

Following this, the graphical expression of the NT1AP-W model for different parameter values is displayed as follows.

Figure 1(a) displays the plot of for the NT1AP-W for different scenarios: (blue-line), (purple-line), (green-line), (red-line), and (black-line).

Figure 1(b) displays the plot of for the NT1AP-W for different scenarios: (purple-line), (cyan-line), (green-line), (red-line), and (black-line).

The PDF of the NT1AP-W is shown in Figure 1(a); it has attractive flexible patterns such as decreasing, parabola-down, left-skewed, right-skewed, and decreasing-increasing-decreasing-constant (polynomial type). Figure 1(b) illustrates the hazard function for different parameter values to show its different patterns and how flexible the distribution is. It has patterns such as increasing, bathtub, parabola-down, and decreasing.

4. Some Basic Statistical Properties of the New Type 1 Alpha Power Family of Distributions

In this section, we discuss the basic statistical properties of the NT1AP family of distributions.

4.1. Order Statistics

Order statistics are widely used in applied statistics such as reliability and lifetime and records. Suppose that is a random sample of size following the NT1AP family of distributions with parameters () and are its corresponding order statistics. Then, the density function of for is given by the following equation:

By substituting the CDF and PDF of the NT1AP (see equations (11) and (12)) into , we obtain the following equation:where .

The simplified form of the order statistics density function is given by the following equation:whereand is the product of the PDF and the CDF for the order statistics.

The PDF of the order statistics for the NT1AP can be obtained from equation (20). The central moments and the moment-generating function of the family are given in the next subsection.
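As a hedged illustration of the order-statistic density discussed in this subsection, the sketch below evaluates the standard formula n!/((r-1)!(n-r)!) g(z) G(z)^(r-1) (1-G(z))^(n-r) for any PDF and CDF pair supplied as functions. The function names and the unit-rate exponential used as input are assumptions chosen for illustration; they are stand-ins and not the NT1AP forms.

```python
import math
from typing import Callable


def order_stat_pdf(z: float, r: int, n: int,
                   pdf: Callable[[float], float],
                   cdf: Callable[[float], float]) -> float:
    """Density of the r-th order statistic in a sample of size n."""
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    # Combinatorial constant n! / ((r-1)! (n-r)!)
    coef = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    big_g = cdf(z)
    return coef * pdf(z) * big_g ** (r - 1) * (1.0 - big_g) ** (n - r)


# Stand-in baseline (assumption): unit-rate exponential distribution.
def exp_pdf(z: float) -> float:
    return math.exp(-z) if z >= 0 else 0.0


def exp_cdf(z: float) -> float:
    return 1.0 - math.exp(-z) if z >= 0 else 0.0


# Density of the sample minimum (r = 1) for n = 3 at z = 0.5.
min_density = order_stat_pdf(0.5, 1, 3, exp_pdf, exp_cdf)
```

For the exponential stand-in, the minimum of three observations is again exponential with rate 3, which gives a simple check on the implementation.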

4.2. Moments and Moment-Generating Functions

Based on the PDF of the NT1AP family of distributions given above, its central moments, , is obtained as follows:where

The moment generating function of the NT1AP can be obtained by using the last result of in as follows:

4.3. Mean Deviation and Bonferroni and Lorenz Curves

Let Z NT1AP ; the mean deviation about the mean and the median are defined by the following equations:respectively, where z ∈ , , and denotes the median.

These can further be expressed as follows:respectively, where is the first incomplete moment. These measures have been applied to a wide variety of fields, such as reliability, demography, insurance, and medicine [24].

Moreover, let Z NT1AP ; the Bonferroni and Lorenz curves are defined by the following equations:respectively, where and .
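The Bonferroni and Lorenz curves can also be approximated directly from data. The following minimal sketch computes empirical versions at the points p = k/n, taking the Lorenz ordinate as the share of the total held by the k smallest observations and the Bonferroni ordinate as that share divided by p. The function name and the use of these empirical estimators are assumptions for illustration; the section itself defines the curves through the model PDF.

```python
from typing import List, Sequence, Tuple


def lorenz_bonferroni(data: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (p, L(p), B(p)) at p = k/n for positive data."""
    xs = sorted(data)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("data must be non-empty with a positive total")
    points = []
    running = 0.0
    for k, x in enumerate(xs, start=1):
        running += x                 # sum of the k smallest observations
        p = k / n
        lorenz = running / total     # empirical Lorenz ordinate
        points.append((p, lorenz, lorenz / p))  # Bonferroni = Lorenz / p
    return points


curve = lorenz_bonferroni([4, 1, 3, 2])
```

When all observations are equal, both curves equal p and 1, respectively; the further the Lorenz curve falls below the diagonal, the more unequal the data.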

The next section deals with the maximum likelihood estimation for the model parameters of the NT1AP-W model.

5. Estimation of the Model Parameters of the NT1AP-W

The method of the maximum likelihood estimation for the model parameters for the NT1AP-W is discussed in this section.

5.1. Maximum Likelihood Estimation

This subsection deals with the computation of the maximum likelihood estimators (MLEs) for the model parameters of the NT1AP-W. Let be observations of a random sample drawn from the NT1AP-W with parameters , and . By using the PDF of the NT1AP-W (see equation (16)), the likelihood function is given by the following equation:and its corresponding log-likelihood function is given as follows:where . The model parameters are estimated by taking the first partial derivatives of the with respect to each model parameter and equating them to zero.

Thus, having the , the partial derivatives of it with respect to each parameter are given by the following equation:where is the score function for .

Subsequently, the MLEs of the parameters can be obtained by solving the following nonlinear equation:using numerical methods such as Newton–Raphson or Broyden’s methods.
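To make the Newton–Raphson step concrete, the sketch below solves the score equation for the two-parameter Weibull baseline only, not for the NT1AP-W itself. It assumes the parametrization with CDF 1 - exp(-(z/scale)^shape), which may differ from the one used in this paper, and the function name is hypothetical. The scale is profiled out, so the iteration is one-dimensional in the shape.

```python
import math
from typing import Sequence, Tuple


def weibull_mle(data: Sequence[float], shape0: float = 1.0,
                tol: float = 1e-10, max_iter: int = 100) -> Tuple[float, float]:
    """Weibull MLE (shape, scale) via Newton-Raphson on the profile score."""
    if len(data) == 0 or any(x <= 0 for x in data):
        raise ValueError("data must be non-empty and strictly positive")
    n = len(data)
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / n
    k = shape0
    for _ in range(max_iter):
        w = [x ** k for x in data]
        s0 = sum(w)
        s1 = sum(wi * li for wi, li in zip(w, logs))
        s2 = sum(wi * li * li for wi, li in zip(w, logs))
        score = s1 / s0 - 1.0 / k - mean_log           # profile score in the shape
        deriv = (s2 * s0 - s1 * s1) / (s0 * s0) + 1.0 / (k * k)
        k_new = k - score / deriv                       # Newton-Raphson update
        while k_new <= 0:                               # keep the shape positive
            k_new = 0.5 * (k + k_new)
        converged = abs(k_new - k) < tol
        k = k_new
        if converged:
            break
    scale = (sum(x ** k for x in data) / n) ** (1.0 / k)
    return k, scale
```

For a model with more parameters, the same idea applies to the full score vector, with the derivative replaced by the matrix of second derivatives of the log-likelihood.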

Furthermore, the Fisher information matrix is given by the following equation:where , and

The total is given by , which can be approximated by the following equation:

More analytically,where

For an interval estimation of the model parameters, we need the observed information matrix , where . Under the regularity conditions, the multivariate normal distribution is used to construct approximate confidence intervals for the model parameters. Here, is the total observed information matrix evaluated at . Then, the 100(1-)% confidence intervals for , and are given by , , and , respectively, where the s denote the diagonal elements of the corresponding to the model parameters, and the is the quantile (1-) of the standard normal distribution.
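A minimal sketch of such an approximate interval is given below. It assumes the usual two-sided convention, in which the standard normal quantile is taken at 1 - alpha/2, and it takes the standard error (the square root of the relevant diagonal element of the inverse observed information matrix) as an input. The function name and the numbers in the example are hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def wald_interval(estimate: float, std_error: float,
                  alpha: float = 0.05) -> Tuple[float, float]:
    """Approximate 100(1 - alpha)% interval: estimate +/- z * std_error."""
    if std_error < 0 or not 0 < alpha < 1:
        raise ValueError("need std_error >= 0 and 0 < alpha < 1")
    z = NormalDist().inv_cdf(1 - alpha / 2)   # standard normal quantile
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# Hypothetical MLE of 1.0 with standard error 0.5.
lower, upper = wald_interval(1.0, 0.5)
```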

6. An Application to Breast Cancer Data

Breast cancer is one of the most severe diseases in the world and has become a daily public concern in both developed and developing countries. The new data on the time-to-recovery of 686 breast cancer patients were taken from the medical record cards of patients who were enrolled from October 2012 to April 2017 in Nigist Elleni Mohamad memorial referral comprehensive hospital (NEMMRCH), Hossana, south Ethiopia; see Figure 2.

We illustrated the fitting capacity of the NT1AP-W model to the data by comparing it to the three-parameter exponential flexible Weibull extension (EFWE) of El-Desouky et al. [25], the three-parameter Poisson inverse Weibull (PIW) of Joshi and Kumar [26], the five-parameter exponentiated Weibull-Weibull (EWW) of Hassan and Elgarhy [27], the three-parameter Alpha Power Transformed Weibull (APTW) of Elbatal et al. [28], the five-parameter Kumaraswamy Weibull Poisson (KWP) of Marinho et al. [29], the four-parameter Kumaraswamy Weibull (KW) of Cordeiro et al. [30], and the three-parameter New Weighted Weibull (NWW) of Elsherpieny et al. [31].

The competing models with their corresponding CDFs are given by the following equation.(1)EFWE:(2)EWW:(3)APTW:(4)KWP:(5)KW:(6)NWW:(7)PIW:

Information criteria (IC), namely (i) AIC [32], (ii) CAIC [33], (iii) BIC [34], and (iv) HQIC [35], are used to identify the best model. In addition to these criteria, the log-likelihood of the fitted models is also calculated. The model with the smallest IC values is taken to be the best model to fit the data.
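For reference, the four criteria can be computed from the maximized log-likelihood, the number of parameters k, and the sample size n, as in the sketch below. It uses the textbook forms AIC = 2k - 2logL, BIC = k log(n) - 2logL, and HQIC = 2k log(log(n)) - 2logL. CAIC is taken here as the small-sample corrected AIC, AIC + 2k(k+1)/(n-k-1), which is an assumption, since another criterion with the same abbreviation (the consistent AIC) also exists. The function name and the log-likelihood value in the example are hypothetical.

```python
import math
from typing import Dict


def information_criteria(loglik: float, k: int, n: int) -> Dict[str, float]:
    """AIC, CAIC (corrected AIC, assumed), BIC, and HQIC; smaller is better."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 for the corrected AIC")
    aic = 2 * k - 2 * loglik
    return {
        "AIC": aic,
        "CAIC": aic + 2 * k * (k + 1) / (n - k - 1),
        "BIC": k * math.log(n) - 2 * loglik,
        "HQIC": 2 * k * math.log(math.log(n)) - 2 * loglik,
    }


# Hypothetical log-likelihood of -100 for a three-parameter model, n = 686.
ic = information_criteria(-100.0, 3, 686)
```

Because every criterion adds a penalty to -2logL, a model with more parameters must improve the log-likelihood enough to offset that penalty before it is preferred.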

The MLE of the parameters with their corresponding standard errors and the model adequacy measures for the fitted models are given in Tables 1 and 2, respectively.

The MLEs and standard errors of the NT1AP-W model along with the seven competing models (EFWE, EWW, PIW, KWP, KW, NWW, and APTW) are displayed in Table 1. Table 2 gives the model comparison results (model adequacy measures) for all models considered in this section. Based on the five criteria, the newly proposed NT1AP-W model outperforms the seven competing models.

The mathematical expressions of the and of the NT1AP-W are given by the following equation:respectively.

The quantile-quantile (Q-Q) plots of the estimated CDF and including the total time on test (TTT) plot, see e.g., Aarset [36], for the estimated CDF, are given in Figure 3.

From Figure 3, it is observed that there are no potentially influential observations and that the plotted points are sufficiently linear. In addition, the TTT plot indicates an increasing failure rate for this data set.
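The scaled TTT transform behind such a plot can be computed as in the sketch below, which follows the standard empirical definition: at r/n, the ordinate is the sum of the r smallest observations plus (n - r) times the r-th smallest, divided by the sample total. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def scaled_ttt(data: Sequence[float]) -> List[Tuple[float, float]]:
    """Points (r/n, scaled TTT value) for r = 0, ..., n."""
    xs = sorted(data)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("data must be non-empty with a positive total")
    points = [(0.0, 0.0)]
    partial = 0.0
    for r, x in enumerate(xs, start=1):
        partial += x                                  # sum of the r smallest values
        points.append((r / n, (partial + (n - r) * x) / total))
    return points


ttt_points = scaled_ttt([3, 1, 2])
```

A curve that is concave and lies above the diagonal points to an increasing failure rate, a convex curve below the diagonal points to a decreasing one, and a convex-then-concave curve points to a bathtub shape.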

Furthermore, the semivariogram for the exponential and Gaussian serial correlations for the estimated CDF and for the NT1AP-W distribution using an epileptic data set (another public health data set) is plotted as follows.

Figure 4 depicts a decreasing (in CDF) and a constant (in ) correlation patterns for the clustered data set. An epileptic seizure data set, which will be discussed below, owns some special features such as correlation, overdispersion, and zero-inflation. Hence, it needs further investigation using models discussed in Section 7.

7. Combined Models for Data with Overdispersion, Correlation, and Zero-Inflation

Many authors in the literature have not modeled the three special features of repeatedly measured count data (correlation, overdispersion, and zero-inflation) simultaneously in the same model; for instance, see Mekonnen et al. [16]. Another motivation for this section is that it is not meaningful to analyze data such as epileptic seizure counts by using distribution models unless these models can easily be expressed in the exponential form (see Section 7.1) in order to fit regression models. Even then, it should be possible to add extra parameters to the distribution model to handle special characteristics in the data. Thus, this study takes the opportunity to implement appropriate models for data with special characteristics, as discussed in the next subsections.

7.1. Combined Models for Correlation and Overdispersion

To handle those features mentioned in the above subsection simultaneously in the same model, we need to follow some procedures as follows. Thus, let be the outcome with observation for subject, where and which be with the group of measurements into a vector of . Conditionally upon q-dimensional random effects , as an assumption, the outcomes are independent with the density given by the following equation:wherewhere is a known link function, and are a p and q-dimensional vectors of covariates for fixed and random effects, respectively, is a p-dimensional vector of unknown regression coefficients for fixed effect, and is a scale parameter for overdispersion. Let further be the probability density function of the distributed random effects . Then, the Poisson-normal (PN) distribution model is given by the following equation:where

It is clear that equation (47) treats the feature dependence in the data (correlation) by imposing the normal random effect in the Poisson model. According to Molenberghs et al. [11] and Molenberghs et al. [13], adding another random effect-gamma gives the following equation:where is given in equation (46) and in which and are shape and scale parameters, respectively and . According to Molenberghs et al. [13]; the combined model which incorporates the normal and gamma random effects, and , respectively, is expressed in the form:

The expression for its expectation is given by , where and and are the mean and the variance of the , respectively. The likelihood contribution of the combined model for subject is given by the following equation:and its total likelihood is further given by the following equation:

Thus, equation (49) shows the addition of gamma random effect in the PN-mixture model to handle the overdispersion feature and to form the combined model called Poisson-normal-gamma (PNG), which further handles both correlation and overdispersion simultaneously.
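The effect of the two random effects can be seen in a small simulation. The sketch below draws counts from a Poisson model whose mean is multiplied by a gamma random effect and by the exponential of a subject-level normal random intercept. It is a simplified illustration under stated assumptions: an intercept-only linear predictor with a log link, a gamma effect constrained to have mean 1, and hypothetical function names and parameter values.

```python
import math
import random
from typing import List


def poisson_draw(rng: random.Random, mean: float) -> int:
    """Poisson variate by Knuth's multiplication method (small means only)."""
    limit = math.exp(-mean)
    count = 0
    prod = rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def simulate_png(n_subjects: int, n_visits: int, intercept: float,
                 sd_b: float, gamma_shape: float, seed: int = 0) -> List[List[int]]:
    """Simulate clustered counts with normal and gamma random effects."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, sd_b)          # normal random intercept (correlation)
        row = []
        for _ in range(n_visits):
            # Gamma effect with mean 1 (shape * scale = 1) for overdispersion.
            theta = rng.gammavariate(gamma_shape, 1.0 / gamma_shape)
            row.append(poisson_draw(rng, theta * math.exp(intercept + b)))
        counts.append(row)
    return counts


sim = simulate_png(200, 10, math.log(3.0), 0.5, 2.0, seed=1)
flat = [c for row in sim for c in row]
sim_mean = sum(flat) / len(flat)
sim_var = sum((c - sim_mean) ** 2 for c in flat) / (len(flat) - 1)
```

In such a simulation the pooled variance exceeds the pooled mean by a wide margin, which a plain Poisson model cannot reproduce, and counts within the same subject are positively correlated through the shared random intercept.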

7.2. Combined Models for Zero-Inflation

The next concern is zero-inflation or excess zeroes which may be beyond the P model. It is assumed that there are two processes of zeroes in zero-inflated count models. The first one is where zeroes may arise from point mass (process 1) and as the second one, they may come from the conventional count component (process 2). Now assume that for measurement , process 1 takes the probability of and for process 2, due to Kassahun et al. [37]. We note that process 1 generates only zeroes, whereas process 2 with a designation generates counts from a P model, an NB model, a PN model, a generalized linear mixed model (GLMM), or a PNG combined model [37, 38]. The general form of the zero-inflated PN model in mixture is given by the following equation:

Thus,

The Bernoulli model is employed to represent the presence of additional zeros or a zero-inflated element . This is done for the simplest scenario that includes just an intercept, along with the incorporated predictors and for both fixed and random effects. In addition, the model involves a vector of coefficient dedicated to estimate excess zero occurrences, as well as the inclusion of random effects . For this case, the common link functions for the count and or binary outcomes such as the logit or probit can be used. Note that , , and in equation (46) are now replaced by , , and , respectively, for the nonzero count part. The predictors in the count and excess-zero component can either be overlapping, a subset of the predictors can be used for excess-zeroes, or different predictors for the two parts that can be used.

By taking the assumption that the random effects are normally distributed and possibly correlated with the correlation parameter , the variance-covariance matrix is given by the following equation:

The conditional mean and variance of the zero-inflated PNG (ZIPNG) are given as follows:respectively.

Generally, the theoretical discussion of this section helps us to understand how the count outcomes (P model) with some special features, mentioned above, are dealt with by integrating the combined models over the respective random effects to describe the three features simultaneously.
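The two-process idea can be illustrated with the simplest member of this class, the plain zero-inflated Poisson model without random effects. In the sketch below, pi is the probability of the point mass at zero (process 1) and lam is the Poisson mean of the count component (process 2); the symbol and function names are assumptions for illustration. The ZIPNG model replaces the Poisson component by the combined model.

```python
import math
from typing import Tuple


def zip_pmf(y: int, lam: float, pi: float) -> float:
    """P(Y = y) under a zero-inflated Poisson model."""
    if y < 0 or lam <= 0 or not 0 <= pi <= 1:
        raise ValueError("need y >= 0, lam > 0, and 0 <= pi <= 1")
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    # Zeros come from the point mass or from the count component.
    return (pi if y == 0 else 0.0) + (1.0 - pi) * poisson


def zip_mean_var(lam: float, pi: float) -> Tuple[float, float]:
    """Mean (1 - pi) lam and variance (1 - pi) lam (1 + pi lam)."""
    mean = (1.0 - pi) * lam
    return mean, mean * (1.0 + pi * lam)


p_zero = zip_pmf(0, 3.1, 0.3)
zip_mean, zip_var = zip_mean_var(3.1, 0.3)
```

Whenever pi is positive, the variance exceeds the mean, which is why zero-inflation can be viewed as a special case of overdispersion.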

7.3. Estimation of the Model Parameters for the Combined Models

The full likelihood for PNG is given in Section 7.1. Corresponding to this contribution, the estimation is performed by integrating over the respective random effects, accumulating the marginal likelihood, and maximizing it analytically, which can also be seen from Molenberghs et al. [11], Molenberghs et al. [13], and Kassahun et al. [37]. Based on Section 7.1, the partially marginalized PNG model is expressed as follows:

A similar expression for ZIPNG is given by the following equation:

The numerical estimation is performed by using a flexible normal random effects tool, the SAS procedure NLMIXED.

7.4. An Application to Epilepsy Data

Epilepsy is a noncommunicable neurological disorder of the human brain. It is treated medically in a psychiatric clinic in a hospital. The data were collected from Felege Hiwot Referral Hospital (FHRH), which is located in Bahir Dar city, the capital city of Amhara regional state, Ethiopia; see Figure 5. It is 565 km from the capital city of Ethiopia, Addis Ababa. In this clinical trial study, fifty-three subjects were followed for seven months (not equally for all) and the number of epileptic seizures was recorded on a weekly basis. Furthermore, socio-economic, behavioral, and demographic data were collected to determine associated risk factors. Mekonnen et al. [16] used similar data for their analysis with all factors. However, in our study, we focused only on time-related factors for ease of modeling.

7.4.1. Description of the Data

The numerical and graphical visualizations of the three features are presented here.

The numerical summary in Table 3 shows that 276 (30.2%) of the measurements were zero, that is, occasions on which the patients showed no seizures during their follow-up. This indicates that there are excess zeros in the data. The observed standard deviation (5.5), and hence the variance (about 30.3), is much greater than the observed mean (3.1), which further shows the presence of overdispersion in the data. The histogram in Figure 6 also shows that the distribution is right-skewed, with most of the observations concentrated near zero, to the left of the average of the data.
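These descriptive checks can be reproduced for any vector of counts, as in the sketch below. It compares the sample variance with the sample mean (a ratio of 1 is expected under a Poisson model) and the observed fraction of zeros with the zero probability exp(-mean) implied by a Poisson model with the same mean. The function name and the example counts are hypothetical.

```python
import math
from typing import Dict, Sequence


def count_diagnostics(counts: Sequence[int]) -> Dict[str, float]:
    """Simple descriptive checks for overdispersion and excess zeros."""
    values = list(counts)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    if mean <= 0:
        raise ValueError("mean must be positive")
    variance = sum((c - mean) ** 2 for c in values) / (n - 1)
    return {
        "mean": mean,
        "variance": variance,
        "dispersion_index": variance / mean,          # 1 under a Poisson model
        "observed_zero_fraction": sum(1 for c in values if c == 0) / n,
        "poisson_zero_probability": math.exp(-mean),  # zeros expected under Poisson
    }


# Hypothetical counts, not the epilepsy data.
diag = count_diagnostics([0, 0, 0, 1, 2, 9])
```

Applied to the rounded summaries reported above (mean 3.1, standard deviation 5.5), the same arithmetic gives a variance-to-mean ratio of about 9.8 and a Poisson zero probability of exp(-3.1), about 0.045, compared with the observed 30.2% of zeros.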

In Figure 6, the first picture supports Table 3 (illustration of zero-inflation) and the second one shows the individual profile plot or presence of correlation in the data.

7.4.2. Test Result for Overdispersion and Zero-Inflation

The presence of overdispersion was tested by using the overdispersion test function of the R package AER, whose test statistic is asymptotically normal with mean zero and variance 1. The input “Epilepsy1” of the function represents the response variable and represents the linear transformation. The test statistic with the value and the alpha confirms that the data are overdispersed due to the known feature in the data. This feature, as shown next, is due to the excess structural zeros in the data. Thus, since the value is much smaller than the z value and the alpha is greater than 1, overdispersion in the data is confirmed.

The test for excess zeros, with the test statistic distributed as Chi-square with 1 df, shows that there is zero-inflation in the data. The presence of correlation is first suggested by the longitudinal nature of the data, in which individuals are measured repeatedly, and second it is easily visualized from the individual profile plot in the second panel of Figure 6.

For a further analytical description of the data, let stand for the number of epileptic seizures for the patient for the week of the follow-up period and let be the corresponding time-period for the occurrence of the . For the characterization of the data based on the zero-inflation process, let us assume that the PN generates the counts with the mean which is given by equation (48) as follows:when the counting process is generated from the PNG with the mean , it takes the following form:

The probability of zero-inflation is given by , see Subsection 7.2 for the analytical expression.

7.5. Comparison of Models for Overdispersion, Correlation, and Zero-Inflation

In this subsection, the known features of repeatedly measured count data (correlation, overdispersion, and zero-inflation) are discussed, separately and jointly, by using eight models. More importantly, the ZIP, ZIPN, ZIPG, and ZIPNG models are compared by using the epileptic seizure data, and their immediate counterparts, the P, PN, PG, and PNG models, are used for the respective comparisons. The zero-inflated versions show a substantial improvement over the corresponding non-zero-inflated versions based on the -2logL, AIC, and BIC comparison techniques.

The ZIPG model has shown an improvement relative to the ZIP. The ZIPN is an improvement when it is compared to the ZIP and ZIPG. The ZIPNG has further shown the best improvement relative to the ZIP, ZIPG, and ZIPN, which shows that it is the best-fitting model for the data. When the extra zeroes in the data are ignored, the improvement goes from the P to the PNG, and the PNG is the best-fitting model among the non-zero-inflated models. Generally, the trend of improvement in the models goes from the P to the ZIPNG.

The standard deviation of the random effect (St. dev. RE, represented by in Table 4) is significant in the models PN and PNG. This shows the normal random effect is important to add in the P model to handle the correlation in the data and it is more pronounced in the PN and PNG than in other models. The NB parameter and the inverse NB parameter are significant in the models PG, ZIPG, PNG, and ZIPNG. This implies that it is customary to include the gamma random effect in the P model to deal with the overdispersion in the data and these models are capable of handling overdispersion in the data. The zero-inflation parameter , where , is significant in the models ZIP, ZIPN, and ZIPNG, which indicates the excess zeroes in the data should not be ignored in the analysis and these models are appropriate for its analysis.

Correlation is common in repeatedly measured data and needs to be handled correctly during the data analysis. As displayed in Table 4, it is treated by adding the normal random effect to the P model. Overdispersion is another unavoidable feature; it is dealt with by including the gamma-distributed random effect in the log-linear predictor, which forms the combined model (PNG) that aggregates the other induced features. Furthermore, zero-inflation is modeled simultaneously by considering the two processes in the PNG model.

8. Discussion and Conclusion

In this study, the NT1AP family of distributions was introduced. Based on the new family, the new NT1AP-W model was discussed in detail. The new model is needed because the classical and modified statistical models that have been applied in health applications do not provide the best fit when the data show nonmonotonic failure rates. The mathematical logic behind the new method for deriving new distributions is to introduce an extra parameter, which gives extra flexibility to the new family and makes it capable of handling different patterns in the data. A new data set (breast cancer) is considered and the proposed model is compared to recent models.

The NT1AP-W, EFWE, EWW, PIW, KWP, KW, NWW, and APTW models were applied to the abovementioned public health data. The NT1AP-W model has shown its superiority based on the five adequacy measures. The newly proposed family has several advantages: (i) the addition of an extra parameter gives great flexibility, (ii) the added parameter makes it simple to modify existing distributions, (iii) it is useful for introducing new distributions in the domain of the T-X family, and (iv) it is useful for extending existing distributions with a closed-form CDF. Based on the findings, the NT1AP-W model is an appropriate model for dealing with data in health science and other related sciences.

Mekonnen et al. [16] used epilepsy data, allowing for both correlation and zero-inflation, and analyzed the data by using the P, NB, ZIP, and ZINB models. The second part of this study is motivated by two gaps: first, these authors did not raise or discuss the issue of overdispersion although the data are overdispersed, and second, the mentioned count models alone cannot fully handle the dependence in the data. Hence, this study addresses these gaps. It is also inappropriate to analyze epileptic seizure data by using linear mixed models because of the special features present in the data.

Except for Molenberghs et al. [11], Kassahun et al. [37], and Molenberghs et al. [13], none of the scholars discussed the three special features in the data in aggregate. For instance, Mekonnen et al. [16] discussed only two features (correlation, with less focus, and zero-inflation). Other scholars, namely Hinde and Demétrio [39], Workie and Lakew [40], Dare et al. [41], and Adesina et al. [42], discussed only count models for nonclustered data, with little attention to the aggregate features. In contrast to these studies, this study has implemented and explored appropriate models for data with the three features simultaneously.

The slope ratio is significant in the three nonzero version models P, PN, and PNG, while it is insignificant in the zero version models except in the ZIPNG model. Other studies could show this relationship except the result of Kassahun et al. [37]. Like in the result of Kassahun et al. [37]; the correlation of the random effects (correlation RE, ) is unrealistically significant in none of the models. This result contradicts the longitudinal nature of the data that the response variable (number of epileptic seizures) is recorded repeatedly over time and this needs further investigation.

Based on the analysis results, special features such as correlation, overdispersion, and zero-inflation cannot be ignored in repeatedly measured count data and need to be modeled simultaneously by using appropriate combined models.

Appendix

Sample Data

Samples of the two real data sets (breast cancer patients’ data and epileptic seizure patients’ data) are displayed in the following subsections, respectively.

Data Availability

Due to third-party usage, some ethical issues, and the subjects being human beings, the data can be obtained from the corresponding author upon request.

Ethical Approval

Epileptic seizure data were formally obtained from Mekonnen et al. [16], where ethical standards were maintained. Breast cancer data were collected by the researchers in compliance with all ethical standards, including obtaining ethical permission letters and the consent of the subjects.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Authors’ Contributions

Getachew Tekle, Rasool Roozegar, and Zubair Ahmed conceptualized the study. Getachew Tekle and Zubair Ahmed gathered the data. Getachew Tekle, Rasool Roozegar, and Zubair Ahmed designed the methodology. Getachew Tekle and Rasool Roozegar provided the resources. Rasool Roozegar and Zubair Ahmed supervised the study. Getachew Tekle and Rasool Roozegar analyzed the data. Getachew Tekle and Rasool Roozegar wrote the original draft.

Acknowledgments

This study was supported by Yazd University, Iran.