Abstract

Poisson regression is a popular tool for modeling count data and is applied in the medical sciences, engineering, and other fields. Real data, however, are often over- or underdispersed, in which case Poisson regression is not appropriate. To overcome this issue, we consider a regression model based on the Conway–Maxwell Poisson (COMP) distribution. Generally, the maximum likelihood estimator is used to estimate the unknown parameters of the COMP regression model. However, in the presence of multicollinearity, the estimates become unstable because of their high variances and standard errors. To solve this issue, a new COMP Liu estimator is proposed for the COMP regression model with over-, equi-, and underdispersion. To assess its performance, we conduct a Monte Carlo simulation in which the mean squared error is the evaluation criterion. The simulation findings show that the new estimator performs considerably better than the competing estimators. Finally, an application is considered to assess the superiority of the proposed COMP Liu estimator. The simulation and application findings clearly demonstrate that the proposed estimator is superior to the maximum likelihood estimator.

1. Introduction

Regression models are the most popular tool for modeling the relationship between a response variable and a set of explanatory variables. In many real-life problems, the response variable is a count, i.e., it takes nonnegative integer values. For count data, the most widely used regression model is the Poisson regression [1, 2]. One of the major features of the Poisson distribution is that the mean and variance of the random variable are equal. However, data often exhibit over- or underdispersion. In such circumstances, the Poisson distribution often does not provide a good approximation. For overdispersed data, the negative binomial model is a popular choice [3]. Other overdispersion models include Poisson mixtures [4]. However, these models are not suitable for underdispersion. A flexible alternative that captures both over- and underdispersion is the Conway–Maxwell Poisson (COMP) distribution, which was introduced by Conway and Maxwell in 1962 for modeling queuing systems with state-dependent service rates. The COMP distribution is a two-parameter generalization of the Poisson distribution, which also includes the Bernoulli and geometric distributions as special cases [5]. Shmueli et al. [5] established the statistical properties and parameter estimation methods of the COMP distribution. The COMP distribution has several applications in count data modeling [5–10].

Generally, maximum likelihood estimation (MLE) is used to estimate the unknown parameters of the COMP regression model (COMPRM). However, it is well known that the MLE is very sensitive to ill-conditioned data, since poor estimates are produced in the presence of high but imperfect multicollinearity [11]. The major drawback of multicollinearity is that the variances and standard errors become large [12–14]. Furthermore, the t and F ratios become statistically insignificant.

To reduce the effect of multicollinearity, different biased estimators are available in the literature. Among these, one of the most common is the Liu estimator, initially introduced by Keijan [15]. For the linear regression model (LRM), we refer the reader to [16–20]. However, the literature on generalized linear models (GLMs) is limited. Månsson et al. [21] proposed the Liu estimator for the logit model, Månsson et al. [22] introduced some biasing parameters for the Poisson Liu estimator, and Månsson [23] considered some shrinkage parameters for the negative binomial regression model. Qasim et al. [24] considered some biasing parameters for the gamma Liu regression model, and Wu et al. [25] introduced the restricted almost unbiased Liu estimator for the logistic model. Amin et al. [2] studied the performance of some ridge parameter estimators in the bell regression model. Khan et al. [38] studied influence diagnostic methods in the Poisson regression model with the Liu estimator. Majid et al. [26] proposed some Liu parameter estimators for the bell regression model. Recently, Sami et al. [27] suggested the best ridge parameter estimator for the COMPRM. The present literature indicates that no study of the Liu estimator for the COMPRM is available. Therefore, the main aim of this study is to propose a Liu estimator for the COMPRM, together with some new Liu parameters, to reduce the effect of collinearity among the explanatory variables. To assess the performance of these new Liu parameters, we conduct a Monte Carlo simulation study under different scenarios.

The rest of the article is organized as follows. We present the statistical methodology of the COMPRM in Section 2. The simulation layout and the results of the Monte Carlo simulation are addressed in Section 3. A real-life dataset is analyzed in Section 4. The article ends with some concluding remarks in Section 5.

2. Preliminaries: The COMPRM and Estimation Methods

Suppose the response variable $y$ comes from a $\mathrm{COMP}(\lambda, \nu)$ distribution with probability mass function defined by

$$P(Y = y) = \frac{\lambda^{y}}{(y!)^{\nu}\, Z(\lambda, \nu)}, \quad y = 0, 1, 2, \ldots, \tag{1}$$

where

$$Z(\lambda, \nu) = \sum_{s=0}^{\infty} \frac{\lambda^{s}}{(s!)^{\nu}}, \tag{2}$$

where $\lambda$ indicates the location parameter, which is the mean function of the response variable, $\nu$ indicates the dispersion parameter, and $Z(\lambda, \nu)$ is the normalizing constant. The value of the dispersion parameter determines the type of dispersion: if $\nu > 1$, the data are underdispersed; if $\nu < 1$, the data are overdispersed; and if $\nu = 1$, the data are equidispersed. The COMP distribution is also related to other distributions under particular parameter values. For example, if $\nu = 0$ and $\lambda < 1$, the COMP distribution becomes the geometric distribution; if $\nu \to \infty$, it becomes the Bernoulli distribution; and if $\nu = 1$, it becomes the Poisson distribution. Since the moments of the COMP distribution do not have closed-form expressions, they are approximated.

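The asymptotic mean and variance of Y for (2) are, respectively, given as

$$E(Y) \approx \lambda^{1/\nu} - \frac{\nu - 1}{2\nu} \tag{3}$$

and

$$\operatorname{Var}(Y) \approx \frac{1}{\nu}\, \lambda^{1/\nu}. \tag{4}$$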
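As a numerical illustration of these approximations, the following sketch evaluates the COMP probabilities by truncating the normalizing series and compares the resulting mean and variance with the asymptotic expressions $\lambda^{1/\nu} - (\nu - 1)/(2\nu)$ and $\lambda^{1/\nu}/\nu$. The function names and the truncation length `max_terms` are illustrative assumptions and are not taken from the original study.

```python
import math


def comp_pmf(lam, nu, max_terms=300):
    """COMP probabilities P(Y = s) for s = 0..max_terms-1.

    The infinite normalizing series Z(lam, nu) is truncated at max_terms,
    which is an assumption that is adequate for moderate lam ** (1 / nu).
    """
    # Logarithm of the unnormalized weights lam^s / (s!)^nu.
    log_w = [s * math.log(lam) - nu * math.lgamma(s + 1) for s in range(max_terms)]
    shift = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(v - shift) for v in log_w]
    z = sum(w)
    return [v / z for v in w]


def comp_moments(lam, nu, max_terms=300):
    """Mean and variance computed directly from the truncated pmf."""
    pmf = comp_pmf(lam, nu, max_terms)
    mean = sum(s * p for s, p in enumerate(pmf))
    var = sum((s - mean) ** 2 * p for s, p in enumerate(pmf))
    return mean, var


def comp_moments_asymptotic(lam, nu):
    """Asymptotic approximations of the mean and variance."""
    root = lam ** (1.0 / nu)
    return root - (nu - 1.0) / (2.0 * nu), root / nu


if __name__ == "__main__":
    for lam, nu in [(4.0, 1.0), (4.0, 0.5), (4.0, 2.0)]:
        print(lam, nu, comp_moments(lam, nu), comp_moments_asymptotic(lam, nu))
```

With $\nu = 1$ both computations reduce to the Poisson case, where the mean and variance equal $\lambda$; with $\nu < 1$ the variance exceeds the mean.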

Shmueli et al. [5] suggested that these approximations may not be accurate when $\nu > 1$ or $\lambda^{1/\nu} < 10$. Despite its flexibility and attractiveness, the COMP distribution has limitations as a basis for a generalized linear model (GLM), as shown in [28, 29]. In particular, neither $\lambda$ nor $\nu$ provides a clear centering parameter. Whereas $\lambda$ is approximately the mean when $\nu$ is close to one, it differs substantially from the mean for small $\nu$. Given that $\nu$ would be expected to be small for overdispersed data, a COMP model based on the original formulation is difficult to interpret and use for overdispersed data [29]. Therefore, Guikema and Goffelt [28] proposed a reparameterization using a new parameter, $\mu = \lambda^{1/\nu}$, to provide a clear centering parameter. The PMF under the new formulation is defined as

$$P(Y = y) = \frac{1}{S(\mu, \nu)} \left( \frac{\mu^{y}}{y!} \right)^{\nu}, \quad y = 0, 1, 2, \ldots, \tag{5}$$

where

$$S(\mu, \nu) = \sum_{n=0}^{\infty} \left( \frac{\mu^{n}}{n!} \right)^{\nu}. \tag{6}$$

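By inserting $\mu = \lambda^{1/\nu}$ in (3) and (4), the mean and variance of Y are given in terms of the reparameterization as $E(Y) \approx \mu + \frac{1}{2\nu} - \frac{1}{2}$ and $\operatorname{Var}(Y) \approx \mu/\nu$, which are especially accurate when $\nu \le 1$ and $\mu > 10$. Now $\mu$ is the centering parameter and $\nu$ in the new parameterization is used as a shape parameter, i.e., if $\nu < 1$, the variance is greater than the mean, indicating overdispersion, whereas $\nu > 1$ indicates underdispersion. Based on the new formulation, it is worthwhile to establish the GLM by using a link function, and it is easier to interpret the coefficients [28, 29]. The log-likelihood of (5) for observations $y_1, \ldots, y_n$ with centering parameters $\mu_1, \ldots, \mu_n$ is

$$\ell(\mu, \nu) = \nu \sum_{i=1}^{n} y_i \log \mu_i - \nu \sum_{i=1}^{n} \log (y_i!) - \sum_{i=1}^{n} \log S(\mu_i, \nu). \tag{7}$$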

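Let $\eta_i = \log \mu_i = x_i^{T}\beta$ be the linear predictor under the log link. Then the log-likelihood function of (7) is given by

$$\ell(\beta, \nu) = \nu \sum_{i=1}^{n} y_i\, x_i^{T}\beta - \nu \sum_{i=1}^{n} \log (y_i!) - \sum_{i=1}^{n} \log S\!\left(e^{x_i^{T}\beta}, \nu\right). \tag{8}$$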

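To estimate the unknown parameters by the MLE method, we use an iterative procedure. To allow unconstrained optimization, let $\nu^{*} = \log \nu$. Then, (8) becomes

$$\ell(\beta, \nu^{*}) = e^{\nu^{*}} \sum_{i=1}^{n} y_i\, x_i^{T}\beta - e^{\nu^{*}} \sum_{i=1}^{n} \log (y_i!) - \sum_{i=1}^{n} \log S\!\left(e^{x_i^{T}\beta}, e^{\nu^{*}}\right). \tag{9}$$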
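The log-likelihood under the log link can be sketched in code as follows. This is a minimal illustration that assumes the form $\sum_i [\nu (y_i \eta_i - \log y_i!) - \log S(e^{\eta_i}, \nu)]$ with $\eta_i = x_i^{T}\beta$ and the unconstrained parameter $\nu^{*} = \log \nu$, which follows from the reparameterized PMF. The names `log_S` and `comp_loglik` and the truncation length `max_terms` are hypothetical choices made for this example.

```python
import math


def log_S(mu, nu, max_terms=500):
    """log S(mu, nu), with S = sum_{n>=0} (mu^n / n!)^nu truncated at max_terms."""
    terms = [nu * (n * math.log(mu) - math.lgamma(n + 1)) for n in range(max_terms)]
    shift = max(terms)  # log-sum-exp shift for numerical stability
    return shift + math.log(sum(math.exp(t - shift) for t in terms))


def comp_loglik(beta, nu_star, X, y):
    """COMP log-likelihood under the log link with nu = exp(nu_star).

    X is a list of covariate rows and y a list of nonnegative integer counts.
    """
    nu = math.exp(nu_star)  # back-transform to the constrained scale nu > 0
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_i))  # linear predictor log(mu_i)
        total += nu * (y_i * eta - math.lgamma(y_i + 1)) - log_S(math.exp(eta), nu)
    return total


if __name__ == "__main__":
    X = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.0]]
    y = [2, 0, 5]
    print(comp_loglik([0.3, 0.7], 0.0, X, y))
```

Setting `nu_star = 0` gives $\nu = 1$, in which case the function returns the ordinary Poisson log-likelihood, a convenient sanity check before passing the negative of this function to a numerical optimizer.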

To find the unknown parameters by the MLE method, we first differentiate (9) with respect to $\beta$ and $\nu^{*}$, which gives the score equations (10) and (11), respectively.

To estimate $\beta$, it is required to fix $\nu$. For a detailed description of the information matrix, we recommend the study of Sellers and Shmueli [30]. Since the COMPRM is used for modeling, the mean and variance depend on separate covariates and are, respectively, defined in (12) [29].

For ease, we consider a single value of $\nu$. After the final iteration, the MLE of $\beta$, denoted $\hat{\beta}_{\mathrm{MLE}}$, is given by (13), which is expressed through the adjusted response variable and the weights evaluated at the final iteration. Equation (13) is a feasible estimator of the unknown coefficients. The Fisher scoring iterative method is generally used to evaluate both $\beta$ and $\nu$. The MLE has several adverse effects under multicollinearity, one of which is that it produces large variances. To overcome this issue, we propose the Liu estimator for the COMPRM, called the COMP Liu estimator (COMPLE) and denoted $\hat{\beta}_{d}$, which is defined in (14), where $d$ is the Liu parameter. If $d = 1$, then $\hat{\beta}_{d} = \hat{\beta}_{\mathrm{MLE}}$, and if $d < 1$, then $\lVert \hat{\beta}_{d} \rVert < \lVert \hat{\beta}_{\mathrm{MLE}} \rVert$.
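The shrinkage behaviour of the Liu parameter can be illustrated with a short sketch. It assumes the usual Liu-type form for GLMs, $\hat{\beta}_{d} = (X^{T}\hat{W}X + I)^{-1}(X^{T}\hat{W}X + dI)\hat{\beta}_{\mathrm{MLE}}$, where $X^{T}\hat{W}X$ is the weighted cross-product matrix; this form is an assumption here and may differ in detail from (14). In the eigenbasis of $X^{T}\hat{W}X$, with eigenvalues $\lambda_j$, the transformation multiplies the jth canonical coefficient by $(\lambda_j + d)/(\lambda_j + 1)$. The function names are hypothetical.

```python
import math


def liu_shrink(alpha_mle, eigenvalues, d):
    """Liu-type shrinkage in canonical (eigenbasis) coordinates.

    alpha_mle   : MLE coefficients expressed in the eigenbasis
    eigenvalues : positive eigenvalues of the weighted cross-product matrix
    d           : Liu parameter, usually taken in (0, 1)
    """
    return [(lam + d) / (lam + 1.0) * a for a, lam in zip(alpha_mle, eigenvalues)]


def euclidean_norm(v):
    return math.sqrt(sum(x * x for x in v))


if __name__ == "__main__":
    alpha_mle = [0.8, -1.5, 4.0]
    eigenvalues = [12.0, 1.3, 0.02]  # a small eigenvalue signals collinearity
    for d in (1.0, 0.5, 0.1):
        shrunk = liu_shrink(alpha_mle, eigenvalues, d)
        print(d, shrunk, euclidean_norm(shrunk))
```

Components associated with small eigenvalues, which are the ones inflated by multicollinearity, are shrunk the most, while `d = 1` leaves the MLE unchanged.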

2.1. The MSE Properties

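The matrix mean squared error (MMSE) of an estimator $\tilde{\beta}$ of the parameter $\beta$ can be defined as

$$\mathrm{MMSE}(\tilde{\beta}) = \operatorname{Cov}(\tilde{\beta}) + \operatorname{bias}(\tilde{\beta})\operatorname{bias}(\tilde{\beta})^{T}, \tag{15}$$

where $\operatorname{Cov}(\tilde{\beta})$ is the covariance matrix of the estimator and $\operatorname{bias}(\tilde{\beta})$ represents the bias vector. The scalar MSE of the estimator is obtained by applying the trace, which is defined as

$$\mathrm{MSE}(\tilde{\beta}) = \operatorname{tr}\!\left[\mathrm{MMSE}(\tilde{\beta})\right] = \operatorname{tr}\!\left[\operatorname{Cov}(\tilde{\beta})\right] + \operatorname{bias}(\tilde{\beta})^{T}\operatorname{bias}(\tilde{\beta}). \tag{16}$$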

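For the comparison of two estimators $\tilde{\beta}_1$ and $\tilde{\beta}_2$, the estimator $\tilde{\beta}_2$ is superior to $\tilde{\beta}_1$ in the MMSE sense if and only if the following difference is positive definite:

$$\mathrm{MMSE}(\tilde{\beta}_1) - \mathrm{MMSE}(\tilde{\beta}_2) > 0. \tag{17}$$

In terms of the scalar MSE, $\tilde{\beta}_2$ is superior to $\tilde{\beta}_1$ if and only if

$$\mathrm{MSE}(\tilde{\beta}_1) - \mathrm{MSE}(\tilde{\beta}_2) > 0. \tag{18}$$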

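The covariance of $\hat{\beta}_{\mathrm{MLE}}$ is given in (19), where $\hat{\nu}$ is the dispersion parameter, which is computed iteratively using (11). The MSEs of the estimators are obtained by considering $\alpha = Q^{T}\beta$ and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_p) = Q^{T} X^{T}\hat{W}X Q$, where $Q$ is the orthogonal matrix composed of the eigenvectors of the weighted cross-product matrix $X^{T}\hat{W}X$, while $\lambda_j$ is the jth eigenvalue of $X^{T}\hat{W}X$. The MMSE of $\hat{\beta}_{\mathrm{MLE}}$ is given in (20), whereas the scalar MSE of $\hat{\beta}_{\mathrm{MLE}}$ is given in (21).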

The bias, covariance, and MMSE of the COMPLE can be, respectively, computed from (14) and are given in (22). The scalar MSE of the COMPLE is defined in (23), where $\alpha_j$ is the jth element of $\alpha$. Since Keijan [15] showed that the Liu estimator performs better than the ordinary least squares estimator, we extend the Liu estimator to the COMPRM and call it the COMPLE. For this purpose, we follow Keijan [15] and differentiate (23) with respect to d, which gives (24).

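Putting d = 1 in (24) gives (25), which is positive.

Hence, there exists $0 < d < 1$ such that $\mathrm{MSE}(\hat{\beta}_{d}) < \mathrm{MSE}(\hat{\beta}_{\mathrm{MLE}})$ or, equivalently, $\mathrm{MSE}(\hat{\beta}_{d}) - \mathrm{MSE}(\hat{\beta}_{\mathrm{MLE}}) < 0$.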
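The existence argument can be checked numerically with a small sketch. It assumes the standard Liu-type scalar MSE expressions in canonical form, $\mathrm{MSE}(\hat{\beta}_{\mathrm{MLE}}) = \hat{\nu}\sum_j 1/\lambda_j$ and $\mathrm{MSE}(\hat{\beta}_{d}) = \hat{\nu}\sum_j (\lambda_j + d)^2/[\lambda_j(\lambda_j + 1)^2] + (d - 1)^2 \sum_j \alpha_j^2/(\lambda_j + 1)^2$; these are assumed to correspond to (21) and (23) and are not copied from them. The function names and the example values are hypothetical.

```python
def mse_mle(eigenvalues, nu_hat):
    """Assumed scalar MSE of the MLE: nu_hat * sum_j 1 / lambda_j."""
    return nu_hat * sum(1.0 / lam for lam in eigenvalues)


def mse_liu(eigenvalues, alpha, nu_hat, d):
    """Assumed scalar MSE of the Liu estimator: variance plus squared bias."""
    variance = nu_hat * sum(
        (lam + d) ** 2 / (lam * (lam + 1.0) ** 2) for lam in eigenvalues
    )
    bias_sq = (d - 1.0) ** 2 * sum(
        a ** 2 / (lam + 1.0) ** 2 for a, lam in zip(alpha, eigenvalues)
    )
    return variance + bias_sq


def dmse_liu_dd(eigenvalues, alpha, nu_hat, d):
    """Derivative of the assumed Liu scalar MSE with respect to d."""
    d_var = 2.0 * nu_hat * sum(
        (lam + d) / (lam * (lam + 1.0) ** 2) for lam in eigenvalues
    )
    d_bias = 2.0 * (d - 1.0) * sum(
        a ** 2 / (lam + 1.0) ** 2 for a, lam in zip(alpha, eigenvalues)
    )
    return d_var + d_bias


if __name__ == "__main__":
    lam = [5.0, 0.5, 0.01]
    alpha = [0.6, 0.5, 0.6]
    print(mse_mle(lam, 1.0), mse_liu(lam, alpha, 1.0, 0.9))
    print(dmse_liu_dd(lam, alpha, 1.0, 1.0))  # positive slope at d = 1
```

Because the derivative at d = 1 equals $2\hat{\nu}\sum_j 1/[\lambda_j(\lambda_j + 1)]$, which is positive, moving d slightly below one lowers the scalar MSE relative to the MLE under these assumed expressions.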

2.2. Theoretical Comparison Based on MMSE and Scalar MSE

Lemma 1. Let M be a positive definite (pd) matrix, $\alpha$ be a vector of nonzero constants, and c be a positive constant. Then $cM - \alpha\alpha^{T} > 0$ if and only if $\alpha^{T} M^{-1} \alpha < c$ [31].

Theorem 1. Under the COMPRM, consider , and . Then if and only if .

Proof. The difference between the MMSE functions of the MLE and the COMPLE is given in (26). For the scalar MSE, the last expression is written as (27). After simplification, (27) can be written as (28). The MMSE difference is p.d. if $(\lambda_j + 1)^2 - (\lambda_j + d)^2 > 0$, which is further equivalent to $(1 - d)(2\lambda_j + d + 1) > 0$. Thus, if $0 < d < 1$, the proof is completed by Lemma 1.

2.3. Selection of the Shrinkage Parameter

The COMPLE is a better estimation method than the MLE for dealing with multicollinear regressors. To select the optimal value of d, we follow the work of Månsson et al. [22], differentiate (23) with respect to d, and equate the result to zero.

The range of d depends on . Based on the theoretical work of [21, 24, 32], we define the optimal value of d accordingly.
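The first-order condition can be illustrated under an assumed form of the scalar MSE. If $\mathrm{MSE}(\hat{\beta}_{d}) = \hat{\nu}\sum_j (\lambda_j + d)^2/[\lambda_j(\lambda_j + 1)^2] + (d - 1)^2\sum_j \alpha_j^2/(\lambda_j + 1)^2$, which is the standard Liu-type expression and only assumed to match (23), then setting the derivative with respect to d to zero gives $d = \sum_j (\alpha_j^2 - \hat{\nu})/(\lambda_j + 1)^2 \big/ \sum_j (\hat{\nu} + \lambda_j\alpha_j^2)/[\lambda_j(\lambda_j + 1)^2]$. The sketch below computes this value; the function names are hypothetical, and the specific Liu parameters proposed in the paper are not reproduced here.

```python
def mse_liu(eigenvalues, alpha, nu_hat, d):
    """Assumed scalar MSE of the Liu estimator in canonical form."""
    variance = nu_hat * sum(
        (lam + d) ** 2 / (lam * (lam + 1.0) ** 2) for lam in eigenvalues
    )
    bias_sq = (d - 1.0) ** 2 * sum(
        a ** 2 / (lam + 1.0) ** 2 for a, lam in zip(alpha, eigenvalues)
    )
    return variance + bias_sq


def liu_d_opt(eigenvalues, alpha, nu_hat):
    """Minimizer of the assumed scalar MSE with respect to d."""
    num = sum(
        (a ** 2 - nu_hat) / (lam + 1.0) ** 2 for a, lam in zip(alpha, eigenvalues)
    )
    den = sum(
        (nu_hat + lam * a ** 2) / (lam * (lam + 1.0) ** 2)
        for a, lam in zip(alpha, eigenvalues)
    )
    return num / den


if __name__ == "__main__":
    lam = [5.0, 0.5, 0.01]
    alpha = [2.0, 1.5, 1.0]
    d_star = liu_d_opt(lam, alpha, 0.5)
    print(d_star, mse_liu(lam, alpha, 0.5, d_star), mse_liu(lam, alpha, 0.5, 1.0))
```

The numerator is negative whenever $\alpha_j^2 < \hat{\nu}$ dominates, so the unconstrained minimizer can fall outside (0, 1); in practice the true $\alpha_j$ and $\nu$ are replaced by estimates, and restricting the result to the unit interval is a common convention rather than something this sketch enforces.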

Furthermore, Qasim et al. [33] proposed some biasing parameters, which we also consider in assessing the performance. From these, we obtain our proposed estimators.

3. Monte Carlo Simulation Study

This section presents a brief discussion of the data generation and of the factors that play a crucial role in the construction of the simulation experiment. In addition, the assessment criterion used to compare the performance of the COMPLE with that of the traditional MLE is presented.

3.1. Simulation Layout

The response variable of the COMPRM is generated from a $\mathrm{COMP}(\mu_i, \nu)$ distribution, where $\mu_i = \exp(x_i^{T}\beta)$.

Following [32], the correlated regressors are generated as

$$x_{ij} = (1 - \rho^{2})^{1/2} z_{ij} + \rho z_{i,p+1}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p,$$

where $z_{ij}$ are independent standard normal pseudorandom numbers and $\rho^{2}$ is the correlation between the explanatory variables. In this study, to examine the effect of different degrees of collinearity on the estimators, the following values of $\rho$ are considered: 0.80, 0.90, 0.95, and 0.99. The slope parameters are chosen such that $\sum_{j=1}^{p} \beta_j^{2} = 1$, which is a commonly used restriction in the field; for further details, see [32]. Further, four sample sizes are considered: $n$ = 50, 100, 150, and 200. The number of regressors included in this study is $p$ = 3, 6, 9, and 12. We consider three levels of the dispersion parameter $\nu$, corresponding to overdispersion ($\nu < 1$), equidispersion ($\nu = 1$), and underdispersion ($\nu > 1$), to clearly monitor the performance of the proposed estimator. For each combination of the values of $n$, $p$, $\rho$, and $\nu$, the data are generated 2000 times [32]. The MSE criterion is used to evaluate the proposed and other considered estimators and is defined as

$$\mathrm{MSE}(\hat{\beta}) = \frac{1}{R} \sum_{i=1}^{R} (\hat{\beta}_i - \beta)^{T} (\hat{\beta}_i - \beta),$$

where $(\hat{\beta}_i - \beta)$ is the difference between the estimated and true parameter vectors at the ith replication and R represents the number of replications.
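The design steps of this layout can be sketched in code. The example assumes the common generation scheme $x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}$, under which any two regressors have correlation $\rho^2$, and it uses equal slopes $1/\sqrt{p}$ as one simple way of satisfying $\sum_j \beta_j^2 = 1$. Both the equal-slope choice and the function names are assumptions for illustration; fitting the COMPRM itself is not shown.

```python
import math
import random


def generate_regressors(n, p, rho, rng):
    """Generate an n x p matrix of correlated regressors.

    Each row shares one extra standard normal draw z[p], which induces a
    pairwise correlation of rho ** 2 between the columns.
    """
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p + 1)]
        rows.append([math.sqrt(1.0 - rho ** 2) * z[j] + rho * z[p] for j in range(p)])
    return rows


def unit_norm_slopes(p):
    """Equal slopes with sum of squares one (an illustrative assumption)."""
    return [1.0 / math.sqrt(p)] * p


def simulated_mse(estimates, beta):
    """Average squared Euclidean distance between estimates and the true beta."""
    total = 0.0
    for est in estimates:
        total += sum((b_hat - b) ** 2 for b_hat, b in zip(est, beta))
    return total / len(estimates)


if __name__ == "__main__":
    rng = random.Random(2024)
    X = generate_regressors(n=50, p=3, rho=0.95, rng=rng)
    beta = unit_norm_slopes(3)
    print(X[0], beta)
```

In a full experiment, `simulated_mse` would receive one estimated coefficient vector per replication for each estimator being compared.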

3.2. Results and Discussion

The estimated MSEs of the COMPLE with the proposed Liu parameters are shown in Tables 1–12. Various conditions are considered to judge the efficacy of the COMPLE. The general comments on the simulation findings are as follows:
(1) The evidence shows that the overall performance of the proposed COMPLE under the different shrinkage parameters is better than that of the MLE. The MLE is the most severely affected estimator, with the largest MSE in the presence of multicollinearity.
(2) With $n$, $p$, and $\nu$ fixed, the degree of multicollinearity has a direct impact on the estimated MSEs of the COMPRM estimators. As the level of multicollinearity increases from moderate to severe, i.e., from 0.80 to 0.99, with all other factors fixed, the estimated MSEs increase gradually. However, this increase is considerably smaller for the proposed estimator than for the traditional MLE. Across the shrinkage parameters considered under different controlled conditions, the COMPLE shows consistent behaviour against multicollinearity, and one of the shrinkage parameters is found to perform better than the others.
(3) When the sample size increases, the estimated MSEs of all the estimators decrease. However, for all choices of $n$, the COMPLE is again the better and more robust option compared with the MLE.
(4) An increase in the number of explanatory variables increases the simulated MSEs of the COMPRM estimators. Again, the MLE is the most negatively affected estimator in this situation. With regard to the number of explanatory variables, the simulation findings show that the proposed COMPLE is the better choice, since it behaves consistently in contrast to the MLE. As the number of regressors increases, one of the shrinkage parameters of the COMPLE performs considerably better than the other Liu parameters.
(5) The dispersion factor also plays a pivotal role in the performance of any estimator, which is why different values of the dispersion parameter are considered. As the dispersion increases, there is a gradual increase in the estimated MSEs of all the considered estimators. Further, across all the considered scenarios, the same shrinkage parameter performs consistently better than the other shrinkage parameters.

4. An Illustrative Example

In this section, the implementation of the proposed strategy is illustrated by a study of a medium-sized timber company that manufactures laminated plastic plywood. The study evaluates the effect of explanatory variables on the number of defects found in manufactured plywood. The dataset includes observations on the response variable $y$, the number of defects per laminated plastic plywood area, and on four explanatory process variables: $x_1$ is the volumetric shrinkage, $x_2$ is the assembly time, $x_3$ is the wood density, and $x_4$ is the drying temperature. To assess the dispersion of the response variable, we use the index of dispersion (D), which is computed as the ratio of the variance of the response to its mean [34]. The estimated value of D for this application is 135.64. As D is greater than one, the response variable is overdispersed. Moreover, we also compute the dispersion parameter, which is obtained by using (11) iteratively. Using the COMPoissonReg R package, we found that $\hat{\nu}$ = 0.9614, which is less than one and is therefore also consistent with overdispersion in the dataset.

To assess the multicollinearity in the dataset, we use the condition index, which equals 8634.73 > 30 and clearly indicates the presence of severe multicollinearity among the explanatory variables.
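The two diagnostics used above can be sketched as follows. The example assumes that the index of dispersion is the sample variance divided by the sample mean and that the condition index is the square root of the ratio of the largest to the smallest eigenvalue of the cross-product matrix; both are common conventions, and the exact definitions used in the application may differ. The function names and the toy data are hypothetical and are not the plywood dataset.

```python
import math
import statistics


def index_of_dispersion(counts):
    """Variance-to-mean ratio; values above one suggest overdispersion."""
    return statistics.variance(counts) / statistics.mean(counts)


def condition_index(eigenvalues):
    """sqrt(lambda_max / lambda_min) for positive eigenvalues.

    Values above 30 are commonly read as severe multicollinearity.
    """
    return math.sqrt(max(eigenvalues) / min(eigenvalues))


if __name__ == "__main__":
    toy_counts = [0.0, 2.0, 4.0, 6.0, 8.0]
    toy_eigenvalues = [100.0, 4.0, 1.0]
    print(index_of_dispersion(toy_counts))
    print(condition_index(toy_eigenvalues))
```

The eigenvalues would normally be obtained from the (weighted) cross-product matrix of the regressors with a linear algebra routine, which is outside the scope of this sketch.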

The estimated coefficients, standard errors, and values of the MSE criterion are reported in Table 13. The estimated coefficients of the MLE and of the COMPLE under the different shrinkage parameters are obtained using (13) and (14), respectively, whereas the scalar MSEs of the estimators are computed using (21) and (23), respectively. The values of the shrinkage parameters are , , , , , , , , , , , and .

Table 13 clearly demonstrates that the MLE is negatively affected, with an inflated MSE compared with the COMPLE under all of the shrinkage parameters considered. Furthermore, the standard errors (SEs) of the MLE are considerably larger than those of the COMPLE with all Liu parameters. The performance of the COMPLE under every shrinkage parameter is therefore better than that of the MLE. More specifically, under one of the shrinkage parameters the proposed COMPLE shows much more robust behaviour, with smaller SEs as well as a smaller estimated MSE. The SEs are computed by taking the square root of the diagonal elements of the covariance matrices of the estimators. The application results also satisfy the condition of Theorem 1 for all j = 1, 2, 3, 4.

We also use another criterion, cross validation (CV), applied to the real-life dataset to assess the proposed method. The average validation errors obtained by the CV method are shown in Table 13. For a detailed description, see [35–37]. The CV is used to examine the predictive performance of the estimators comprehensively. The results show that the performance of the proposed COMPLE with all Liu shrinkage parameters is better than that of the MLE, and one of the Liu parameters attains the minimum CV value among the Liu parameters of the COMPRM. Thus, both criteria, i.e., MSE and CV, show that the proposed estimator performs consistently better than its competitors. Hence, the findings of the real application are compatible with the results of the Monte Carlo simulations.

5. Concluding Remarks

This article proposed the Liu estimator under different shrinkage parameters for the COMPRM to deal with multicollinearity under both under- and overdispersion. The MLE and the COMPLE are compared via a Monte Carlo simulation and a real-life example, with the MSE used as the evaluation criterion. Based on the findings of the simulation study and the real-life example, the proposed estimator performs better than the MLE for both under- and overdispersion. More specifically, one of the proposed shrinkage parameters performs better than the other COMPLE parameters and the MLE. We therefore suggest using the COMPLE with this shrinkage parameter to estimate the COMPRM in the presence of multicollinearity as well as under- and overdispersion.

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.