Abstract

The Gamma ridge regression estimator (GRRE) is commonly used to solve the problem of multicollinearity when the response variable follows the Gamma distribution. Choosing the ridge parameter is an important issue in the GRRE, as it is for other models. Numerous ridge parameter estimators have been proposed for the linear and other regression models, so in this study we generalize these estimators to the Gamma ridge regression model. A Monte Carlo simulation study and two real-life applications are carried out to evaluate the performance of the proposed ridge regression estimators, which are then compared with the maximum likelihood method and some existing ridge regression estimators. Based on the simulation study and the real-life application results, we suggest some better choices of ridge regression estimators for practitioners applying the Gamma regression model with correlated explanatory variables.

1. Introduction

In dealing with the Gamma regression model (GRM), it is assumed that the explanatory variables are not correlated. However, a strong or nearly strong linear relationship may exist among the explanatory variables, which leads to the problem of multicollinearity. Multicollinearity was first defined by Frisch [1]. Numerous biased and almost unbiased estimation methods have been proposed in the literature to solve the issue of multicollinearity for generalized linear models. In the presence of multicollinearity, the maximum likelihood estimator (MLE) becomes unstable and gives high variances of the estimated parameters. Therefore, confidence intervals of the GRM coefficients become wider, which increases the probability of committing a Type II error in hypothesis tests about the estimated parameters.

There are many ways to solve the problem of multicollinearity. The most popular method is ridge regression (RR), which was first suggested by Hoerl and Kennard [2, 3]. They showed that the RR method has a smaller mean squared error (MSE) than ordinary least squares due to the ridge parameter estimator (k). Different methods for estimating the ridge parameter have been proposed by different researchers (e.g., References [2–13] and recently Amin et al. [14]). However, the literature on biased estimation methods under Gamma ridge regression (GRR) is very limited. Amin et al. [14] proposed the GRR and recommended some estimators for estimating the ridge parameter based on both analytical and Monte Carlo simulation studies.

As different models have different best ridge estimators for dealing with multicollinearity, the main aim of this study is to adapt several methods for estimating the ridge parameter k in the GRR model. These estimators have not been considered in the literature for the GRR model. Their performance is compared with the MLE and some existing estimators available in the work of Amin et al. [14]. The MSE is used as the performance criterion to judge the considered parameter estimators, with the help of a Monte Carlo simulation and two real-life applications.

The rest of the article is organized as follows. In Section 2, we discuss the GRM, the GRR, the MSE properties of the GRR estimator, and the mathematical formulation of the adapted ridge parameter estimators. In Section 3, we illustrate the performance of the proposed methods using a simulation study under different factors. In Section 4, we illustrate the performance of these estimators with the help of two real-life applications. In Section 5, we give the conclusions of the study.

2. Statistical Methodology

2.1. The Gamma Regression Model

Consider a response variable y that follows the Gamma distribution. The probability density function (PDF) of the Gamma distribution is given by

f(y; ν, μ) = (1/Γ(ν)) (ν/μ)^ν y^(ν−1) exp(−νy/μ), y > 0, (1)

where y is a positively skewed continuous dependent variable that follows the Gamma distribution with parameters specified as (ν, μ). The mean and variance of equation (1) are, respectively, given by

E(y) = μ and Var(y) = μ²/ν. (2)

Hardin and Hilbe [15] retransformed equation (1) and assumed that if φ = 1/ν and θ = −1/μ, then equation (1) may be written as

f(y; μ, φ) = (1/(y Γ(1/φ))) (y/(μφ))^(1/φ) exp(−y/(μφ)), (3)

where φ is a dispersion parameter. As the Gamma distribution is a specific form of the exponential family of distributions, the PDF of the exponential family of distributions is given by

f(y; θ, φ) = exp{[yθ − b(θ)]/φ + c(y, φ)}, (4)

where θ is a location parameter and b(θ) is the cumulant function. In exponential form, equation (4) can be written as

f(y; θ, φ) = exp{[y(−1/μ) − ln μ]/φ + (1/φ − 1) ln y − (1/φ) ln φ − ln Γ(1/φ)}. (5)

From equations (4) and (5), we have the following results:

θ = −1/μ, b(θ) = −ln(−θ) = ln μ, b′(θ) = −1/θ = μ, and b″(θ) = 1/θ² = μ². (6)

Now, the mean and variance of the GRM are, respectively,

E(y_i) = b′(θ_i) = μ_i and Var(y_i) = φ b″(θ_i) = φμ_i². (7)

Now, the mean function of the explained variable y, which follows the Gamma model with the reciprocal link function, is given by

μ_i = 1/(x_i′β), i.e., g(μ_i) = 1/μ_i = x_i′β = η_i, (8)

where g(·) is the link function for the Gamma distribution, x_i′ is the ith row of X, in which X is a covariate matrix of order n × (p + 1), and β is the column vector of regression coefficients of order (p + 1) × 1.

The log-likelihood function of equation (4) is given as

ℓ(β, φ; y) = Σ_{i=1}^{n} {[y_iθ_i − b(θ_i)]/φ + c(y_i, φ)}, (9)

where θ_i = −x_i′β and b(θ_i) = −ln(x_i′β).

The MLE is computed by the iterative reweighted least squares (IRLS) method as

β̂_MLE = (X′ŴX)^(−1) X′Ŵẑ, (10)

where ẑ is the adjusted explained variable with elements ẑ_i = η̂_i − (y_i − μ̂_i)/μ̂_i², Ŵ = diag(μ̂_i²), and μ̂_i = 1/(x_i′β̂) is used as an inverse link function.
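To make the IRLS scheme concrete, the following is a minimal sketch (our own illustration, not code from the paper) of Gamma regression with the reciprocal link, using the weights Ŵ = diag(μ̂_i²) and the adjusted response ẑ_i = η̂_i − (y_i − μ̂_i)/μ̂_i²; all function and variable names are ours:

```python
import numpy as np

def gamma_irls(X, y, tol=1e-8, max_iter=100):
    """Fit a Gamma regression with reciprocal link 1/mu = x'beta by IRLS.

    A sketch: assumes the linear predictor stays positive over the iterations."""
    # Crude start: regress 1/y on X, since E(1/y) is roughly 1/mu for the Gamma
    beta = np.linalg.lstsq(X, 1.0 / y, rcond=None)[0]
    for _ in range(max_iter):
        eta = X @ beta
        mu = 1.0 / eta
        W = mu ** 2                      # IRLS weights for the reciprocal link
        z = eta - (y - mu) / mu ** 2     # adjusted response variable
        beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta
```

On simulated data with a positive linear predictor, the iterations typically converge in a handful of steps.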

The covariance matrix of β̂_MLE is as follows:

Cov(β̂_MLE) = φ̂ (X′ŴX)^(−1). (11)

Now, the estimated MSE (EMSE) of the MLE is computed as

EMSE(β̂_MLE) = φ̂ Σ_{j=1}^{p+1} 1/λ_j, (12)

where λ_j represents the jth eigenvalue of the X′ŴX matrix. A disadvantage of the MLE is that its EMSE increases when there exists correlation among the explanatory variables: some of the eigenvalues become small, and their reciprocals inflate the EMSE. It is evident from equation (12) that the EMSE of the MLE increases as multicollinearity increases (for details, see [14]).
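The eigenvalue form of the EMSE above can be checked numerically. This small sketch (ours, with assumed names) shows how a near-singular X′ŴX inflates the EMSE of the MLE:

```python
import numpy as np

def mle_emse(XtWX, phi):
    """EMSE of the MLE: phi times the sum of reciprocal eigenvalues of X'WX."""
    return phi * np.sum(1.0 / np.linalg.eigvalsh(XtWX))

# Two 2x2 weighted cross-product matrices: mild vs. severe collinearity.
mild = np.array([[1.0, 0.90], [0.90, 1.0]])     # eigenvalues 0.1 and 1.9
severe = np.array([[1.0, 0.99], [0.99, 1.0]])   # eigenvalues 0.01 and 1.99
```

For `severe`, the smallest eigenvalue is 0.01 and its reciprocal dominates the sum, so the EMSE is roughly an order of magnitude larger than for `mild`.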

2.2. The Gamma Ridge Regression

In the linear regression model (LRM), a basic problem is that correlations may exist between the explanatory variables. This issue is referred to as multicollinearity. To solve this issue, the idea of ridge regression was given in [2, 3], which is a biased estimation method; it is the most popular method for estimating the regression coefficients under multicollinearity [13]. The main advantage of ridge regression is a reduced EMSE, and its main disadvantage is the bias introduced by the shrinkage parameter k.

The EMSE of the MLE becomes inflated and the results are misleading in the presence of multicollinearity [16]. To solve this problem, Segerstedt [16] gave an idea of ridge regression for the GLM using the method of [3]. Based on the work of Segerstedt [16], Amin et al. [14] estimated the GRR by using the following form:

β̂_GRR = (X′ŴX + kI)^(−1) X′ŴX β̂_MLE, (13)

where β̂_MLE is defined in equation (10), I is an identity matrix of order (p + 1) × (p + 1), and k > 0 is a biasing parameter.

For the GRR, the estimated mean function with the inverse link function is given by

μ̂_i = 1/(x_i′β̂_GRR). (14)

The EMSE of the GRR estimator is computed as

EMSE(β̂_GRR) = φ̂ Σ_{j=1}^{p+1} λ_j/(λ_j + k)² + k² Σ_{j=1}^{p+1} α̂_j²/(λ_j + k)², (15)

where α̂_j is defined as the jth element of α̂ = Q′β̂_MLE, λ_j is the jth eigenvalue of X′ŴX, and Q is the matrix of its eigenvectors. Furthermore, the properties of the MSE are derived by Amin et al. [14].
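The ridge estimate and its EMSE can be sketched as follows. This is our own hedged illustration: `XtWX` stands for X′ŴX, and the EMSE follows the variance-plus-squared-bias decomposition described above:

```python
import numpy as np

def grr_estimate(XtWX, beta_mle, k):
    """Gamma ridge estimate: (X'WX + kI)^(-1) X'WX beta_mle."""
    p = XtWX.shape[0]
    return np.linalg.solve(XtWX + k * np.eye(p), XtWX @ beta_mle)

def grr_emse(XtWX, beta, phi, k):
    """EMSE of the GRR estimator: variance term plus squared-bias term."""
    lam, Q = np.linalg.eigh(XtWX)     # eigenvalues and eigenvectors of X'WX
    alpha = Q.T @ beta                # coefficients on the eigenvector basis
    var = phi * np.sum(lam / (lam + k) ** 2)
    bias2 = k ** 2 * np.sum(alpha ** 2 / (lam + k) ** 2)
    return var + bias2
```

At k = 0 the ridge estimate reduces to the MLE and the EMSE reduces to φ̂ Σ 1/λ_j; for an ill-conditioned X′ŴX, even a small k > 0 can cut the EMSE dramatically.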

2.3. The Ridge Parameter Estimators

Several authors have introduced different ridge parameter estimators for various statistical models. Of these ridge parameters, Amin et al. [14] studied the performance of sixteen ridge parameter estimators in the GRR. Initially, Hoerl et al. [4] proposed the shrinkage parameter estimator for ridge regression in the LRM, and we are adapting this estimator for the GRR as

k̂_HK = pφ̂ / Σ_{j=1}^{p} α̂_j²,

where φ̂ is the estimated dispersion parameter, p is the number of unknown parameters, α̂_j is the jth value of α̂, and μ̂ is the estimated mean function. Lawless and Wang [6] proposed a ridge parameter estimator for the LRM, and we are adapting this estimator for the GRR as

k̂_LW = pφ̂ / Σ_{j=1}^{p} λ_j α̂_j².
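Both adapted ridge parameters can be computed from the spectral decomposition of X′ŴX. The sketch below is our own illustration of the classical Hoerl-Kennard-Baldwin-type and Lawless-Wang-type formulas with the estimated dispersion φ̂ in place of σ̂²; function and variable names are assumptions:

```python
import numpy as np

def ridge_params(XtWX, beta_mle, phi_hat):
    """HKB- and Lawless-Wang-type ridge parameters adapted to the GRR."""
    lam, Q = np.linalg.eigh(XtWX)   # spectrum of the weighted cross-product matrix
    alpha = Q.T @ beta_mle          # alpha-hat: MLE rotated to the canonical frame
    p = alpha.size                  # number of unknown parameters
    k_hk = p * phi_hat / np.sum(alpha ** 2)        # Hoerl et al. type
    k_lw = p * phi_hat / np.sum(lam * alpha ** 2)  # Lawless-Wang type
    return k_hk, k_lw
```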

Hocking et al. [17] suggested a ridge parameter estimator for the LRM and we are adapting this estimator for the GRR as

Gibbons [8] proposed a ridge parameter estimator for the LRM and we are adapting this estimator for the GRR as

Alkhamisi [10] proposed another ridge parameter as

Muniz et al. [18] proposed two ridge parameters as given by

Alkhamisi [10] proposed some ridge parameter estimators for the LRM, and we are adapting these for the GRR as follows.

Alkhamisi and Shukur [11] suggested some ridge parameter estimators for the LRM, and these are adapted for the GRR as follows.

Dorugade [19] proposed a ridge parameter estimator for the RR in the LRM and it is adapted for the GRR as

Muniz and Kibria [12] proposed the following shrinkage parameter estimators for the LRM and these are adapted for the GRR as

Muniz et al. [18] proposed the following five ridge parameter estimators for the RR in the LRM, and these are adapted for the GRR as follows.

Khalaf [20] modified the RR estimator, based on the largest eigenvalue, as a ridge parameter estimator, where λ_max and λ_min are the largest and the smallest eigenvalues of X′ŴX, respectively.

Asar and Genc [21] proposed the following ridge parameter estimators for the LRM, and we are adapting these estimators for the GRR as follows.

The rest of the ridge parameters are selected from Amin et al. [14] in which they concluded that the following three ridge parameter estimators perform well compared to others:

So, in this study, we are adapting and comparing the performance of the above-stated ridge parameter estimators for the GRR to find the best ridge parameter estimator, i.e., the one that attains the smallest EMSE. The performance of these ridge parameter estimators will be evaluated with the help of a Monte Carlo simulation study and two real datasets in the upcoming sections.

3. The Monte Carlo Simulations

The fundamental goal of this study is to compare the performance of the ridge parameter estimators for the GRR. The performance of the ridge parameter estimators may change due to various factors. These factors include the sample size (n), the level of multicollinearity (ρ), the dispersion parameter (φ), and different numbers of explanatory variables. To examine whether the ridge parameters are better than the MLE, we use the EMSE as the performance evaluation criterion. The mathematical relation for the computation of the EMSE is given as

EMSE(β̂) = (1/R) Σ_{r=1}^{R} (β̂_r − β)′(β̂_r − β).

In this simulation process, R represents the total number of replications and β̂_r is the estimated value of β in the rth replication from the MLE and GRR.
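The Monte Carlo EMSE described above is just the average squared Euclidean distance between the replicate estimates and the true coefficient vector; a minimal sketch (names ours):

```python
import numpy as np

def monte_carlo_emse(beta_hats, beta_true):
    """EMSE over R replications: mean of (beta_hat_r - beta)'(beta_hat_r - beta)."""
    diffs = np.asarray(beta_hats) - beta_true   # (R, p) array of estimation errors
    return np.mean(np.sum(diffs ** 2, axis=1))
```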

The response variable of the GRM is generated from the Gamma distribution by following the method in [14], with the mean vector given by

μ_i = 1/(x_i′β), i = 1, 2, …, n.

The number of explanatory variables (p) used for generating the GRM (inverse link function) is assumed to be 2, 4, and 8. The most common restriction in Monte Carlo simulation studies is that the regression parameters are selected so that β′β = 1 [4, 5, 8, 11, 12, 14, 22, 23]. The correlated explanatory variables are generated by the following method:

x_ij = (1 − ρ²)^(1/2) z_ij + ρ z_i(p+1), i = 1, 2, …, n, j = 1, 2, …, p,

where ρ is the correlation between the explanatory variables and the z_ij are generated using the standard normal distribution. To observe the effect of correlation on the estimators, we use four different levels of correlation, namely, 0.80, 0.90, 0.95, and 0.99. The dispersion parameter is assumed to be 0.25, 0.5, and 0.75. The sample sizes are assumed to be n = 25, 50, 100, and 200. For more information about the simulation study, we refer the reader to [8, 12, 14], among others.
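The generation scheme above can be sketched as follows (our own illustration; it assumes the normalization β′β = 1 is handled by the caller, and the Gamma draws use shape 1/φ and scale φμ_i so that the mean is μ_i):

```python
import numpy as np

def gen_design(n, p, rho, rng):
    """Correlated regressors: x_ij = sqrt(1 - rho^2) z_ij + rho z_i(p+1)."""
    Z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - rho ** 2) * Z[:, :p] + rho * Z[:, [p]]

def gen_gamma_response(X, beta, phi, rng):
    """Gamma responses with mean mu_i = 1/(x_i'beta) and dispersion phi.

    Assumes the linear predictor x_i'beta is positive for every row."""
    mu = 1.0 / (X @ beta)
    nu = 1.0 / phi                   # shape parameter
    return rng.gamma(nu, mu / nu)    # scale mu/nu gives mean mu, variance phi*mu^2
```

Note that each pair of generated columns has population correlation ρ², so ρ = 0.99 corresponds to pairwise correlations near 0.98.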

3.1. Simulation Results

In this section, we compare the performance of the MLE and the GRR with different ridge parameter estimators under different factors, i.e., multicollinearity, sample size, number of explanatory variables, and dispersion.

In Tables 1–3, four different levels of multicollinearity are used. From these tables, we observe that, for a fixed sample size, number of explanatory variables, and dispersion, the EMSE increases with the increase in multicollinearity.

By comparing the performance of the ridge parameter estimators in the GRR, we found that, for moderate multicollinearity, smaller dispersion, p = 2, and , the ridge parameter estimator performs better than the other considered ridge parameter estimators. For p = 2, severe multicollinearity, larger dispersion, and n = 25, the ridge parameter estimator performs well compared to the other ridge parameter estimators. For p = 4, Tables 4–6 indicate that the performance of the ridge parameter estimators, i.e., , , , , and , seems to be better than that of the other ridge parameter estimators. From Tables 7–9, we observe that when p = 8, the performance of the ridge parameter estimators, i.e., , , and , is better than that of the other ridge parameter estimators.

From Tables 1–9, we observe that the EMSE decreases with the increase in sample size. Like multicollinearity, the number of explanatory variables also increases the EMSEs of the estimators. When p = 2, the performance of most of the biasing parameter estimators, i.e., , and , is preferable compared to the others. For p = 4, the EMSE of , and is minimum compared to the other ridge parameter estimators. Similarly, for p = 8, the performance of , and compares well with the other considered ridge parameter estimators.

The dispersion parameter also affects the performance of the ridge parameter estimators and their EMSEs. The performance of the , and ridge parameter estimators is best when the dispersion parameter is small. When the dispersion parameter is moderate, the performance of , and is better than that of the others. Similarly, when the dispersion parameter is large, , and perform well.

4. Applications

In this section, we will evaluate the performance of the adapted ridge parameter estimators in the GRR model with the help of two real-life applications.

4.1. Reaction Rate Data

This dataset consists of 24 observations and is taken from [24]. Three factors (explanatory variables) were used to speed up the reaction rate. These explanatory variables are the partial pressure of hydrogen, the partial pressure of n-pentane, and the partial pressure of iso-pentane. To fit a proper regression model for this dataset, we have to know the probability distribution of the dependent variable y; we assess it on the basis of the Anderson–Darling, Cramér–von Mises, and Pearson chi-square tests. Table 10 shows the test statistics and p values for different probability distributions. These results show that this dataset is fitted well by the Gamma distribution as compared to the other distributions given in Table 10. Therefore, we use the GRM instead of any other regression model.

As there are three explanatory variables in this dataset, we first have to evaluate the multicollinearity among them. For this purpose, we consider the correlation matrix and the condition index (CI). The correlation matrix of the explanatory variables is given in Table 11, which indicates that there is a high pairwise correlation among the explanatory variables. The CI for this dataset is 1795, which shows that there is severe multicollinearity among the explanatory variables of the reaction rate dataset [25–27]. To overcome this issue, we use the GRRM instead of the GRM. The GRM estimates are computed using equation (10), and the GRR estimates are computed using equation (13). Similarly, the EMSEs of the GRM and GRR estimators are computed using equations (12) and (15), respectively. Table 12 shows the performance of the ridge estimators along with the ML estimators. We found that the ridge parameter estimators have smaller EMSEs as compared to the other ridge parameter estimators. Of these best ridge parameter estimators, attains the minimum EMSE among the considered ridge parameter estimators as well as the ML estimator.

4.2. Nitrogen Dioxide Data

This dataset is taken from Chatterjee and Hadi [28] and was recently considered by Kurtoglu and Özkale [29] to deal with the multicollinearity problem in the GRM. This dataset consists of n = 26 observations with one response variable (y) and p = 4 explanatory variables. The response variable represents the nitrogen dioxide concentration. The explanatory variables are x1 = mean wind speed in miles per hour (mph), x2 = maximum temperature, x3 = insolation (langleys per day), and x4 = stability factor (°F). The correlation matrix of the explanatory variables is given in Table 13, which shows severe collinearity. Moreover, Kurtoglu and Özkale [29] also found that the explanatory variables are multicollinear because the CI of this dataset was 213.8097.

To deal with the multicollinearity, we use the GRRM instead of the MLE of the GRM. The GRM estimates are computed using equation (10), and the GRR estimates are computed using equation (13). Similarly, the EMSEs of the MLE and GRR estimators are computed using equations (12) and (15), respectively. Table 14 shows the performance of the ridge estimators.

We found that the ridge parameter estimators have smaller and approximately equal EMSE values compared to the other ridge parameter estimators. Of these best ridge parameter estimators, attains the minimum EMSE among the considered ridge parameter estimators as well as the ML estimator.

5. Concluding Remarks

The GRM is used when the response variable follows the Gamma distribution. Multicollinearity also affects the regression coefficients of the GRM, and the GRR is applied to overcome its effect. The ridge parameter plays a significant role in ridge regression estimation, so our focus in this study was to choose the best ridge parameter estimator for the GRR. The EMSE was used as the performance evaluation criterion for choosing the best ridge parameter estimator. We adapted forty ridge parameter estimators for the GRR; these estimators had been proposed for models other than the GRR. We compared the performance of these ridge parameter estimators with the help of a simulation study and two real-life examples. The simulation results show that the EMSE of the ML and GRR estimators increases with the increase of multicollinearity and the number of explanatory variables but decreases with the increase of the dispersion parameter and the sample size. When multicollinearity increases from low to moderate, the overall performance of , , and is better than that of the other ridge parameter estimators. When multicollinearity increases from moderate to high, the overall EMSE of the ridge parameter estimators, i.e., , and , is the smallest among the other ridge parameter estimators. Similar results are also reported from the real-life applications except . On the basis of the simulation and real-life application results, we suggest that these ridge parameter estimators perform well whenever researchers want to apply the GRM with correlated explanatory variables.

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.