Abstract
In the field of chemical data modeling, it is common to encounter response variables that are constrained to the interval (0, 1). In such cases, the beta regression model is often a more suitable choice for modeling. However, like any regression model, collinearity can present a significant challenge. To address this issue, the Liu-type estimator has been used as an alternative to the maximum likelihood estimator, but it suffers from bias. In this paper, we introduce the Jackknifed Liu-type estimator and its modified version, which demonstrate improved bias reduction compared to the original Liu-type estimator. We assess the theoretical and numerical performance of these estimators through Monte Carlo simulations and real-data examples from the field of chemistry. Our findings highlight the significant improvements offered by the proposed estimators in terms of accuracy and reliability.
1. Introduction
Regression models are widely employed in various fields, including chemometrics, for modeling data (see [1–4], for example). Different types of regression models, such as linear, generalized linear, nonlinear, and nonparametric regression models, have been introduced. However, selecting the appropriate model is crucial to obtain reliable and precise results. The nature and distribution of the response variable should be carefully considered when choosing a regression model.
In certain areas of research, the possible values of the response variable are limited to the interval (0, 1), such as rates and proportions. To address this, the beta regression model (BRM) was introduced by Ferrari and Cribari-Neto [5]. However, the quality of parameter estimation has a significant impact on the use of BRMs. Maximum likelihood estimation (MLE) is commonly used for parameter estimation in BRMs, but if the independent variables are ill-conditioned, the results may be unsatisfactory. Variable selection methods, such as adjusted [6] or the swarm optimization method [7], can help to deal with this issue. However, this issue often necessitates the use of biased estimators, such as Stein-type estimators [8], ridge estimators [9, 10], modified ridge-type estimators [11], Liu estimators [12, 13], two-parameter estimators [14], Dawoud-Kibria estimators [15], and also Liu-type estimators [16, 17], which are of particular interest in this paper. Furthermore, there has been recent interest in a class of almost unbiased estimators for parameter estimation in regression models based on biased estimators. Notable examples of these estimators include the work of Ohtani [18], Amin et al. [19], Wu and Asar [20, 21], Varathan and Wijekoon [22], and Asar and Korkmaz [23]. On the other hand, the Jackknife approach has emerged as a viable method for reducing estimator bias. The Jackknife approach was initially developed by Quenouille [9] and Tukey [24] to significantly reduce the bias of estimators. Singh et al. [25] suggested an unbiased ridge estimator for linear regression models using this technique. Later, Batah et al. [26] suggested a modified Jackknifed ridge estimator in linear regression models and demonstrated its superiority over the generalized ridge estimator, Jackknifed ridge estimator, and LASSO [27]. In gamma regression models, Algamal [28, 29] employed the Jackknife technique to mitigate the bias of the ridge estimator. Yildiz [30] and Chaubey et al. [31] both presented Jackknifed Liu-type estimators and conducted theoretical and numerical analyses to explore their properties. In this paper, we apply the Jackknife technique to reduce the bias of the Liu-type estimator in BRMs.
The paper is organized as follows: Section 2 provides an overview of the BRM and determines the MLE of the parameters. In Section 3, the Jackknife procedure is applied to the Liu-type estimator and its modified version is introduced. The properties of the proposed estimators, such as bias, covariance, and mean squared error, are determined in Section 4. A theoretical comparison of the estimators is presented in Section 5, followed by a Monte Carlo simulation experiment in Section 6 to evaluate their performance. Section 7 showcases two applications of the proposed estimators in chemometrics. Finally, Section 8 presents the conclusions of the study.
2. Beta Regression Model
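BRMs are commonly employed for analyzing data expressed as proportions and rates, such as migration rates and unemployment rates. These models rely on the fundamental assumption that the response variable follows a beta distribution; i.e.,
$$f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}, \quad 0 < y < 1, \qquad (1)$$
where $0 < \mu < 1$ is the mean of the response and $\phi > 0$ is the precision parameter.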
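The model, introduced by Ferrari and Cribari-Neto [5], assumes a constant precision parameter, $\phi$, over observations. Let $y_1, \ldots, y_n$ represent the response observations with the density defined in equation (1). The regression model can be expressed as follows:
$$g(\mu_i) = x_i^{\top}\beta = \eta_i, \quad i = 1, \ldots, n,$$
where $\beta = (\beta_1, \ldots, \beta_p)^{\top}$, $x_i$ is the $i$th observation of the independent variables, and the link function $g(\cdot)$, which maps $(0, 1)$ into $\mathbb{R}$, is a continuous and twice differentiable function. To derive the MLE of the parameters, $\beta$, one can use the iteratively reweighted least-squares algorithm. Let $y_i^{*} = \log\{y_i/(1-y_i)\}$, $\mu_i^{*} = \psi(\mu_i\phi) - \psi((1-\mu_i)\phi)$, $T = \mathrm{diag}(1/g'(\mu_1), \ldots, 1/g'(\mu_n))$, and $W = \mathrm{diag}(w_1, \ldots, w_n)$ with $w_i = \phi\{\psi'(\mu_i\phi) + \psi'((1-\mu_i)\phi)\}/\{g'(\mu_i)\}^2$, where $\psi(\cdot)$ denotes the digamma function. Therefore, the MLE in the BRM will be [9, 10]
$$\hat{\beta}_{MLE} = (X^{\top}\hat{W}X)^{-1}X^{\top}\hat{W}\hat{z},$$
where
$$\hat{z} = \hat{\eta} + \hat{W}^{-1}\hat{T}(y^{*} - \mu^{*}).$$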
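The values of $\hat{W}$ and $\hat{z}$ are evaluated at the final iteration. As $n$ increases, the distribution of $\hat{\beta}_{MLE}$ approaches a normal distribution with mean vector $\beta$ and covariance matrix $\phi^{-1}(X^{\top}\hat{W}X)^{-1}$. As a consequence, the scalar mean squared error (MSE) of $\hat{\beta}_{MLE}$ can be represented as
$$\mathrm{MSE}(\hat{\beta}_{MLE}) = \frac{1}{\phi}\sum_{j=1}^{p}\frac{1}{\lambda_j},$$
where $\lambda_j$ is the $j$th eigenvalue of $X^{\top}\hat{W}X$.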
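To make the effect of ill-conditioning on the MLE concrete, the following minimal sketch evaluates the scalar MSE under the assumption that the asymptotic covariance of the MLE is $\phi^{-1}(X^{\top}\hat{W}X)^{-1}$, so that the scalar MSE equals $\phi^{-1}\sum_j 1/\lambda_j$. The function name `scalar_mse_mle` and the eigenvalues used are hypothetical and chosen only for illustration.

```python
def scalar_mse_mle(eigenvalues, phi):
    """Scalar MSE of the MLE, assumed to be (1/phi) * sum_j 1/lambda_j.

    eigenvalues: eigenvalues of X^T W X (all must be positive)
    phi: precision parameter of the beta regression model (positive)
    """
    if phi <= 0:
        raise ValueError("phi must be positive")
    if any(lam <= 0 for lam in eigenvalues):
        raise ValueError("eigenvalues must be positive")
    return sum(1.0 / lam for lam in eigenvalues) / phi


# Hypothetical spectra: one well-conditioned, one with a near-zero eigenvalue.
well_conditioned = [4.0, 2.0, 1.0]
ill_conditioned = [4.0, 2.0, 0.001]

mse_well = scalar_mse_mle(well_conditioned, phi=10.0)
mse_ill = scalar_mse_mle(ill_conditioned, phi=10.0)
```

A single eigenvalue close to zero dominates the sum, which is why biased estimators that shrink along the corresponding direction can achieve a smaller MSE than the MLE.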
It should come as no surprise that ill-conditioning of the matrix $X^{\top}\hat{W}X$ negatively impacts both the variance of the MLE and the accuracy of the parameter estimates. To overcome this issue in estimating parameters, Qasim et al. [12] and Abonazel and Taha [10] proposed the beta ridge estimator (BRE) as follows:
$$\hat{\beta}_{BRE} = (X^{\top}\hat{W}X + kI)^{-1}X^{\top}\hat{W}X\,\hat{\beta}_{MLE}, \quad k > 0.$$
The beta Liu estimator (BLE) was proposed by Karlsson et al. [32] as
$$\hat{\beta}_{BLE} = (X^{\top}\hat{W}X + I)^{-1}(X^{\top}\hat{W}X + dI)\,\hat{\beta}_{MLE}, \quad 0 < d < 1,$$
and the beta Liu-type estimator (BLTE) was proposed by Algamal and Abonazel [16] as
$$\hat{\beta}_{BLTE} = (X^{\top}\hat{W}X + kI)^{-1}(X^{\top}\hat{W}X - dI)\,\hat{\beta}_{MLE},$$
where $k > 0$ and $-\infty < d < \infty$. The BLTE encompasses the BRE and the BLE as special cases. Algamal and Abonazel [16] demonstrated that the BLTE outperforms both of these estimators.
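The relationship among the three shrinkage estimators can be sketched numerically. The code below is an illustration, not the authors' implementation: it assumes the commonly used forms BRE $= (S + kI)^{-1}S\hat{\beta}_{MLE}$, BLE $= (S + I)^{-1}(S + dI)\hat{\beta}_{MLE}$, and BLTE $= (S + kI)^{-1}(S - dI)\hat{\beta}_{MLE}$ with $S = X^{\top}\hat{W}X$. The function names and the numerical values of `S` and `beta_mle` are hypothetical.

```python
import numpy as np


def bre(S, beta_mle, k):
    """Beta ridge estimator (assumed form): (S + kI)^{-1} S beta_mle."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(S + k * I, S @ beta_mle)


def ble(S, beta_mle, d):
    """Beta Liu estimator (assumed form): (S + I)^{-1} (S + dI) beta_mle."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(S + I, (S + d * I) @ beta_mle)


def blte(S, beta_mle, k, d):
    """Beta Liu-type estimator (assumed form): (S + kI)^{-1} (S - dI) beta_mle."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(S + k * I, (S - d * I) @ beta_mle)


# Hypothetical ill-conditioned S = X^T W X and MLE, for illustration only.
S = np.array([[2.0, 1.9], [1.9, 2.0]])
beta_mle = np.array([1.0, -1.0])

ridge_as_special_case = blte(S, beta_mle, k=0.5, d=0.0)   # same as bre(S, beta_mle, 0.5)
liu_as_special_case = blte(S, beta_mle, k=1.0, d=-0.3)    # same as ble(S, beta_mle, 0.3)
```

Under these assumed forms, setting $d = 0$ in the BLTE recovers the BRE, and setting $k = 1$ with $d$ replaced by $-d$ recovers the BLE, which is the sense in which the BLTE contains both as special cases.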
3. Suggested Estimators
The bias of the BLTE is given by
$$\mathrm{Bias}(\hat{\beta}_{BLTE}) = -(k + d)(X^{\top}\hat{W}X + kI)^{-1}\beta.$$
The substantial bias exhibited by the BLTE is an undesirable characteristic for researchers. To mitigate this bias, we use the Jackknife approach, which was developed by Quenouille [9] and Tukey [24] to reduce the bias of estimators. In this section, we follow the approach of Singh et al. [25] and Batah et al. [26] to derive the Jackknifed form of the BLTE. In addition, we present a modified version of this estimator. If we delete the $i$th observation from the data, then
After some algebraic manipulations, we havewhere
Thus,where and . We consider weighted pseudovalues in the weighted Jackknife method as
Therefore, the weighted Jackknifed estimator will beand
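By substituting (16) into (15), the beta Jackknifed Liu-type estimator (BJLTE) is given by
$$\hat{\beta}_{BJLTE} = \left[I - (k + d)^2 (X^{\top}\hat{W}X + kI)^{-2}\right]\hat{\beta}_{MLE}.$$
We also define a modified version of the BJLTE (BMJLTE), which is obtained by replacing $\hat{\beta}_{MLE}$ with $\hat{\beta}_{BLTE}$, that is,
$$\hat{\beta}_{BMJLTE} = \left[I - (k + d)^2 (X^{\top}\hat{W}X + kI)^{-2}\right]\hat{\beta}_{BLTE}.$$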
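The two Jackknifed estimators can be sketched as follows. This is a hedged illustration that assumes the forms used for Jackknifed Liu-type estimators in linear regression carry over to the BRM: BJLTE $= [I - (k+d)^2 (S + kI)^{-2}]\hat{\beta}_{MLE}$ and BMJLTE $= [I - (k+d)^2 (S + kI)^{-2}]\hat{\beta}_{BLTE}$, with $S = X^{\top}\hat{W}X$ and BLTE $= (S + kI)^{-1}(S - dI)\hat{\beta}_{MLE}$. All function names and numerical inputs are hypothetical.

```python
import numpy as np


def jackknife_factor(S, k, d):
    """Assumed Jackknife correction matrix: I - (k + d)^2 (S + kI)^{-2}."""
    I = np.eye(S.shape[0])
    A_inv = np.linalg.inv(S + k * I)
    return I - (k + d) ** 2 * (A_inv @ A_inv)


def blte(S, beta_mle, k, d):
    """Beta Liu-type estimator (assumed form): (S + kI)^{-1} (S - dI) beta_mle."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(S + k * I, (S - d * I) @ beta_mle)


def bjlte(S, beta_mle, k, d):
    """Jackknifed Liu-type estimator: correction applied to the MLE."""
    return jackknife_factor(S, k, d) @ beta_mle


def bmjlte(S, beta_mle, k, d):
    """Modified Jackknifed estimator: correction applied to the BLTE."""
    return jackknife_factor(S, k, d) @ blte(S, beta_mle, k, d)


# Hypothetical diagonal S so each component can be checked by hand.
S = np.diag([2.0, 0.5])
beta_mle = np.array([1.0, 1.0])

est_bjlte = bjlte(S, beta_mle, k=0.5, d=0.1)
est_bmjlte = bmjlte(S, beta_mle, k=0.5, d=0.1)
```

With a diagonal `S`, each coefficient is multiplied by $1 - (k+d)^2/(\lambda_j + k)^2$ in the BJLTE and additionally by $(\lambda_j - d)/(\lambda_j + k)$ in the BMJLTE, so the modified version shrinks more strongly along directions with small eigenvalues.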
4. Properties of the Estimators
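To simplify the analysis and understand the properties of the estimators, we derive their canonical form. Consider $\lambda_j$ for $j = 1, \ldots, p$ as the eigenvalues of the matrix $X^{\top}\hat{W}X$ such that $Q^{\top}X^{\top}\hat{W}XQ = \Lambda$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ and $Q$ is the orthogonal matrix whose columns are the eigenvectors of $X^{\top}\hat{W}X$. The canonical forms can then be used to express the estimators in terms of $\Lambda$ and $\alpha = Q^{\top}\beta$. Specifically, we denote the canonical form of the BMLE as $\hat{\alpha}_{MLE} = Q^{\top}\hat{\beta}_{MLE}$.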
4.1. Properties of BLTE
The canonical form of BLTE of is given bywhere . The bias and covariance of will be
The matrix MSE (MMSE) and scalar MSE (SMSE) are given, respectively, by
4.2. Properties of BJLTE
The canonical form of BJLTE of is given bywhere . The bias and covariance of are as follows:
The MMSE and SMSE are given, respectively, by
4.3. Properties of BMJLTE
The canonical form of BMJLTE of is given bywhere . The bias and covariance of are given bywhere . The MMSE and SMSE are given, respectively, by
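In canonical form, bias and variance decouple across components, which makes the quantities in this section easy to evaluate. The sketch below is an assumption-laden illustration rather than the paper's formulas: it assumes that component $j$ of the MLE has variance $1/(\phi\lambda_j)$, that the BLTE multiplies it by $(\lambda_j - d)/(\lambda_j + k)$, and that the BJLTE multiplies it by $1 - (k+d)^2/(\lambda_j + k)^2$. The function names and inputs are hypothetical.

```python
def blte_factor(lam, k, d):
    """Assumed BLTE shrinkage factor for one canonical component."""
    return (lam - d) / (lam + k)


def bjlte_factor(lam, k, d):
    """Assumed BJLTE shrinkage factor for one canonical component."""
    return 1.0 - (k + d) ** 2 / (lam + k) ** 2


def squared_bias(factor, lams, alphas, k, d):
    """Sum over components of ((factor - 1) * alpha_j)^2."""
    return sum(((factor(lam, k, d) - 1.0) * a) ** 2 for lam, a in zip(lams, alphas))


def scalar_mse(factor, lams, alphas, k, d, phi):
    """Total variance plus squared bias, with Var(alpha_hat_j) = 1 / (phi * lam_j)."""
    variance = sum(factor(lam, k, d) ** 2 / (phi * lam) for lam in lams)
    return variance + squared_bias(factor, lams, alphas, k, d)


# Hypothetical eigenvalues, canonical coefficients, and tuning parameters.
lams, alphas = [2.0, 0.5], [1.0, 1.0]
k, d, phi = 0.5, 0.1, 10.0

sb_blte = squared_bias(blte_factor, lams, alphas, k, d)
sb_bjlte = squared_bias(bjlte_factor, lams, alphas, k, d)
smse_blte = scalar_mse(blte_factor, lams, alphas, k, d, phi)
smse_bjlte = scalar_mse(bjlte_factor, lams, alphas, k, d, phi)
```

Under these assumptions the component-wise bias of the BJLTE is $-(k+d)^2\alpha_j/(\lambda_j + k)^2$, compared with $-(k+d)\alpha_j/(\lambda_j + k)$ for the BLTE, so the Jackknifed bias is smaller whenever $|k + d| < \lambda_j + k$.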
5. Theoretical Comparison of Estimators
In this section, we conduct a theoretical comparison between the proposed estimators and the BLTE, focusing on the squared bias (SB) and MMSE. We begin by evaluating the SB, which is defined as follows:
To compare the estimators in terms of MMSE, we need the following lemma.
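Lemma 1. Let $\hat{\beta}_j$ for $j = 1, 2$ be two estimators of $\beta$ with covariance matrix $\mathrm{Cov}(\hat{\beta}_j)$ and bias vector $b_j$; then
$$\mathrm{MMSE}(\hat{\beta}_1) - \mathrm{MMSE}(\hat{\beta}_2) > 0$$
if and only if
$$b_2^{\top}\left(D + b_1 b_1^{\top}\right)^{-1} b_2 < 1,$$
where $D = \mathrm{Cov}(\hat{\beta}_1) - \mathrm{Cov}(\hat{\beta}_2)$ is a positive definite matrix [33].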
5.1. Comparison between BLTE and BJLTE
Theorem 2. The SB of BJLTE is smaller than the SB of BLTE if
Proof. By using (20) and (24), we haveThis equation is positive if for then the proof is completed.
Theorem 3. When , the BJLTE is superior to the BLTE in terms of MMSE if the following inequality holds.where .
Proof. By Lemma 1, it suffices to show that is a positive definite matrix.Thus, will be positive if , for . Hence, the proof is completed.
5.2. Comparison between BLTE and BMJLTE
Theorem 4. The SB of BMJLTE is smaller than the SB of BLTE if for , we have or or .
Proof. By using (20) and (28), we haveThis equation is positive if for , be positive. This function has four roots , , , and . Since , the function is positive if or or for .
Theorem 5. When for , the BMJLTE is superior to the BLTE in terms of MMSE if the following inequality holdswhere .
Proof. By Lemma 1, it is enough to show that is a positive definite matrix.Thus, will be positive if for , is positive. The discriminant of is ; therefore, we have the following two real roots:Thus, is positive if and the proof is finished.
5.3. Bias Parameter Selection
In this subsection, we derive an estimator for the parameter . To find this estimator, we consider the BJLTE. The first derivative of the SMSE of the BJLTE with respect to is calculated as follows:
This equation equals zero if which is a quadratic function of . This function leads to the following real roots:
Although there are various estimators for , we only recommend and utilize the following estimator in the simulation study.
6. Simulation Study
In this section, we will evaluate the performance of the proposed estimators in the BRM through a simulation study. By considering various values for parameter , the number of independent variables , and the precision parameter , we will present multiple potential outcomes that demonstrate the efficacy of the proposed estimator.
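The observations of the covariates are generated by
$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, p,$$
where $z_{ij}$ are generated from the standard normal distribution and $\rho$ is supplied to control the intensity of correlation among the covariates. The value of $\rho$ is considered to be 0.90, 0.95, and 0.99 to determine how the estimators are affected by varying degrees of collinearity. The values of the coefficients are chosen such that $\beta^{\top}\beta = 1$. Finally, the observations of the response variable in the BRM with logit link function are generated from the beta distribution, $\mathrm{Beta}(\mu_i\phi, (1 - \mu_i)\phi)$, where
$$\mu_i = \frac{\exp(x_i^{\top}\beta)}{1 + \exp(x_i^{\top}\beta)}.$$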
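The data-generating process described above can be sketched as follows. This is an illustration under stated assumptions: the covariates follow the widely used scheme $x_{ij} = (1-\rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}$, the coefficients are taken equal with $\beta^{\top}\beta = 1$, and the response is drawn from a beta distribution with shape parameters $\mu_i\phi$ and $(1-\mu_i)\phi$ under a logit link. The function name `simulate_brm`, the seed, and the values of `n`, `p`, and `phi` are hypothetical.

```python
import numpy as np


def simulate_brm(n, p, rho, phi, seed=0):
    """Generate collinear covariates and a beta-distributed response.

    Assumes x_ij = sqrt(1 - rho^2) * z_ij + rho * z_{i,p+1}, equal coefficients
    with beta^T beta = 1, a logit link, and y_i ~ Beta(mu_i*phi, (1-mu_i)*phi).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, p + 1))
    # The shared column z[:, p] induces pairwise correlation rho^2.
    x = np.sqrt(1.0 - rho ** 2) * z[:, :p] + rho * z[:, [p]]
    beta = np.full(p, 1.0 / np.sqrt(p))      # satisfies beta^T beta = 1
    mu = 1.0 / (1.0 + np.exp(-(x @ beta)))   # inverse logit link
    y = rng.beta(mu * phi, (1.0 - mu) * phi)
    return x, y, beta


x, y, beta = simulate_brm(n=5000, p=4, rho=0.95, phi=10.0)
pairwise_corr = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
```

Under this scheme any two covariates have theoretical correlation $\rho^2$, so $\rho = 0.95$ corresponds to a pairwise correlation of about 0.90.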
In order to determine the BLTE, we utilize the estimators described in [16] for and as follows:where
To evaluate the performance of the BJLTE and BMJLTE, we utilize the same for both estimators. For the parameter , we employ the estimator provided in equation (44). We also include the BRE and BLE in our simulation study. For the ridge parameter, we use the estimator in (47), and for the Liu estimator, we use the following estimator proposed by Karlsson et al. [32]:
In comparing the performance of the estimators, we specifically focus on the simulated MSE in addition to the squared bias. Therefore, we repeat the experiment 1000 times and calculate the criteria using the following formulas:where is the estimation of at the th repetition of simulation and is the mean of estimated values.
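The two simulation criteria can be computed as in the sketch below. It assumes the usual definitions, which may differ in detail from the paper's formulas: the simulated MSE is the average over repetitions of the squared Euclidean distance between the estimate and the true coefficient vector, and the SB is the squared distance between the mean estimate and the true vector. The function names and the toy inputs are hypothetical.

```python
def simulated_mse(estimates, beta):
    """Average over repetitions of sum_j (beta_hat_rj - beta_j)^2."""
    reps = len(estimates)
    total = sum(
        sum((b_hat - b) ** 2 for b_hat, b in zip(est, beta)) for est in estimates
    )
    return total / reps


def simulated_sb(estimates, beta):
    """Squared distance between the mean estimate and the true beta."""
    reps = len(estimates)
    means = [sum(est[j] for est in estimates) / reps for j in range(len(beta))]
    return sum((m - b) ** 2 for m, b in zip(means, beta))


# Toy example with two repetitions and two coefficients.
true_beta = [0.5, 0.5]
estimates = [[1.0, 0.0], [0.0, 1.0]]

mse = simulated_mse(estimates, true_beta)
sb = simulated_sb(estimates, true_beta)
```

In the toy example the estimates average exactly to the true vector, so the SB is zero while the MSE is not, which shows that the two criteria capture different aspects of estimator quality.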
The MSE and SB of estimators are presented in Tables 1 and 2, respectively. Based on the information provided in the tables, the following conclusions can be drawn:
(i) Based on Table 1, BMJLTE always outperforms other estimators in terms of MSE.
(ii) In terms of MSE, BLE is worse than other estimators.
(iii) The MSE of BRE is only less than BLE.
(iv) The BJLTE exhibits better performance than the BLTE, BLE, and BRE based on the MSE values.
(v) When the intensity of correlation increases, the MSE for all estimators increases.
(vi) When the dispersion value changes from to , the MSE of all estimators reduces.
(vii) The MSE of estimators tends to increase as the number of covariates increases.
(viii) For a fixed value of , , and , increasing the sample size results in a decrease in the MSE for all estimators.
(ix) Table 2 shows that when and 6, the BJLTE has the lowest SB among the estimators.
(x) However, when and there is a high correlation , the BMJLTE has the lowest SB and for 0.90 and 0.95, the BJLTE still performs better.
(xi) The BLE has the largest squared bias among the estimators in all scenarios.
(xii) In general, as the sample size increases, the SB values decrease. However, in most scenarios, the SB values increase whenever the number of covariates or the intensity of correlation increases.
(xiii) When the dispersion value changes from to 4, the SB values decrease.
7. Application to Real Data
In this section, two chemical datasets are utilized to demonstrate the performance of proposed estimators.
7.1. Gasoline Yield Data
The first dataset used in this study is sourced from Prater [2]. The objective is to investigate the impact of several covariates on a response variable, which represents the percentage of crude oil converted to gasoline through the process of distillation and fractionation. Initially, this dataset was analyzed by Atkinson [34] using a linear regression model. However, it was discovered that the error term distribution was not symmetrical. As a result, the data were transformed to ensure that the dependent variable took values along the real number line. Lemonte et al. [35] adopted an alternative approach to address this issue. They used the beta distribution for analysis and found that it provided more robust outcomes for influential observations compared to the method employed by Atkinson [34]. The covariates in this dataset are: the crude oil gravity , the vapor pressure of the crude oil , the temperature at which 10 percent of crude oil has vaporized , and finally the temperature at which all the gasoline is vaporized . Qasim et al. [12] also used this dataset to illustrate the application of ridge beta regression.
The left-side plot in Figure 1 reveals a significant correlation among the variables, particularly between and . Furthermore, we compute the condition index of matrix as 10613.01. Both the correlation plot and CI indicate the presence of multicollinearity in the dataset. Therefore, we apply the proposed estimators described in this paper. To evaluate the performance of the proposed estimators, we employ the bootstrapping method with sample size and 1000 bootstrap iterations. We consider the BMLE as the true value of the coefficient vector and compute the mean squared error (MSE) and the mean absolute error (MAE) of various estimators bywhere is the maximum likelihood estimation and the is one of the considered estimators of , e.g., BRE, BLE, BLTE, BJLTE, and BMJLTE, in the th bootstrap replication. The results in Table 3 show that the Jackknifed estimators have significantly lower MSE and MAE values compared to BLTE, BLE, and BRE. Specifically, the BMJLTE outperforms the other estimators in terms of MSE, and BJLTE outperforms the other estimators in terms of MAE, which is consistent with the simulation study results. Furthermore, Table 4 reports the estimation of coefficients using the mean of the bootstrap estimators.

7.2. Heat Treating Test Data
The second dataset considered in this study is the heat-treating test data obtained from [36]. It comprises five covariates: furnace temperature , carbon concentration and duration of the carburizing cycle (soakpct and soaktime) denoted as and , and carbon concentration and duration of the diffuse time (Diffpct and Difftime) indicated as and . The response variable captures the quality of a sound determined by the rate of vibrations or the level of something which is referred to as PITCH in the dataset and denotes the product presentation to the customer’s heart. Since the response variable follows a ratio form, we employ the beta regression model to analyze it. However, before applying the beta regression model, we verify whether the values of y follow a beta distribution. We conduct an Anderson–Darling (AD) test using the ad.test function from the goftest package in the R programming language. The computed test statistic is 0.85967, with a p value of 0.439. The estimated parameter values are 4.9995 and 182.1799. Because the p value does not lead to rejection of the null hypothesis, the beta distribution is considered suitable for modeling the response variable.
We fit a model by including an intercept term and compute the condition index of the matrix , which results in a value of 303772.7. The correlation matrix of the covariates is displayed in the right-side plot of Figure 1. Both observations indicate the presence of multicollinearity in the dataset. Consequently, we apply the proposed estimators to this dataset as well. To evaluate the efficiency of the proposed estimators, similar to the first dataset, we employ the bootstrapping method with a sample size of and 1000 bootstrap iterations.
We calculate the MSE and MAE of each estimator by using (51) and (52), respectively. Based on Table 5, the Jackknifed estimators are superior to the BLTE, BLE, and BRE because they have the lowest MSE and MAE values. The MSE of the BMJLTE is lower than that of the BJLTE, whereas for the MAE the opposite holds. Table 6 presents the coefficient estimates from the proposed estimators, computed as the mean over all bootstrap replications.
8. Conclusion
In this paper, we have addressed the bias issue in the Liu-type estimator used in BRMs. By applying the Jackknife methodology, we were able to reduce the bias of the beta Liu-type estimator and introduce a modified estimator. We have analytically established the conditions under which both proposed estimators outperform the beta Liu-type estimator. To evaluate the performance of the proposed estimators and compare them to the BMLE, BRE, BLE, and BLTE, we conducted a comprehensive simulation study. The simulation experiment considered various aspects to observe the behavior of the proposed estimators. The results indicate that the proposed estimators, especially the modified estimator, outperformed the BRE, BLE, and BLTE in terms of MSE and squared bias (SB). Furthermore, we have demonstrated the efficiency of the proposed estimators through two real-life examples in the field of chemometrics. In both cases, the proposed Jackknifed estimators exhibited smaller MSE and MAE compared to the alternative estimators. Based on these findings, we recommend researchers utilize the proposed estimators, especially the modified Jackknifed Liu-type estimator whenever multicollinearity is present in BRMs. The proposed estimators offer improved performance in terms of bias reduction and estimation accuracy.
Data Availability
The data supporting this paper are from previously reported studies and datasets, which have been cited.
Conflicts of Interest
The authors declare that they have no conflicts of interest.