Abstract
Among the several variable selection methods, the least absolute shrinkage and selection operator (LASSO) is a widely used procedure for handling regularization and variable selection simultaneously in high-dimensional linear regression models. However, LASSO is unstable when high multicollinearity exists among the predictor variables, and the elastic-net (Enet) estimator has been used to overcome this issue. According to the literature, the estimation of regression parameters can be improved by adding prior information about the regression coefficients to the model, available in the form of exact or stochastic linear restrictions. In this article, we propose a stochastic restricted LASSO-type estimator (SRLASSO) that incorporates stochastic linear restrictions. Furthermore, we compare the performance of SRLASSO with LASSO and Enet under the root mean square error (RMSE) and mean absolute prediction error (MAPE) criteria in a Monte Carlo simulation study. Finally, a real-world example is used to demonstrate the performance of SRLASSO.
1. Introduction
Let us consider the linear regression model
$$y = X\beta + \epsilon, \qquad (1)$$
where y is the n × 1 vector of observations on the dependent variable, X is the n × p matrix of observations on the nonstochastic predictor variables, β is the p × 1 vector of unknown coefficients, and ϵ is the n × 1 vector of random error terms, which are independent and identically normally distributed with mean zero and common variance σ², that is, E(ϵ) = 0 and E(ϵϵ′) = Ω = σ²I.
It is well known that the ordinary least squares estimator (OLSE) is the best linear unbiased estimator for model (1), and it is defined as
$$\hat{\beta}_{\text{OLSE}} = (X'X)^{-1}X'y. \qquad (2)$$
Furthermore, researchers [1, 2] have shown that parameter estimation improves when prior information on the regression coefficients is available, which can take the form of exact linear restrictions or stochastic linear restrictions. Let us assume that there exists prior information on β in the form of the stochastic linear restriction
$$\phi = R\beta + v, \qquad (3)$$
where ϕ is a q × 1 vector, R is a q × p matrix with rank q, and v is a q × 1 vector of disturbances such that E(v) = 0, D(v) = E(vv′) = Ψ = σ²W (W is positive definite), and E(vϵ′) = 0. Note that equation (3) becomes an exact linear restriction when v = 0.
Theil and Goldberger [2] proposed the mixed regression estimator (MRE) by combining models (1) and (3), and it is defined as
$$\hat{\beta}_{\text{MRE}} = \left(X'X + R'W^{-1}R\right)^{-1}\left(X'y + R'W^{-1}\phi\right). \qquad (4)$$
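To make the two estimators concrete, the following minimal sketch computes OLSE (2) and MRE (4) with NumPy; the dimensions, the selector matrix R, and the choice W = I are illustrative assumptions, not values taken from the paper.

```python
# A numerical sketch of OLSE (2) and MRE (4); all data values are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 5, 2
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.5, -0.5, 0.25, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# OLSE: (X'X)^{-1} X'y, computed via a linear solve for numerical stability.
beta_olse = np.linalg.solve(X.T @ X, X.T @ y)

# Stochastic restriction phi = R beta + v with D(v) = sigma^2 W.
R = np.eye(q, p)            # restricts the first q coefficients
W = np.eye(q)               # illustrative dispersion matrix of v
phi = R @ beta_true         # prior values (here taken to be the truth)

# MRE: (X'X + R'W^{-1}R)^{-1} (X'y + R'W^{-1}phi).
W_inv = np.linalg.inv(W)
beta_mre = np.linalg.solve(X.T @ X + R.T @ W_inv @ R,
                           X.T @ y + R.T @ W_inv @ phi)
print("OLSE:", beta_olse.round(3))
print("MRE: ", beta_mre.round(3))
```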
According to the literature, OLSE and MRE are unstable when the number of predictors is large. In this case, variable selection methods such as forward selection, backward selection, and step-wise selection have been used. However, these methods are also unstable when multicollinearity exists among the predictor variables. As a remedy, Tibshirani [3] proposed the least absolute shrinkage and selection operator (LASSO) for model (1) to handle both multicollinearity and variable selection simultaneously in the high-dimensional linear regression model. The LASSO estimator is defined as
$$\hat{\beta}_{\text{LASSO}} = \arg\min_{\beta}\,(y - X\beta)'(y - X\beta) \quad \text{subject to} \quad \sum_{j=1}^{p}|\beta_j| \le t, \qquad (5)$$
where t ≥ 0 is a tuning parameter. The LASSO solutions can be obtained using either standard quadratic programming techniques or the least angle regression (LARS) algorithm [4]. According to Zou et al. [5, 6], LASSO is unstable when high multicollinearity exists among the predictor variables. Therefore, they proposed the elastic-net (Enet) estimator as an alternative to LASSO to handle this issue. The Enet estimator is defined as
$$\hat{\beta}_{\text{Enet}} = \arg\min_{\beta}\left\{(y - X\beta)'(y - X\beta) + \lambda_2\sum_{j=1}^{p}\beta_j^2 + \lambda_1\sum_{j=1}^{p}|\beta_j|\right\}. \qquad (6)$$
The Enet solutions can be obtained using the LARS-EN algorithm, which is a modified version of the LARS algorithm.
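For orientation, the following sketch obtains LASSO and Enet fits with scikit-learn's coordinate-descent solvers rather than LARS/LARS-EN; the simulated data and the penalty levels are arbitrary placeholders.

```python
# LASSO and Enet fits via scikit-learn; alpha and l1_ratio are illustrative
# and would normally be chosen by cross-validation.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ np.array([1.0, 0.5, -0.5, 0.25, 0.0]) + 0.5 * rng.standard_normal(50)

lasso = Lasso(alpha=0.1).fit(X, y)                    # l1 penalty only
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # mixed l1/l2 penalty
print("LASSO:", lasso.coef_.round(3))
print("Enet: ", enet.coef_.round(3))
```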
Norouzirad et al. [7] and Tuaç and Arslan [8] attempted to combine LASSO with exact linear restrictions, but their work did not clearly define how the exact restriction is incorporated, since LASSO has no analytical solution. In this article, we propose a stochastic restricted LASSO-type estimator (SRLASSO) that combines LASSO with stochastic restrictions. Furthermore, we compare the performance of SRLASSO with LASSO and Enet under the root mean square error (RMSE) and mean absolute prediction error (MAPE) criteria using a Monte Carlo simulation study and a real-world example. The rest of the article is structured as follows: Section 2 describes SRLASSO and the algorithm used to find SRLASSO solutions, Section 3 examines the performance of SRLASSO, and Section 4 concludes the article; references are provided at the end of the paper.
2. Stochastic Restricted LASSO-Type Estimator (SRLASSO)
By considering equation (3) as an additional constraint, we define the stochastic restricted LASSO-type estimator (SRLASSO) as
$$\hat{\beta}_{\text{SRLASSO}} = \arg\min_{\beta}\left\{(y - X\beta)'(y - X\beta) + (\phi - R\beta)'W^{-1}(\phi - R\beta)\right\} \quad \text{subject to} \quad \sum_{j=1}^{p}|\beta_j| \le t. \qquad (7)$$
We can view this as a quadratic optimization problem with 2^p + q linear constraints. However, this approach is not practical when p is large. Therefore, we propose the stochastic restricted LARS (SRLARS) algorithm, a modified version of the LARS algorithm, to find SRLASSO solutions. In SRLARS, we consolidate MRE with LARS.
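Assuming the penalized form of the SRLASSO criterion in (7), one practical route to solutions is data augmentation: stacking the whitened restriction rows under the data collapses the two quadratic terms into a single residual sum of squares, so an off-the-shelf LASSO solver applies. The sketch below illustrates this idea; it is not the SRLARS algorithm itself, and all data values are placeholders.

```python
# SRLASSO via data augmentation: ||y - Xb||^2 + (phi - Rb)' W^{-1} (phi - Rb)
# equals ||y_aug - X_aug b||^2 for the stacked data below, so a LASSO solver
# on (X_aug, y_aug) optimizes the SRLASSO criterion (sklearn rescales the RSS
# by 1/(2n), which only reparametrizes the penalty level alpha).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, q = 50, 5, 2
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.5, -0.5, 0.25, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)
R, W = np.eye(q, p), np.eye(q)
phi = R @ beta_true

# Whiten the restriction rows by W^{-1/2} and stack them under the data.
M = np.linalg.cholesky(np.linalg.inv(W)).T   # M'M = W^{-1}
X_aug = np.vstack([X, M @ R])
y_aug = np.concatenate([y, M @ phi])

srlasso = Lasso(alpha=0.1, fit_intercept=False).fit(X_aug, y_aug)
print("SRLASSO:", srlasso.coef_.round(3))
```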
2.1. Stochastic Restricted LARS (SRLARS)
Standardize the predictor variables X to have mean zero and standard deviation one, and center the response variable y to have mean zero. Let the residuals of models (1) and (3) be r = y − Xβ̂ and τ = ϕ − Rβ̂, respectively.
Step 1. Start with β̂0 = 0, r0 = y, and τ0 = ϕ.
Step 2. Find the predictor Xj1 most correlated with r0, that is,
$$j_1 = \arg\max_{j}\left|X_j' r_0\right|.$$
Let β̂j1 be the regression coefficient of Xj1. Then, increase the estimate of β̂j1 from 0 toward its MRE value until some other predictor Xj2 has as much correlation with the current residual as Xj1 does. At this point, SRLARS proceeds in the equiangular direction between the two predictors Xj1 and Xj2 instead of continuing in the direction based on Xj1.
In a similar way, the ith variable Xji eventually earns its way into the active set, and SRLARS then proceeds in the equiangular direction between Xj1, Xj2, …, Xji. Variables continue to be added to the active set in this way, moving in the least angle direction. In the intermediate steps, the coefficient estimates are updated as
$$\hat{\beta}_i = \hat{\beta}_{i-1} + \alpha_i u_i,$$
where αi is a value between 0 and 1 that represents how far the estimate moves in the current direction before another variable enters the model and the direction changes again, and ui is the equiangular vector.
The direction ui is calculated based on MRE as
$$u_i = E_i\left(E_i'\left(X'X + R'W^{-1}R\right)E_i\right)^{-1}E_i'\left(X'y + R'W^{-1}\phi\right) - \hat{\beta}_{i-1},$$
where Ei is the matrix with columns (ej1, ej2, …, eji) and ej is the jth standard unit vector in ℝp, so that Ei indexes the variables selected up to the current step.
Then, αi is calculated as the smallest positive value at which some predictor Xj outside the active set attains the same absolute correlation with the current residual as the active predictors, with one candidate expression for any j whose current correlation is positive and another for any j whose current correlation is negative.
Step 3. If a nonzero coefficient in the active set crosses zero, then Ei is the matrix formed by removing the corresponding column ej from Ei−1. The residuals ri and τi for the current step are calculated as
$$r_i = y - X\hat{\beta}_i, \qquad \tau_i = \phi - R\hat{\beta}_i,$$
and then move to the next step, where ji+1 is the value of j at which the minimum αi was attained.
Step 4. Repeat Step 2 until αi = 1.
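Rather than re-implementing the step-by-step updates above, a full SRLASSO coefficient path can be sketched with scikit-learn's lars_path applied to the augmented data from the previous subsection; method="lasso" activates the variable-dropping rule analogous to Step 3. This approximates the output of SRLARS under our reading of the criterion and is not the paper's own code.

```python
# SRLASSO path via LARS on the augmented data (a stand-in for SRLARS).
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p, q = 50, 5, 2
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.5, -0.5, 0.25, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)
R, W = np.eye(q, p), np.eye(q)
phi = R @ beta_true

M = np.linalg.cholesky(np.linalg.inv(W)).T   # M'M = W^{-1}
X_aug = np.vstack([X, M @ R])
y_aug = np.concatenate([y, M @ phi])

alphas, active, coefs = lars_path(X_aug, y_aug, method="lasso")
print("order of entry:", active)             # indices of selected variables
print("path shape (p x steps):", coefs.shape)
```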
2.2. Properties of SRLARS
The SRLARS algorithm sequentially updates the SRLASSO estimates. It requires O(m³ + pm²) operations, where m is the number of steps. The prediction performance of SRLARS is evaluated using the RMSE and MAPE criteria, which are described in Section 3. Following Efron et al. [4], the tuning parameter t is conventionally expressed as a fraction of the ℓ1 norm of the full model fit, and a suitable value of t for a particular problem is selected using K-fold cross-validation.
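A minimal sketch of the K-fold cross-validation step, again using the augmented-data shortcut; only the n original observations are split into folds, while the q restriction rows stay in every training set. The penalty grid and fold count are arbitrary choices.

```python
# 5-fold cross-validation over a grid of penalty levels; the restriction
# rows (R, phi) are appended to each training fold but never scored.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p, q = 50, 5, 2
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.5, -0.5, 0.25, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)
R, phi = np.eye(q, p), np.array([1.0, 0.5])   # W = I, so no whitening needed

alphas = np.logspace(-3, 0, 20)
cv_rmse = np.zeros_like(alphas)
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    X_tr = np.vstack([X[tr], R])
    y_tr = np.concatenate([y[tr], phi])
    for a, alpha in enumerate(alphas):
        fit = Lasso(alpha=alpha, fit_intercept=False).fit(X_tr, y_tr)
        resid = y[te] - X[te] @ fit.coef_
        cv_rmse[a] += np.sqrt(np.mean(resid ** 2)) / 5   # average over folds

print("selected alpha:", alphas[np.argmin(cv_rmse)])
```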
2.3. Selection of Prior Information
According to Nagar and Kakwani [9], we can define the prior information as follows: let β1 be a vector of some selected q elements of β and β2 be the vector of the remaining elements. Assume that b is a known unbiased estimate of β1. By the "two sigma rule," the range of β1 can be written as b ± 2SE(b). Based on this, we can set the quantities in equation (3) as ϕ = b, R equal to the q × p selector matrix that picks β1 out of β, E(v) = 0, and Ψ = σ²W equal to the dispersion matrix of b implied by SE(b).
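In code, the construction might look as follows; taking the first q coefficients as β1 and matching σ²W to the estimated covariance of b via the two sigma rule is our reading of the setup, so both choices should be treated as assumptions.

```python
# Building (phi, R, W) from an OLSE fit: b = OLSE of the first q elements,
# phi = b, R selects those elements, and sigma^2 W = estimated Var(b).
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 5, 2
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.5, -0.5, 0.25, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_olse = XtX_inv @ X.T @ y
resid = y - X @ beta_olse
sigma2_hat = resid @ resid / (n - p)

R = np.eye(q, p)                     # selects beta_1, the first q elements
phi = beta_olse[:q]                  # b: unbiased estimates of beta_1
cov_b = sigma2_hat * (R @ XtX_inv @ R.T)
W = cov_b / sigma2_hat               # so that D(v) = sigma^2 W = Var(b)

se_b = np.sqrt(np.diag(cov_b))
print("two sigma ranges:", phi - 2 * se_b, phi + 2 * se_b)
```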
3. Performance of SRLASSO
SRLASSO is compared with LASSO and Enet using the RMSE and MAPE criteria, which measure the expected prediction error of each algorithm and are defined as
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{\text{new},i} - X_{\text{new},i}\hat{\beta}\right)^{2}}, \qquad \text{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_{\text{new},i} - X_{\text{new},i}\hat{\beta}\right|,$$
where (ynew, Xnew) denotes new data not used to obtain the parameter estimates, n is the number of new observations, and β̂ is the estimate of β obtained from the respective algorithm. A Monte Carlo simulation study and a real-world example are used for the comparison.
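The two criteria are straightforward to compute; the helpers below follow the definitions above (note that MAPE here is a mean absolute prediction error on new data, not a percentage error).

```python
# RMSE and MAPE on held-out data; beta_hat comes from any of the competing
# algorithms (SRLASSO, LASSO, or Enet).
import numpy as np

def rmse(y_new, X_new, beta_hat):
    """Root mean square prediction error."""
    e = y_new - X_new @ beta_hat
    return np.sqrt(np.mean(e ** 2))

def mape(y_new, X_new, beta_hat):
    """Mean absolute prediction error."""
    return np.mean(np.abs(y_new - X_new @ beta_hat))
```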
3.1. Simulation Study
Following McDonald and Galarneau [10], we first generate the predictor variables using
$$x_{i,j} = \left(1 - \rho^{2}\right)^{1/2} z_{i,j} + \rho z_{i,p+1}, \qquad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, p,$$
where zi,j are independent standard normal pseudo-random numbers and ρ is the theoretical correlation between any two explanatory variables.
In this study, we use a linear regression model with 100 observations and 20 predictors. The dependent variable is generated from
$$y_i = \sum_{j=1}^{20}\beta_j x_{i,j} + \epsilon_i, \qquad i = 1, 2, \ldots, 100,$$
where ϵi are normal pseudo-random numbers with mean zero and common variance σ².
We choose β = (β1, β2, …, β20)′ as the normalized eigenvector corresponding to the largest eigenvalue of X′X, for which β′β = 1. To define the prior information according to Section 2.3, we assume that the OLSE estimates of the first four elements of β are unbiased; these form the estimates b. To investigate the effects of different degrees of multicollinearity on the estimators, we choose ρ = (0.5, 0.7, 0.9), representing weak, moderate, and high multicollinearity. For the analysis, we simulated 50 data sets, each consisting of 50 observations to fit the model and 50 observations to calculate the RMSE and MAPE. The cross-validated RMSE and MAPE of the estimators are displayed in Figures 1 and 2, respectively. The median cross-validated RMSE and MAPE of the estimators are displayed in Table 1.
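The design above can be reproduced along these lines; the seed and the use of an eigendecomposition to extract β are illustrative details.

```python
# McDonald-Galarneau predictors, beta as the normalized eigenvector of X'X
# for the largest eigenvalue, and the response from the linear model.
import numpy as np

def make_data(n=100, p=20, rho=0.9, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, p + 1))
    X = np.sqrt(1 - rho ** 2) * Z[:, :p] + rho * Z[:, [p]]
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # ascending eigenvalues
    beta = eigvecs[:, -1]                        # beta'beta = 1
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y, beta

# One replicate per degree of multicollinearity considered in the study.
for rho in (0.5, 0.7, 0.9):
    X, y, beta = make_data(rho=rho)
    print(rho, X.shape, y.shape)
```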
[Figure 1: Cross-validated RMSE of the estimators for (a) ρ = 0.5, (b) ρ = 0.7, and (c) ρ = 0.9.]
[Figure 2: Cross-validated MAPE of the estimators for (a) ρ = 0.5, (b) ρ = 0.7, and (c) ρ = 0.9.]
From Figures 1 and 2 and Table 1, we observe that SRLASSO consistently outperforms LASSO and Enet under both the RMSE and MAPE criteria for all degrees of multicollinearity considered.
3.2. Real-World Example
As a numerical example, the well-known Prostate Cancer Data [11] were used to evaluate the performance of SRLASSO. This data set is distributed with the "lasso2" R package. In the Prostate Cancer Data, the predictors are the following eight clinical measures: log cancer volume (lcavol), log prostate weight (lweight), age, log of the amount of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi), log capsular penetration (lcp), Gleason score (gleason), and percentage of Gleason scores 4 or 5 (pgg45). The response is the log of prostate-specific antigen (lpsa), and the data set has 97 observations. The variance inflation factor (VIF) values of the predictor variables are 3.09, 2.97, 2.47, 2.05, 1.95, 1.37, 1.36, and 1.32, and the condition number is 243, which gives evidence of multicollinearity among the predictor variables. Stamey et al. [11] examined the correlation between the level of prostate-specific antigen and these eight clinical measures. Furthermore, Tibshirani [3] and Efron et al. [4] used these data to examine the performance of the LASSO and LARS algorithms, respectively. We used 67 observations to fit the model and 30 observations to calculate the RMSE and MAPE. We assume that the OLSE estimates of the first three regression coefficients of the Prostate Cancer Data are unbiased, and we defined the prior information for these data based on Section 2.3. The cross-validated RMSE and MAPE of the estimators are displayed in Table 2, and the coefficient paths of each estimator are displayed in Figure 3.
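The multicollinearity diagnostics quoted above can be checked along the following lines, assuming the Prostate Cancer Data have been exported from the "lasso2" R package to a CSV file; the file name and column layout are our assumptions, and the exact figures may differ with scaling conventions.

```python
# VIF values and the condition number of the predictor matrix; assumes a
# hypothetical "prostate.csv" with the eight predictors and response lpsa.
import numpy as np
import pandas as pd

df = pd.read_csv("prostate.csv")
X = df.drop(columns="lpsa").to_numpy()

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2) from regressing X_j on the other columns."""
    others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    rss = np.sum((X[:, j] - others @ coef) ** 2)
    tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return tss / rss                  # equals 1 / (1 - R_j^2)

print("VIFs:", [round(vif(X, j), 2) for j in range(X.shape[1])])
print("condition number:", round(np.linalg.cond(X), 1))
```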
[Figure 3: Coefficient paths of the estimators for the Prostate Cancer Data, panels (a)–(c), one per estimator.]
From Table 2, we observe that SRLASSO outperforms LASSO and Enet on the Prostate Cancer Data under both the RMSE and MAPE criteria. Furthermore, comparing Figures 3(a)–3(c) shows that each estimator selects a different set of variables.
4. Conclusions
This study showed that SRLASSO performs better than LASSO and Enet under both the RMSE and MAPE criteria when multicollinearity exists among the predictor variables. Therefore, SRLASSO can be used as an alternative to LASSO and Enet when prior information on the regression coefficients is available. The proposed SRLARS algorithm can be used to obtain SRLASSO solutions.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.