Abstract
Among the several variable selection methods, the least absolute shrinkage and selection operator (LASSO) is a widely used procedure for handling regularization and variable selection simultaneously in high-dimensional linear regression models. However, LASSO is unstable when high multicollinearity exists among the predictor variables, and the elastic-net (Enet) estimator has been used to overcome this issue. According to the literature, the estimation of regression parameters can be improved by adding prior information about the regression coefficients to the model, available in the form of exact or stochastic linear restrictions. In this article, we propose a stochastic restricted LASSO-type estimator (SRLASSO) that incorporates stochastic linear restrictions. Furthermore, we compare the performance of SRLASSO with LASSO and Enet under the root mean square error (RMSE) and mean absolute prediction error (MAPE) criteria in a Monte Carlo simulation study. Finally, a real-world example is used to demonstrate the performance of SRLASSO.
1. Introduction
Let us consider the linear regression model
$$y = X\beta + \epsilon, \qquad (1)$$
where y is the n × 1 vector of observations on the dependent variable, X is the n × p matrix of observations on the nonstochastic predictor variables, β is the p × 1 vector of unknown coefficients, and ϵ is the n × 1 vector of random error terms, which are independent and identically normally distributed with mean zero and common variance σ², that is, E(ϵ) = 0 and E(ϵϵ′) = Ω = σ²I.
It is well known that the ordinary least squares estimator (OLSE) is the best linear unbiased estimator for model (1), and it is defined as
$$\hat{\beta}_{\text{OLSE}} = (X'X)^{-1}X'y. \qquad (2)$$
Furthermore, researchers [1, 2] have shown that parameter estimation improves when prior information on the regression coefficients is available, which can take the form of exact linear restrictions or stochastic linear restrictions. Let us assume that there exists prior information on β in the form of the stochastic linear restriction
$$\phi = R\beta + v, \qquad (3)$$
where ϕ is a q × 1 vector, R is a q × p matrix with rank q, and v is a q × 1 vector of disturbances such that E(v) = 0, D(v) = E(vv′) = Ψ = σ²W (W is positive definite), and E(vϵ′) = 0. Note that equation (3) becomes an exact linear restriction when v = 0.
Theil and Goldberger [2] proposed the mixed regression estimator (MRE) by combining models (1) and (3), and it is defined as
$$\hat{\beta}_{\text{MRE}} = \left(X'X + R'W^{-1}R\right)^{-1}\left(X'y + R'W^{-1}\phi\right). \qquad (4)$$
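To make the two estimators concrete, the following minimal sketch computes OLSE (2) and MRE (4) with NumPy; the dimensions, the selector matrix R, and the choice W = I are illustrative assumptions, not values taken from the paper.

```python
# A numerical sketch of OLSE (2) and MRE (4); all data values are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 5, 2
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.5, -0.5, 0.25, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# OLSE: (X'X)^{-1} X'y, computed via a linear solve for numerical stability.
beta_olse = np.linalg.solve(X.T @ X, X.T @ y)

# Stochastic restriction phi = R beta + v with D(v) = sigma^2 W.
R = np.eye(q, p)            # restricts the first q coefficients
W = np.eye(q)               # illustrative dispersion matrix of v
phi = R @ beta_true         # prior values (here taken to be the truth)

# MRE: (X'X + R'W^{-1}R)^{-1} (X'y + R'W^{-1}phi).
W_inv = np.linalg.inv(W)
beta_mre = np.linalg.solve(X.T @ X + R.T @ W_inv @ R,
                           X.T @ y + R.T @ W_inv @ phi)
print("OLSE:", beta_olse.round(3))
print("MRE: ", beta_mre.round(3))
```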
According to the literature, OLSE and MRE are unstable when the number of predictors is large. In this case, variable selection methods such as forward selection, backward selection, and step-wise selection have been used. However, these methods are also unstable when multicollinearity exists among the predictor variables. As a remedy, Tibshirani [3] proposed the least absolute shrinkage and selection operator (LASSO) for model (1) to handle both multicollinearity and variable selection simultaneously in the high-dimensional linear regression model. The LASSO estimator is defined as
$$\hat{\beta}_{\text{LASSO}} = \arg\min_{\beta}\,(y - X\beta)'(y - X\beta) \quad \text{subject to} \quad \sum_{j=1}^{p}|\beta_j| \le t, \qquad (5)$$
where t ≥ 0 is a tuning parameter. The LASSO solutions can be obtained using either standard quadratic programming techniques or the least angle regression (LARS) algorithm [4]. According to Zou et al. [5, 6], LASSO is unstable when high multicollinearity exists among the predictor variables. Therefore, they proposed the elastic-net (Enet) estimator as an alternative to LASSO to handle this issue. The Enet estimator is defined as
$$\hat{\beta}_{\text{Enet}} = \arg\min_{\beta}\left\{(y - X\beta)'(y - X\beta) + \lambda_2\sum_{j=1}^{p}\beta_j^2 + \lambda_1\sum_{j=1}^{p}|\beta_j|\right\}. \qquad (6)$$
The Enet solutions can be obtained using the LARS-EN algorithm, which is a modified version of the LARS algorithm.
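For orientation, the following sketch obtains LASSO and Enet fits with scikit-learn's coordinate-descent solvers rather than LARS/LARS-EN; the simulated data and the penalty levels are arbitrary placeholders.

```python
# LASSO and Enet fits via scikit-learn; alpha and l1_ratio are illustrative
# and would normally be chosen by cross-validation.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ np.array([1.0, 0.5, -0.5, 0.25, 0.0]) + 0.5 * rng.standard_normal(50)

lasso = Lasso(alpha=0.1).fit(X, y)                    # l1 penalty only
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # mixed l1/l2 penalty
print("LASSO:", lasso.coef_.round(3))
print("Enet: ", enet.coef_.round(3))
```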
Norouzirad et al. [7] and Tuaç and Arslan [8] attempted to combine LASSO with exact linear restrictions, but their work did not clearly define how the exact restriction is incorporated, since LASSO has no analytical solution. In this article, we propose a stochastic restricted LASSO-type estimator (SRLASSO) that combines LASSO with stochastic restrictions. Furthermore, we compare the performance of SRLASSO with LASSO and Enet under the root mean square error (RMSE) and mean absolute prediction error (MAPE) criteria using a Monte Carlo simulation study and a real-world example. The rest of the article is structured as follows: Section 2 describes SRLASSO and the algorithm used to find SRLASSO solutions, Section 3 examines the performance of SRLASSO, and Section 4 concludes the article; references are provided at the end of the paper.
2. Stochastic Restricted LASSO-Type Estimator (SRLASSO)
By considering equation (3) as an additional constraint, we define the stochastic restricted LASSO-type estimator (SRLASSO) as
$$\hat{\beta}_{\text{SRLASSO}} = \arg\min_{\beta}\left\{(y - X\beta)'(y - X\beta) + (\phi - R\beta)'W^{-1}(\phi - R\beta)\right\} \quad \text{subject to} \quad \sum_{j=1}^{p}|\beta_j| \le t. \qquad (7)$$
We can view this as a quadratic optimization problem with 2^p + q linear constraints. However, this approach is not practical when p is large. Therefore, we propose the stochastic restricted LARS (SRLARS) algorithm, a modified version of the LARS algorithm, to find SRLASSO solutions. In SRLARS, we consolidate MRE with LARS.
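Assuming the penalized form of the SRLASSO criterion in (7), one practical route to solutions is data augmentation: stacking the whitened restriction rows under the data collapses the two quadratic terms into a single residual sum of squares, so an off-the-shelf LASSO solver applies. The sketch below illustrates this idea; it is not the SRLARS algorithm itself, and all data values are placeholders.

```python
# SRLASSO via data augmentation: ||y - Xb||^2 + (phi - Rb)' W^{-1} (phi - Rb)
# equals ||y_aug - X_aug b||^2 for the stacked data below, so a LASSO solver
# on (X_aug, y_aug) optimizes the SRLASSO criterion (sklearn rescales the RSS
# by 1/(2n), which only reparametrizes the penalty level alpha).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, q = 50, 5, 2
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.5, -0.5, 0.25, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)
R, W = np.eye(q, p), np.eye(q)
phi = R @ beta_true

# Whiten the restriction rows by W^{-1/2} and stack them under the data.
M = np.linalg.cholesky(np.linalg.inv(W)).T   # M'M = W^{-1}
X_aug = np.vstack([X, M @ R])
y_aug = np.concatenate([y, M @ phi])

srlasso = Lasso(alpha=0.1, fit_intercept=False).fit(X_aug, y_aug)
print("SRLASSO:", srlasso.coef_.round(3))
```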
2.1. Stochastic Restricted LARS (SRLARS)
Standardize the predictor variables X to have mean zero and standard deviation one, and center the response variable y to have mean zero. Let the residuals of models (1) and (3) be r = y − Xβ̂ and τ = ϕ − Rβ̂, respectively.
Step 1. Start with β̂0 = 0, r0 = y, and τ0 = ϕ.
Step 2. Find the predictor Xj1 most correlated with r0, that is,
$$j_1 = \arg\max_{j}\left|X_j' r_0\right|.$$
Let β̂j1 be the regression coefficient of Xj1. Then, increase the estimate of β̂j1 from 0 toward its MRE value until some other predictor Xj2 has as much correlation with the current residual as Xj1 does. At this point, SRLARS proceeds in the equiangular direction between the two predictors Xj1 and Xj2 instead of continuing in the direction based on Xj1.
In a similar way, the ith variable Xji eventually earns its way into the active set, and SRLARS then proceeds in the equiangular direction between Xj1, Xj2, …, Xji. Variables continue to be added to the active set in this way, moving in the least angle direction. In the intermediate steps, the coefficient estimates are updated as
$$\hat{\beta}_i = \hat{\beta}_{i-1} + \alpha_i u_i,$$
where αi is a value between 0 and 1 that represents how far the estimate moves in the current direction before another variable enters the model and the direction changes again, and ui is the equiangular vector.
The direction ui is calculated based on MRE as
$$u_i = E_i\left(E_i'\left(X'X + R'W^{-1}R\right)E_i\right)^{-1}E_i'\left(X'y + R'W^{-1}\phi\right) - \hat{\beta}_{i-1},$$
where Ei is the matrix with columns (ej1, ej2, …, eji) and ej is the jth standard unit vector in ℝp, so that Ei indexes the variables selected up to the current step.
Then, αi is calculated as the smallest positive value at which some predictor Xj outside the active set attains the same absolute correlation with the current residual as the active predictors, with one candidate expression for any j whose current correlation is positive and another for any j whose current correlation is negative.
Step 3. If a nonzero coefficient in the active set crosses zero, then Ei is the matrix formed by removing the corresponding column ej from Ei−1. The residuals ri and τi for the current step are calculated as
$$r_i = y - X\hat{\beta}_i, \qquad \tau_i = \phi - R\hat{\beta}_i,$$
and then move to the next step, where ji+1 is the value of j at which the minimum αi was attained.
Step 4. Repeat Step 2 until αi = 1.
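Rather than re-implementing the step-by-step updates above, a full SRLASSO coefficient path can be sketched with scikit-learn's lars_path applied to the augmented data from the previous subsection; method="lasso" activates the variable-dropping rule analogous to Step 3. This approximates the output of SRLARS under our reading of the criterion and is not the paper's own code.

```python
# SRLASSO path via LARS on the augmented data (a stand-in for SRLARS).
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p, q = 50, 5, 2
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.5, -0.5, 0.25, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)
R, W = np.eye(q, p), np.eye(q)
phi = R @ beta_true

M = np.linalg.cholesky(np.linalg.inv(W)).T   # M'M = W^{-1}
X_aug = np.vstack([X, M @ R])
y_aug = np.concatenate([y, M @ phi])

alphas, active, coefs = lars_path(X_aug, y_aug, method="lasso")
print("order of entry:", active)             # indices of selected variables
print("path shape (p x steps):", coefs.shape)
```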
2.2. Properties of SRLARS
The SRLARS algorithm sequentially updates the SRLASSO estimates. It requires O(m³ + pm²) operations, where m is the number of steps. The prediction performance of SRLARS is evaluated using the RMSE and MAPE criteria, which are described in Section 3. Following Efron et al. [4], the tuning parameter t is conventionally expressed as a fraction of the ℓ1 norm of the full model fit, and a suitable value of t for a particular problem is selected using K-fold cross-validation.
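A minimal sketch of the K-fold cross-validation step, again using the augmented-data shortcut; only the n original observations are split into folds, while the q restriction rows stay in every training set. The penalty grid and fold count are arbitrary choices.

```python
# 5-fold cross-validation over a grid of penalty levels; the restriction
# rows (R, phi) are appended to each training fold but never scored.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p, q = 50, 5, 2
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.5, -0.5, 0.25, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)
R, phi = np.eye(q, p), np.array([1.0, 0.5])   # W = I, so no whitening needed

alphas = np.logspace(-3, 0, 20)
cv_rmse = np.zeros_like(alphas)
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    X_tr = np.vstack([X[tr], R])
    y_tr = np.concatenate([y[tr], phi])
    for a, alpha in enumerate(alphas):
        fit = Lasso(alpha=alpha, fit_intercept=False).fit(X_tr, y_tr)
        resid = y[te] - X[te] @ fit.coef_
        cv_rmse[a] += np.sqrt(np.mean(resid ** 2)) / 5   # average over folds

print("selected alpha:", alphas[np.argmin(cv_rmse)])
```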
2.3. Selection of Prior Information
According to Nagar and Kakwani [9], we can define the prior information as follows: let β1 be a vector of some selected q elements of β and β2 be the vector of the remaining elements. Assume that b is a known unbiased estimate of β1. By the "two sigma rule," the range of β1 can be written as b ± 2SE(b). Based on this, we can set the quantities in equation (3) as ϕ = b, R equal to the q × p selector matrix that picks β1 out of β, E(v) = 0, and Ψ = σ²W equal to the dispersion matrix of b implied by SE(b).
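In code, the construction might look as follows; taking the first q coefficients as β1 and matching σ²W to the estimated covariance of b via the two sigma rule is our reading of the setup, so both choices should be treated as assumptions.

```python
# Building (phi, R, W) from an OLSE fit: b = OLSE of the first q elements,
# phi = b, R selects those elements, and sigma^2 W = estimated Var(b).
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 5, 2
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.5, -0.5, 0.25, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_olse = XtX_inv @ X.T @ y
resid = y - X @ beta_olse
sigma2_hat = resid @ resid / (n - p)

R = np.eye(q, p)                     # selects beta_1, the first q elements
phi = beta_olse[:q]                  # b: unbiased estimates of beta_1
cov_b = sigma2_hat * (R @ XtX_inv @ R.T)
W = cov_b / sigma2_hat               # so that D(v) = sigma^2 W = Var(b)

se_b = np.sqrt(np.diag(cov_b))
print("two sigma ranges:", phi - 2 * se_b, phi + 2 * se_b)
```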
3. Performance of SRLASSO
SRLASSO is compared with LASSO and Enet using the RMSE and MAPE criteria, which measure the expected prediction error of each algorithm and are defined as
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{\text{new},i} - X_{\text{new},i}\hat{\beta}\right)^{2}}, \qquad \text{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_{\text{new},i} - X_{\text{new},i}\hat{\beta}\right|,$$
where (ynew, Xnew) denotes new data not used to obtain the parameter estimates, n is the number of new observations, and β̂ is the estimate of β obtained from the respective algorithm. A Monte Carlo simulation study and a real-world example are used for the comparison.
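The two criteria are straightforward to compute; the helpers below follow the definitions above (note that MAPE here is a mean absolute prediction error on new data, not a percentage error).

```python
# RMSE and MAPE on held-out data; beta_hat comes from any of the competing
# algorithms (SRLASSO, LASSO, or Enet).
import numpy as np

def rmse(y_new, X_new, beta_hat):
    """Root mean square prediction error."""
    e = y_new - X_new @ beta_hat
    return np.sqrt(np.mean(e ** 2))

def mape(y_new, X_new, beta_hat):
    """Mean absolute prediction error."""
    return np.mean(np.abs(y_new - X_new @ beta_hat))
```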
3.1. Simulation Study
Following McDonald and Galarneau [10], we first generate the predictor variables using
$$x_{i,j} = \left(1 - \rho^{2}\right)^{1/2} z_{i,j} + \rho z_{i,p+1}, \qquad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, p,$$
where zi,j are independent standard normal pseudo-random numbers and ρ is the theoretical correlation between any two explanatory variables.
In this study, we use a linear regression model with 100 observations and 20 predictors. The dependent variable is generated from
$$y_i = \sum_{j=1}^{20}\beta_j x_{i,j} + \epsilon_i, \qquad i = 1, 2, \ldots, 100,$$
where ϵi are normal pseudo-random numbers with mean zero and common variance σ².
We choose β = (β1, β2, …, β20)′ as the normalized eigenvector corresponding to the largest eigenvalue of X′X, for which β′β = 1. To define the prior information according to Section 2.3, we assume that the OLSE estimates of the first four elements of β are unbiased; these form the estimates b. To investigate the effects of different degrees of multicollinearity on the estimators, we choose ρ = (0.5, 0.7, 0.9), representing weak, moderate, and high multicollinearity. For the analysis, we simulated 50 data sets, each consisting of 50 observations to fit the model and 50 observations to calculate the RMSE and MAPE. The cross-validated RMSE and MAPE of the estimators are displayed in Figures 1 and 2, respectively. The median cross-validated RMSE and MAPE of the estimators are displayed in Table 1.
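The design above can be reproduced along these lines; the seed and the use of an eigendecomposition to extract β are illustrative details.

```python
# McDonald-Galarneau predictors, beta as the normalized eigenvector of X'X
# for the largest eigenvalue, and the response from the linear model.
import numpy as np

def make_data(n=100, p=20, rho=0.9, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, p + 1))
    X = np.sqrt(1 - rho ** 2) * Z[:, :p] + rho * Z[:, [p]]
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # ascending eigenvalues
    beta = eigvecs[:, -1]                        # beta'beta = 1
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y, beta

# One replicate per degree of multicollinearity considered in the study.
for rho in (0.5, 0.7, 0.9):
    X, y, beta = make_data(rho=rho)
    print(rho, X.shape, y.shape)
```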
[Figure 1: Cross-validated RMSE of the estimators for (a) ρ = 0.5, (b) ρ = 0.7, and (c) ρ = 0.9.]
[Figure 2: Cross-validated MAPE of the estimators for (a) ρ = 0.5, (b) ρ = 0.7, and (c) ρ = 0.9.]
From Figures 1 and 2 and Table 1, we observe that SRLASSO consistently outperforms LASSO and Enet under both the RMSE and MAPE criteria for all degrees of multicollinearity considered.
3.2. Real-World Example
As a numerical example, the well-known Prostate Cancer Data [11] were used to evaluate the performance of SRLASSO. This data set is distributed with the "lasso2" R package. In the Prostate Cancer Data, the predictors are the following eight clinical measures: log cancer volume (lcavol), log prostate weight (lweight), age, log of the amount of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi), log capsular penetration (lcp), Gleason score (gleason), and percentage of Gleason scores 4 or 5 (pgg45). The response is the log of prostate-specific antigen (lpsa), and the data set has 97 observations. The variance inflation factor (VIF) values of the predictor variables are 3.09, 2.97, 2.47, 2.05, 1.95, 1.37, 1.36, and 1.32, and the condition number is 243, which gives evidence of multicollinearity among the predictor variables. Stamey et al. [11] examined the correlation between the level of prostate-specific antigen and these eight clinical measures. Furthermore, Tibshirani [3] and Efron et al. [4] used these data to examine the performance of the LASSO and LARS algorithms, respectively. We used 67 observations to fit the model and 30 observations to calculate the RMSE and MAPE. We assume that the OLSE estimates of the first three regression coefficients of the Prostate Cancer Data are unbiased, and we defined the prior information for these data based on Section 2.3. The cross-validated RMSE and MAPE of the estimators are displayed in Table 2, and the coefficient paths of each estimator are displayed in Figure 3.
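The multicollinearity diagnostics quoted above can be checked along the following lines, assuming the Prostate Cancer Data have been exported from the "lasso2" R package to a CSV file; the file name and column layout are our assumptions, and the exact figures may differ with scaling conventions.

```python
# VIF values and the condition number of the predictor matrix; assumes a
# hypothetical "prostate.csv" with the eight predictors and response lpsa.
import numpy as np
import pandas as pd

df = pd.read_csv("prostate.csv")
X = df.drop(columns="lpsa").to_numpy()

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2) from regressing X_j on the other columns."""
    others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    rss = np.sum((X[:, j] - others @ coef) ** 2)
    tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return tss / rss                  # equals 1 / (1 - R_j^2)

print("VIFs:", [round(vif(X, j), 2) for j in range(X.shape[1])])
print("condition number:", round(np.linalg.cond(X), 1))
```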
[Figure 3: Coefficient paths of the estimators for the Prostate Cancer Data, panels (a)–(c), one per estimator.]
From Table 2, we observe that SRLASSO outperforms LASSO and Enet on the Prostate Cancer Data under both the RMSE and MAPE criteria. Furthermore, comparing Figures 3(a)–3(c) shows that each estimator selects a different set of variables.
4. Conclusions
This study showed that SRLASSO performs better than LASSO and Enet under both the RMSE and MAPE criteria when multicollinearity exists among the predictor variables. Therefore, SRLASSO can be used as an alternative to LASSO and Enet when prior information on the regression coefficients is available. The proposed SRLARS algorithm can be used to obtain SRLASSO solutions.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.