Abstract

This paper studies the variable selection of high-dimensional spatial autoregressive panel models with fixed effects in which a matrix transformation method is applied to eliminate the fixed effects. Then, a penalized quasi-maximum likelihood is developed for variable selection and parameter estimation in the transformed panel model. Under some regular conditions, the consistency and oracle properties of the proposed estimator are established. Some Monte-Carlo experiments and a real data analysis are conducted to examine the finite sample performance of the proposed variable selection procedure, showing that the proposed variable selection method works satisfactorily.

1. Introduction

Panel data, which contains dimensions of time and space, are becoming more and more common under the circumstances of big data. Compared with traditional cross-sectional data and time series data, a great advantage of panel data is that it can effectively expand the sample size. A large number of studies have given a variety of panel models, of which the panel model with fixed effects is widely studied and applied because it can capture non-time-varying and unobservable exogenous variables. In fact, these models are still based on the independence assumption which is improper in the real situation, so in this paper, we consider spatial autoregressive panel (SARP) models with fixed effects that permit interdependence between spatial units in panel data. Most data in the context of big data are high-dimensional, which also means the number of covariates can diverge with the growth of sample size and lead to a rapid increase in model complexity. In order to reduce the amount of calculation and model complexity, penalized methods are indispensable to remove irrelevant variables. However, the variable selection of high-dimensional spatial panel data is more complex than that of high-dimensional cross-sectional data due to spatial terms and fixed effects. The ordinary penalized methods for variable selection, such as the least absolute shrinkage and selection operator (LASSO) (Tibshirani [1]), the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li [2]), which are proposed for classical linear regression models, cannot be used in the high-dimensional SARP models directly. Therefore, we propose a penalized estimation method for high-dimensional SARP models with fixed effects and establish corresponding asymptotic properties.

Up to now, there are a large number of related studies which have given statistical inferences about spatial models. Since Cliff and Ord [3] proposed a structure with spatial correlation, spatial models have been receiving increasing attention. Anselin [4] proposed the maximum likelihood estimation of the spatial autoregressive (SAR) model and constructed an LM test for the spatial term. Kelejian and Prucha [5] proposed a generalized spatial two-stage least squares procedure for instrument matrices and studied its properties. Lee [6] established the asymptotic distributions of quasi-maximum likelihood for SAR models. Wei et al. [7] constructed the partially linear varying coefficient SAR models and approximated the nonlinear part locally by a linear function. Du et al. [8] proposed the estimator for the asymptotic covariance matrix of the parameter estimator of partially linear additive SAR models and established the asymptotic properties for the resulting estimators. Other research results on SAR models can also be referred to Cheng et al. [9], Dai et al. [10], Gupta and Robinson [11], Lin and Lee [12], Tian et al. [13], Tian et al. [14], and so on. These studies based on cross-sectional data are not applicable to panel data, and variable selection is rarely involved.

Recently, panel data have attracted tremendous attention, especially since a number of works have studied the relevant statistical inference of spatial panel data models. For example, Baltagi et al. [15] considered panel regression models with SAR disturbances and LM tests under five hypotheses. Lee and Yu [16] proposed an orthonormal transformation for spatial autoregressive panel models with fixed effects and provided a method which allows the estimation of coefficients without estimating fixed effects. Ju et al. [17] estimated parameters of spatial dynamic panel data models by the Bayesian method, and their method can adapt to a skew-normal distribution. These research studies investigated the estimators and corresponding large sample properties of spatial panel models; however, little work has been performed on the variable selection of models.

To the best of our knowledge, Liu et al. [18] investigated variable selection in the SAR model with independent and identically distributed errors, but their model was not under the situation of a diverging number of parameters and the asymptotic properties they established were not available for high-dimensional data. Xie et al. [19] considered the penalized estimation for SAR models with a diverging number of parameters and established the oracle properties; however, their method was available for high-dimensional cross-sectional data but not for panel data. Therefore, we consider variable selection for the high-dimensional SARP model with fixed effect, present the penalized estimators, and establish related asymptotic properties thoroughly.

This paper is organized as follows: in Section 2, we introduce a high-dimensional SARP model with fixed effect and eliminate the fixed effects term by transformation matrix. In Section 3, we consider the penalized quasi-maximum likelihood estimators (QMLE) which are based on the SCAD penalty function for SARP models with fixed effects and establish its consistency and oracle property. Besides, we introduce a feasible iterative algorithm for the penalized QMLE in this section. In Section 4, some Monte-Carlo simulations are carried out to examine the finite sample performance of QMLE. In Section 5, a real data application of China’s carbon emission is provided for illustrative purposes. In Section 6, we give a brief conclusion of this paper. The detailed proofs of theoretical results are provided in the Appendix.

2. Matrix Transformation for SARP Models with Fixed Effects

We consider the following SARP models with fixed effects:where is an vector of observations on the dependent variables, is an unknown spatial autoregressive coefficient, the spatial weight matrix is an matrix of known constants with zero diagonal elements and satisfies that the sum of rows is 1. is an matrix of observations on linear regressors, where is divergent as , is an unknown vector of regression coefficients, is an unknown vector of fixed effects, and is a vector i.i.d across t with zero mean and finite covariance matrix , where is an unknown parameter and is the identity matrix. Therefore, the unknown parameters to be estimated can be expressed as .

The fixed effects is an unknown vector which means that there is inconsistency regarding ; however, if we just focus on and , a transformation method is available to eliminate . According to the method used by Lee and Yu [16], we let , where represents a T-dimensional vector with all 1 and let be the orthonormal eigenvector matrix of which corresponds to the eigenvalues of one. Thus, we define the transformed matrices , , and apparently . For , we similarly define . Then, (1) can be rewritten as follows:where , , is Kronecker’s product symbol, , , and . Then, it is easy to know , so we can obtain . Besides, , , and the parameter vector to be estimated is marked as .

3. Methods and Large Sample Properties

3.1. Penalized Quasimaximum Likelihood Estimator

Let , and if is normally distributed, the log-likelihood function of model (2) is obtained in the following equation:

In addition, we let be the SCAD penalty function, and of course, other penalty functions can also be considered here. Then, we renumber parameter vector as , thus the penalized quasimaximum likelihood function can be obtained as follows:where the SCAD penalty function is defined by its first derivative:

We set as Fan and Li [2] recommended in their paper. To determine the tuning parameter , a Bayesian information criterion can be applied:where is the number of nonzero parameters, and . Then, we get the penalized QMLE .

3.2. Asymptotic Properties

Let be the true parameter vector, without loss of generality, we assume that the first of is nonzero parameters and zero for the remainder. Remark all the nonzero parameters as , then we can rewrite as . Let , in order to obtain the asymptotic properties of the penalized QMLE, there are some regular assumptions as follows:A1. The is finite and .A2. The moment exists for some .A3. The elements in satisfy , where can be divergent or bounded, when .A4. The matrix is a nonsingular matrix for all in .A5. The sequences of matrices and are uniformly bounded in both row and column sums for all .A6. The exists and is nonsingular. The exists. The elements of are uniformly bounded constants for all .A7. is bounded in both row and column sums, uniformly in in a closed subset of and which is an interior point of .A8. The exists and is nonsingular.A9. The exists and is nonsingular.A10. The third derivatives exist for all in an open set that contains true parameter point . There exist functions that satisfy , where for and .A11. The eigenvalues of the Hessian matrix satisfy the following:A12. for .A13. , as .

Remark 1. Assumptions A1–A8 are set for spatial term and regressors, A9–A11 are imposed on the likelihood functions, and A12 and A13 are for penalty functions. In addition, A1 shows that is the only large sample scenario; A2 ensures the moment exists; A3–A8 ensure the QMLE of SARP exists; and A9–A13, which are similar to the assumptions provided by Fan and Peng [20], are necessary for obtaining the consistency and the oracle property of PQLME.
Let , then the first derivative is as follows:where . Then, we have theorems as follows:

Theorem 2. (Consistency). Under assumptions A1–A12, we suppose that as , then there is a local maximizer of that satisfies the following:where .

According to Theorem 2, we can choose a proper to achieve consistent penalized QMLE under A1–A12.

We know that is the Hessian matrix under . Note that , then the covariance matrix of is obtained as follows:whereand is a symmetric matrix:

In addition, , represents the th row of , is the element of and is the th row of . A8 ensures that is nonsingular as goes infinite. Apparently, that provided is normally distributed, so the covariance matrix of is .

Theorem 3. Under A1–A13, we suppose that , as , then with probability tending to 1, for any that satisfy and any constants , the following equation holds:Let and be and knowing , respectively, andThen, we have the following oracle property of PQMLE.

Theorem 4. (Oracle property). Under A1–A13, we suppose that , as , then with probability tending to 1, the root--consistent local maximizer must satisfy the following:(i)(Sparsity) .(ii)(Asymptotic normality)where and is the first upper-left submatrix of and , respectively; and .

3.3. Numerical Algorithm

The analytical solution for maximizing cannot be obtained due to nonconcave penalty function and spatial term. Although the local quadratic approximation (LQA) algorithm (Fan and Li [2]) cannot be applied to SARP models directly, there are still some ideas that can be used for reference. Let which can be regarded as the approximation matrix of , is the vector form of , and . A feasible fisher’s scoring algorithm is as follows:Step 1: We initialize .Step 2: We update .Step 3: If , where is the error limit, we take as the final estimator, i.e., , otherwise iterate Step2.

The initial value in Step1 is obtained by using the ordinary least squares method based on (2), and the inverse matrix in Step2 can be approximated by the generalized inverse matrix.

4. Monte-Carlo Experiments

In this section, we conduct some Monte-Carlo experiments to examine the performance of estimation and variable selection. The simulate data are generated from model (1) as follows:

4.1. Parameters and Regressors Setting

We consider and as different sample sizes at three levels. We set the spatial autoregressive coefficient as three degrees of spatial dependence. Moreover, we set the case of which implies that the proposed model reduces to the panel model with fixed effects to examine the penalized estimator performance. In order to make diverge, we set , respectively, when , i.e., , , and , where is a zero vector. For the disturbance term, we set three variance levels and two types of disturbance distributions as follows: (i) and (ii) to explore the influence of the disturbance on the proposed estimator. We set and , where is the AR (1) matrix and is shown as follows:

In addition, we consider another setting, that is, and , and , , and for , respectively, which is just to simulate large scenarios, and it only considers the normal disturbance. Except for , , and disturbance settings, the others are same as the setting mentioned earlier.

Referring to Baltagi and Yang [21] for the generation of spatial weight matrix, the main idea is that all individuals in a “group” are regarded as “neighbors” to each other and each individual has equal influence on their “neighbor.” The steps of the procedure are as follows: (a) we set a constant and let be the number of “groups” and be the average number of individuals in each area. (b) We generate “group” size as and adjust so that it satisfies . (c) We set matrices with zero for diagonal elements and for others. (d) We set matrix as the final spatial weight matrix in model (1) which also satisfies A3–A5. In this paper, we set and generate by the method mentioned previously.

Considering that choosing different penalty function will not produce additional computation, we use the Adaptive Lasso (AdLasso) penalty function (Zou [22]) as a comparison, and the form of AdLasso penalty function is as follows:

Referring to Zhao and Xue [23], we construct a generalized mean square error (GMSE) to compare the estimation accuracy, which is defined as follows:where and is the th row of .

4.2. Monte-Carlo Results

This subsection shows the results of Monte-Carlo simulations which are reported in Table 13. Lable “C” in table means the average number of zero regression coefficients that are correctly estimated as zero, and “I” depicts the average number of nonzero regression coefficients that are erroneously set to zero. These two indicators indicate the effects of variable selection.

Table 1 presents the results of the penalized QMLE of a SARP model with fixed effects under normal disturbance when is fixed. There are some conclusions we can derive from Table 1: (a) The GMSEs reduce, and the effects of variable selection improve as increases, which proves the consistency and sparsity of estimation. (b) The performance of the penalized QMLE under disturbances of small variance is better than the large one. Small variance means less uncertainty which results in higher estimator accuracy. (c) According to the scenario of , the variable selection imposed on the spatial autoregressive coefficient is effective. (d) The GMSEs and the variable selection effect of the SCAD method are almost the same as the AdLasso method in all simulations. Table 2 presents the results of the penalized QMLE of a SARP model with fixed effects under t disturbance and shows that all conclusions derived from Table 1 are still available. Comparing Tables 1 and 2, we know that although the performance of the penalized QMLE under t disturbance is not as good as the normal one, it is still well below the misspecified distribution. Table 3 presents the results of the penalized QMLE of an SARP model with fixed effects under normal disturbance when is fixed with different . From Table 3, we can derive the conclusion that the proposed penalized QMLE also performs well under large .

5. Application

Now, we apply the proposed procedure to analyze the dataset of China’s carbon emissions. The dataset contains the carbon emission data and 10 other relevant indicators for 30 provinces and cities in China from 2008 to 2019 (except for Hong Kong, Macao, Taiwan, and Tibet, due to the difficulty of data collection) which means , and , and is partially shown in Figure 1. The raw data are obtained from the China Energy Statistics Yearbook, National Bureau of Statistics of China (https://data.stats.gov.cn/). The carbon emission data are calculated by using the formula based on energy consumption and energy carbon content which is proposed by the IPCC.

We consider a spatial autoregressive panel model for the analysis of factors affecting carbon emissions as follows:

We set up the standardized spatial weight matrix based on seven geographical divisions (East China, Northeast China, North China, Central China, South China, Northwest China, and Southwest China), and the variables represented by , (original data before logarithm transformation) are listed in Table 4. First, we need to test whether the real data conform to our proposed model. In fact, the QMLE proposed in this paper have penalized the spatial autoregressive coefficient , which sufficiently substitutes for the test of spatial autocorrelations. To test the existence of individual fixed effects, we use the chow test for the poolability of the data and the Hausman test [24]. The p value of the chow test is much less than 0.01, which means there indeed exist some individual effects in the data; the p value of the Hausman test is also much less than 0.01, which means the individual effects should be fixed rather than random. Then, we fit the dataset for (20), ordinary fixed effects panel models (i.e., ) and classical linear models (i.e., ) are fitted for comparison. The tuning parameters are determined by . All results are given in Table 5. From the analysis results, we can draw the following conclusions: (i) The SARP model with fixed effects based on the SCAD penalty function shows similar results to the model based on the ALASSO penalty function, and they all reduce to zero. (ii) The penalized estimators and of the SARP model with fixed effects are positive, which is basically reasonable; is negative, which may be because the investment in the energy industry of the state economy promotes emission reduction technology. (iii) The spatial autoregressive coefficient is not 0 under the SCAD and ALASSO penalty functions based on the SARP model with fixed effects, thus showing the existence of spatial dependence. (iv) Panel models with fixed effects under the penalty function reduce to zero, which is obviously unreasonable; the R square also indicates that the SARP models with fixed effects are better for China’s carbon emission dataset.

6. Conclusion

Within the framework of high-dimensional SARP models with fixed effects, we propose a penalized quasi-maximum likelihood approach based on matrix transformation. This approach can achieve parameter estimation and variable selection simultaneously, and we have proven that the proposed estimators are asymptotically consistent and normally distributed under some conditions. The Monte-Carlo simulations and a real data analysis of China’s carbon emissions are conducted to prove the proposed properties, and their results show the effectiveness of the proposed method.

This paper focuses only on the variable selection problem of high-dimensional SARP models with fixed effects which are still linear. There may not be similar results for nonlinear panel models and other more flexible spatial models. Furthermore, we use two penalty functions for variable selections, but the best method remains unknown. We will continue to study these issues in the future.

Appendix

Proof of theorems

In order to prove the theorems, we need the following lemmas:

Lemma A.1. Under A1–A8, we can have

Proof. can be written asApparently, . According to A2–A8, we can haveSo that all the elements of are , then Lemma A.1 holds.

Lemma A.2. Under A1–A8, we have

Proof. According to A2–A8 and Lemma A.1, we can have the following equation:According to A1, we have . Thus, Lemma A.2 holds.

Lemma A.3. Under A1–A8, we have

Proof. The elements of matrix are linear or quadratic forms of , and the matrix is as follows:According to the law of large numbers, we know that all the elements are ; thus, Lemma A.3 holds.

Proof of Theorem 2. Let and , where is a large enough constant. Similar to the proof of Theorem 1 in Fan and Peng [2], it is sufficient to prove it if we can prove that for any given , there is a large enough constant such thatThat means that with probability tending to 1, there is a local maximum in the ball such that . Note that , then we haveBy the Taylor expansion of , we havewhere lies between and . According to Lemma A.2, we haveBy Lemma A.3 and A9, we haveBy the Cauchy–Schwarz inequality, A10, and as , we haveThus, . For and , we haveThus, is negative and dominates all terms when is large enough. Then, Theorem 2 holds.

Proof of Theorem 3. Let , we just need to prove that with probability tending to 1 as for any satisfying we have, for :By Taylor expansion, we can havewhere lies between and . We consider first, and by Lemma A.1, we can haveTerm can be written asAccording to Lemma A.3, A11 and , by the Cauchy–Schwarz inequality, we haveSo, . From Lemma A.3, the Cauchy–Schwarz inequality and A11, we haveFrom the Cauchy–Schwarz inequality and A10, we can haveThen, , i.e.,By the condition of the theorem and A12, we have , so it is easy to know that the sign of is completely determined by the sign of which implies Theorem 2.

Proof of Theorem 4. Theorem 2 shows that there is a consistent local maximum if we choose a proper tune parameter . According to Theorem 3, part (i) holds, so we only need to prove part (ii). As we know, there is an estimator which satisfies the following equation:We note it as , and by the Taylor expansion, we havewhere is between and , . Mark as . According to A10 and the Cauchy–Schwarz inequality, we can haveLet and which is the first upper-left submatrix of , then from part (i), (A.24) and (A.25), we knowWe multiply both sides by and denote as From Lemma A.3, we have , thenBy the condition that and Slutsky’s theorem, as , we havewhere “” means convergence in distribution. Furthermore, the central limit theorem for linear-quadratic forms of Kelejian and Prucha [25] and assumption 9 showsThus,

Data Availability

The research data are available from the corresponding author upon request on the website https://data.stats.gov.cn/.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Statistical Science Research Project (2021LY061), the Startup Foundation for Talents at Hangzhou Normal University (2019QDL039), and the Postgraduate Research Innovation Promotion Project of the Hangzhou Normal University (2022HSDYJSKY181).