Abstract

The global warming problem caused by greenhouse gas (GHG) emissions has aroused wide public concern. In order to give policy makers more power to set the specific target of GHG emission reduction, we propose an ensemble learning method with the least squares boosting (LSBoost) algorithm for the kernel-based nonlinear multivariate grey model (KGM) (1, N), and it is abbreviated as BKGM (1, N). The KGM (1, N) has the ability to handle nonlinear small-sample time series prediction. However, the prediction accuracy of KGM (1, N) is affected to an extent by selecting the proper regularization parameter and the kernel parameter. In boosting scheme, the KGM (1, N) is used as a base learner, and the use of early stopping method avoids overfitting the training dataset. The empirical analysis of forecasting GHG emissions in 27 European countries for the period 2015–2019 is carried out. Overall error analysis indicators demonstrate that the BKGM (1, N) provides remarkable prediction performance compared with original KGM (1, N), support vector regression (SVR), and robust linear regression (RLR) in estimating GHG emissions.

1. Introduction

Excessive GHG emissions have led to global warming and the frequent occurrence of extreme climate disasters. To combat global climate change actively, every country should define goals for carbon peaking and carbon neutrality and specify plans for achieving them. However, the premise for all countries realizing their commitments to reducing carbon emission intensity and meeting these goals is the accurate prediction of GHG emission trends. Many studies based on statistical theory, machine learning, and deep learning have been carried out, including the autoregressive integrated moving average model [1], the Gaussian process regression model [2], neural networks [3, 4], support vector machines [5, 6], and the long short-term memory network [7, 8], but these models need training data with a large sample size. If the sample size is insufficient or the sample contains insufficient information, the performance of these models degrades greatly.

For uncertain systems with small samples or poor information, the Chinese scholar Professor Deng Julong put forward grey system theory, which mines the “partially known and partially unknown” information so as to correctly describe the characteristics and evolution of uncertain systems [9]. As an important part of grey system theory, grey forecasting theory is one of the most active research directions in small-sample time series prediction. For time series prediction with poor information and small samples, grey forecasting theory generates sequences by accumulation, establishes a grey model, extracts the valuable system information, and then completes the quantitative prediction of system change. Compared with prediction methods based on statistical models, machine learning, and deep learning, grey forecasting models have a simpler structure and require fewer modelling samples, and they have therefore exhibited a wide range of applications in GHG emission forecasting [10–19].

At present, the mainstream grey forecasting models include univariate grey models and multivariable grey forecasting models [20]. The modelling process of the multivariable type not only considers the development trend of the system characteristic sequence itself but also mines the driving-effect information contained in the related factor sequences, which makes it more effective than the univariate type. Since the multivariable grey forecasting model was put forward, many scholars have carried out research and improvements concerning cumulative transformation, parameter estimation, background values, the structure of the grey differential equation, and the expression of the response function so as to improve the prediction accuracy of the model. For example, Li et al. [21] transformed the original sequence by the Hausdorff derivative operator and utilized the trapezoidal integral formula to calculate the predicted value. Wang et al. [22] introduced the adjacent accumulation method into the grey multivariable convolution model. Wang and Cao [23] optimized the background value of the traditional model according to the Simpson formula and the Boole formula. To overcome the overfitting problem caused by small-sample data modelling, Xie et al. [24] proposed a regularization-based approach that starts from the unweighted multivariate grey model and makes the solution robust by iteratively updating weighting factors based on the previous error distribution.

Among the many multivariate grey forecasting models, it is worth noting that the KGM (1, N) is established under the structural risk minimization principle [25]. In addition, the KGM (1, N) includes a nonlinear function estimated by a kernel function, and in theory it can approximate an arbitrary continuous function [26]. The regularization parameter and kernel parameter affect the stability of the KGM (1, N) [27], meaning that small changes in these parameters can produce different prediction results. The KGM (1, N) is therefore well suited for boosting. A boosting algorithm creates a sequence of base learners in which each subsequent base learner focuses on attacking the error of the previous one. Boosting was first developed for classification problems [28, 29]. Driven by its success in classification, a general boosting algorithm for regression, GradientBoost, was developed [30].

On these theoretical bases, we propose an ensemble learning method for the kernel-based nonlinear multivariate grey model using the LSBoost algorithm [31], an instantiation of GradientBoost. The main contributions of our study are summarized as follows:

(1) We regard the multivariable grey forecasting model as one that solves a multiple regression problem with time series characteristics, which provides a way to improve the prediction accuracy of multivariable grey forecasting models through ensemble learning.

(2) The LSBoost algorithm is used to combine the base learners, i.e., kernel-based nonlinear multivariate grey models, to form a composite predictor. To avoid overfitting, we adopt the early stopping method to improve the generalization ability.

(3) The experimental results of forecasting GHG emissions in 27 European countries over a five-year horizon confirm that the prediction accuracy of BKGM (1, N) is higher than that of the models considered for comparison.

The rest of this paper is organized as follows. Section 2 introduces KGM (1, N) and LSBoost, and then proposes BKGM (1, N). Section 3 presents forecasting results for the GHG emission data of 27 European countries from 2015 to 2019, compares the forecasting accuracy of BKGM (1, N) with that of other models, and further discusses the effect of the learning rate on the computational cost and the generalization ability. Section 4 concludes the paper.

2. Method

2.1. Kernel-Based Nonlinear Multivariate Grey Model

The modelling procedures of the KGM (1, N) are given as follows [25].

Assume that $X_1^{(0)} = \left(x_1^{(0)}(1), x_1^{(0)}(2), \ldots, x_1^{(0)}(n)\right)$ is the system characteristic sequence, and $X_i^{(0)} = \left(x_i^{(0)}(1), x_i^{(0)}(2), \ldots, x_i^{(0)}(n)\right)$, $i = 2, 3, \ldots, N$, are the relevant factor sequences. The first order accumulative generation value is

$$x_i^{(1)}(k) = \sum_{j=1}^{k} x_i^{(0)}(j), \quad k = 1, 2, \ldots, n.$$

The background value is

$$z_1^{(1)}(k) = \frac{x_1^{(1)}(k) + x_1^{(1)}(k-1)}{2}, \quad k = 2, 3, \ldots, n.$$

Then, the KGM (1, N) is defined as

$$x_1^{(0)}(k) + a z_1^{(1)}(k) = f\big(\boldsymbol{x}(k)\big) + b,$$

where $b$ is a bias, and $f\big(\boldsymbol{x}(k)\big)$ is represented as

$$f\big(\boldsymbol{x}(k)\big) = \boldsymbol{w}^{T} \varphi\big(\boldsymbol{x}(k)\big), \quad \boldsymbol{x}(k) = \left(x_2^{(1)}(k), x_3^{(1)}(k), \ldots, x_N^{(1)}(k)\right)^{T},$$

where the sequences can be mapped into a higher-dimensional feature space by $\varphi(\cdot)$, and $\boldsymbol{w}$ is a weight vector.

To estimate the parameters, the following convex optimization problem is constructed:

$$\min_{\boldsymbol{w},\, b,\, e_k} \ \frac{1}{2} \boldsymbol{w}^{T} \boldsymbol{w} + \frac{C}{2} \sum_{k=2}^{n} e_k^{2}, \quad \text{s.t.} \ x_1^{(0)}(k) + a z_1^{(1)}(k) = \boldsymbol{w}^{T} \varphi\big(\boldsymbol{x}(k)\big) + b + e_k,$$

where the regularization parameter $C$ is a positive number. To solve this convex optimization problem, we utilize the Lagrange multiplier method and the Gaussian kernel function, which is expressed as

$$K\left(\boldsymbol{x}_i, \boldsymbol{x}_j\right) = \exp\left(-\frac{\left\|\boldsymbol{x}_i - \boldsymbol{x}_j\right\|^{2}}{2\sigma^{2}}\right),$$

where both $\boldsymbol{x}_i$ and $\boldsymbol{x}_j$ are column vectors, and $\sigma$ is the kernel parameter.
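To make these steps concrete, the following is a minimal Python sketch of the accumulation, background value, and kernel estimation steps. The paper gives no code; the function names are ours, the sketch treats the model as a generic kernel regression of the target on the mapped factor sequences (omitting the grey development coefficient $a$ for simplicity), and the KKT linear system assumes the standard least squares support vector machine (LS-SVM) dual solution that kernel-based grey models typically follow.

```python
import numpy as np

def ago(x):
    """First-order accumulated generating operation (1-AGO):
    x1(k) = x(1) + ... + x(k)."""
    return np.cumsum(x)

def background(x1):
    """Background values z(k) = (x1(k) + x1(k-1)) / 2 for k = 2..n."""
    return 0.5 * (x1[1:] + x1[:-1])

def gaussian_kernel(X, Y, sigma):
    """K(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2)) for rows of X, Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_kernel_model(X, y, C, sigma):
    """Solve the LS-SVM-style KKT linear system
        [ 0   1^T     ] [b    ]   [0]
        [ 1   K + I/C ] [alpha] = [y]
    obtained from the Lagrangian of the convex problem above."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = gaussian_kernel(X, X, sigma) + np.eye(n) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual coefficients alpha

def predict_kernel_model(X_train, b, alpha, sigma, X_new):
    """f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha + b
```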

2.2. Least Squares Boosting Algorithm

The LSBoost algorithm adjusts weight values through repeated training, with the aim of minimizing the error between the target variable and the aggregated prediction of the base learners, and then obtains the final result from the accumulated weighted predictions. The detailed procedure of LSBoost (Algorithm 1) is as follows.

Input: initial value $F_0(x) = \bar{y}$, the maximum iteration number $M$, training dataset size $N$, learning rate $\nu$.
Output: the final predicted value $F_M(x)$.
For $m = 1, 2, \ldots, M$ do:
    for $i = 1, 2, \ldots, N$: compute the residuals $\tilde{y}_i = y_i - F_{m-1}(x_i)$ endfor
    $(\rho_m, \theta_m) = \arg\min_{\rho,\, \theta} \sum_{i=1}^{N} \left[\tilde{y}_i - \rho\, h(x_i; \theta)\right]^{2}$
    $F_m(x) = F_{m-1}(x) + \nu\, \rho_m h(x; \theta_m)$
endFor

Three remarks must be made about this algorithm.

The first remark is that the learning rate $\nu$, which controls the rate at which boosting learns, is a positive number.

The second remark is that the function $h(x; \theta)$ is called a base learner. Generally, the base learners are regression models, and the parameter $\theta$ of the function can be determined before running the LSBoost algorithm.

The third remark is that the LSBoost algorithm involves minimizing the squared 2-norm of the vector $\tilde{\boldsymbol{y}} - \rho\,\boldsymbol{h}(\theta)$, where $\tilde{\boldsymbol{y}} = \left(\tilde{y}_1, \ldots, \tilde{y}_N\right)^{T}$ and $\boldsymbol{h}(\theta) = \left(h(x_1; \theta), \ldots, h(x_N; \theta)\right)^{T}$. In other words,

$$\rho_m = \arg\min_{\rho} \left\|\tilde{\boldsymbol{y}} - \rho\,\boldsymbol{h}(\theta)\right\|_2^{2}.$$

The equation above is equivalent to minimizing a quadratic function. The solution is calculated by the interior-point optimization algorithm [32, 33].
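To make the procedure concrete, below is a minimal Python sketch of the LSBoost loop under the notation above. For brevity it folds the multiplier $\rho_m$ into the base learner's least squares fit; `fit_base` is an illustrative callback standing in for the base learner, not an interface from the paper.

```python
import numpy as np

def lsboost(X, y, fit_base, M=100, nu=0.1):
    """Minimal LSBoost sketch: each round fits a base learner to the
    current residuals and adds a shrunken copy (step size nu) to the
    ensemble. fit_base(X, r) returns a callable fitted to residuals r."""
    F0 = y.mean()                         # constant initial model F_0
    residual = y - F0
    learners = []
    for m in range(M):
        h = fit_base(X, residual)         # fit base learner to residuals
        residual = residual - nu * h(X)   # update residuals
        learners.append(h)

    def predict(X_new):
        """F_M(x) = F_0 + nu * sum_m h_m(x)."""
        pred = np.full(len(X_new), F0, dtype=float)
        for h in learners:
            pred += nu * h(X_new)
        return pred

    return predict
```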

2.3. Boosting Kernel-Based Nonlinear Multivariate Grey Model

The KGM (1, N) can handle small-sample time series prediction. However, its prediction accuracy depends to an extent on selecting proper values for the regularization parameter and the kernel parameter. Based on the analysis above, to improve the prediction performance, an ensemble learning method with the LSBoost algorithm is proposed, with the KGM (1, N) used as a base learner.

When we train models, we usually want to obtain the best generalization performance. However, many models tend to overfit the data: while the model's performance on the training dataset keeps improving, at some point its performance on the testing dataset begins to deteriorate. Therefore, during boosting, the early stopping method is adopted to avoid overfitting. Once the observed data of the system characteristic sequence and the related factor sequences have been collected, they are divided into three subsets: the training dataset, the validation dataset, and the testing dataset. For training BKGM (1, N), the data for developing the model are the training dataset and the validation dataset. The former is used for updating the weight values of the LSBoost algorithm at every iteration; the latter is not used for updating the weight values but for reducing the potential for overfitting. That is, in the initial stage of training, the errors on the training dataset and the validation dataset usually decrease as the iteration steps increase; however, when the BKGM (1, N) begins to overfit, the error on the validation dataset normally begins to rise. The third subset is the testing dataset, which only involves related factor sequence data. The predicted values are saved at every step of the iterative process. When the error on the validation dataset begins to rise or the iteration number reaches its maximum, training is stopped, and the final predicted values are obtained. In detail, the boosting scheme for KGM (1, N) is shown in Figure 1.
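A minimal sketch of this early stopping scheme, using the same illustrative `fit_base` callback as above, is given below. For simplicity a single failed improvement on the validation error stops training here; in practice a patience window could be used instead.

```python
import numpy as np

def boost_with_early_stopping(fit_base, X_tr, y_tr, X_val, y_val, X_test,
                              M=50000, nu=0.001):
    """Boost on the training residuals, track the validation error every
    iteration, save the test prediction from the best iteration, and stop
    once the validation error begins to rise or M is reached."""
    F0 = y_tr.mean()
    residual = y_tr - F0
    pred_val = np.full(len(y_val), F0, dtype=float)
    pred_test = np.full(len(X_test), F0, dtype=float)
    best_err = np.inf
    best_test = pred_test.copy()
    for m in range(M):
        h = fit_base(X_tr, residual)      # base learner on residuals
        residual -= nu * h(X_tr)
        pred_val += nu * h(X_val)
        pred_test += nu * h(X_test)
        err = np.mean(np.abs((y_val - pred_val) / y_val))  # validation MAPE
        if err < best_err:
            best_err = err
            best_test = pred_test.copy()
        else:                             # validation error begins to rise
            break
    return best_test
```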

3. Application in Forecasting GHG Emissions

3.1. Model Evaluation

The prediction error is an important index for evaluating the reliability of a model. In this paper, the Nash–Sutcliffe efficiency coefficient (NSE), mean absolute percentage error (MAPE), and mean squared error (MSE) are used to evaluate the precision:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^{2}},$$

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%,$$

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2},$$

where $n$ is the sample size, $y_i$ and $\hat{y}_i$ are the $i$th observed and predicted values, respectively, and $\bar{y}$ is the mean of the observed values.
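These criteria translate directly into code; a minimal Python version (with `y` the observations and `yhat` the predictions as NumPy arrays) is:

```python
import numpy as np

def nse(y, yhat):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of residual variance
    to the variance of the observations; 1 indicates a perfect fit."""
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def mape(y, yhat):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((y - yhat) / y))

def mse(y, yhat):
    """Mean squared error."""
    return np.mean((y - yhat) ** 2)
```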

3.2. Numerical Simulation and Predicted Results

We obtained the data on GHG emissions (unit: index, 1990 = 100), primary energy consumption (unit: million tonnes of oil equivalent), and energy taxes (unit: million euro) from the Eurostat Air Pollution Database [34]. After removing missing values, GHG emission data remain for 27 European countries, covering 2000 to 2019. We explore and assess the predictability of the BKGM (1, N) for promoting the reliability and accuracy of regional GHG emission prediction. The proposed model is implemented on the historical GHG emission data of the 27 European countries for the period 2000–2019. For each country, the data on GHG emissions, primary energy consumption, and energy taxes form small-sample time series, which are divided into a training dataset (2000–2009), a validation dataset (2010–2014), and a testing dataset (2015–2019). The GHG emission time series is the system characteristic sequence, and the time series of primary energy consumption and energy taxes are the related factor sequences. First, we develop the KGM (1, N) and select the regularization parameter and kernel parameter values at which the error on the validation dataset reaches a minimum; that is, the base learner KGM (1, N) is ready to boost. Second, because larger learning rate values may cause the algorithm to overfit the training dataset, we set the learning rate and the corresponding maximum iteration number to 0.001 and 50000, respectively. For each country, the regularization parameter value, the kernel parameter value, and the iteration number achieving the minimum validation dataset error are displayed in Table 1.
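As a hedged illustration of this selection step, the following sketch scans candidate values of the regularization parameter C and kernel parameter sigma and keeps the pair minimizing the validation error. The grids, the `train_fn`/`err_fn` callbacks, and the function name are our own illustrative assumptions; the paper does not specify its search procedure.

```python
import itertools
import numpy as np

def select_params(train_fn, err_fn, C_grid, sigma_grid):
    """Pick (C, sigma) minimizing the validation error.
    train_fn(C, sigma) fits KGM(1, N) and returns validation predictions;
    err_fn(pred) scores those predictions against the validation data."""
    best_err, best_C, best_sigma = np.inf, None, None
    for C, sigma in itertools.product(C_grid, sigma_grid):
        err = err_fn(train_fn(C, sigma))
        if err < best_err:
            best_err, best_C, best_sigma = err, C, sigma
    return best_C, best_sigma, best_err

# Illustrative grids (not from the paper):
# C_grid = np.logspace(-2, 4, 13); sigma_grid = np.logspace(-2, 2, 9)
```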

We also developed SVR [35] and RLR [36] to forecast GHG emissions for the 27 European countries. For evaluating predictive performance, there are 5 GHG emission measurements (2015 to 2019) for each of the 27 European countries, so there are 135 measurements in total. The comparison is presented in Figure 2. It can be observed that the worst predictive performance is achieved by SVR and RLR. In particular, when forecasting GHG emissions in Finland from 2015 to 2019, the predicted values of SVR and RLR are less than zero. The forecasting results of the proposed BKGM (1, N) are closer to the observed GHG emission data in the 27 European countries than those of KGM (1, N), SVR, and RLR. The evaluation criteria NSE, MSE, and MAPE quantify the accuracy of these models. For SVR, the NSE value is −0.5186, the MSE value is 919.9, and the MAPE value is 21.64%. For RLR, the NSE value is −0.6510, the MSE value is 1000, and the MAPE value is 21.24%. For KGM (1, N), the NSE value is 0.3190, the MSE value is 412.6, and the MAPE value is 19.42%. A model with more predictive skill has an NSE value nearer to 1 and MSE and MAPE values nearer to 0. The BKGM (1, N) has a significantly lower MSE value of 211.8 and MAPE value of 11.47%, and a higher NSE value of 0.6504, as compared with KGM (1, N), SVR, and RLR. Based on this comprehensive comparison, the BKGM (1, N) achieves the best performance in forecasting GHG emissions in the 27 European countries for the period 2015 to 2019.

3.3. Effect of Learning Rate on the Computational Cost and the Generalization Ability

In this subsection, we evaluate how the learning rate affects the computational cost and the generalization ability of the proposed model. We measure the running time needed to complete the iterative process as the computational cost. The results are displayed in Table 2, which compares the processing time (unit: second) for different learning rate values. The experiments were run on Windows 10 with an Intel Core i7-7700HQ processor and 16 GB of RAM. It is good practice to tune parameters such as the learning rate. As the learning rate increases, the elapsed time drops rapidly at first and then more gradually; specifically, the elapsed time decreases roughly in inverse proportion to the learning rate. For example, in the experiment of forecasting GHG emissions for Germany, when the learning rate is set to 0.001, the computational cost is 116.98 s; when the learning rate is set to 0.005, the computational cost is 23.85 s, almost one fifth of 116.98 s. Furthermore, when the learning rate is set to 0.001, the longest processing time is 135.77 s, for forecasting GHG emissions for Norway. The processing time for forecasting GHG emissions for the 26 other countries is about 10 seconds. Based on this comprehensive analysis, the computational cost of the proposed model is acceptable, and the approach is computationally efficient enough to be used in the prediction of GHG emissions.

Based on the experiment of forecasting GHG emissions for Poland from 2015 to 2019, Figure 3 displays the cumulative loss, measured by MAPE, on the training dataset, validation dataset, and testing dataset for four learning rate values, {0.1, 0.3, 0.5, 0.7}, and monitors how the loss changes as base learners accumulate in the ensemble. It is interesting to note that the error patterns on the validation dataset and the testing dataset are very similar; if they were not, this might indicate a poor division between the training dataset and the validation dataset. It can be observed that as the learning rate increases, the iteration number corresponding to the minimum MAPE on the testing dataset more often coincides with that on the validation dataset, and the iteration number corresponding to the minimum MAPE on the validation dataset decreases. This means that a larger learning rate makes overfitting more likely, affecting the generalization ability of BKGM (1, N). Therefore, to achieve a smaller generalization error, the learning rate is usually set small and the maximum iteration number is set large enough.

4. Conclusion

There is no worst model, only the worst data. For regional GHG emission prediction, the GHG emissions, primary energy consumption, and energy taxes for the period 2000–2019 are small-sample time series, which grey forecasting models are well suited to handle. Given the influence of the regularization parameter and the kernel parameter on the accuracy of KGM (1, N), the LSBoost algorithm was used to combine base learners into a composite predictor. The early stopping method avoids overfitting the training dataset; that is, monitoring the validation error during training safeguards the capability of improving accuracy for GHG emission prediction. Compared with SVR and RLR, the prediction performance of BKGM (1, N) for GHG emissions was evaluated using measurements in 27 European countries for the period 2015–2019. The BKGM (1, N) can be a reliable approach for forecasting GHG emissions at the national level. In addition, the ensemble learning method with the LSBoost algorithm can be used as a strategy to improve other grey forecasting models; this will guide our further research.

Data Availability

The raw data of GHG emissions, primary energy consumption, and energy taxes from 2000 to 2019 were collected from the Eurostat Air Pollution Database, from the website http://ec.europa.eu/eurostat/data/database.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Funding Project of High-Level Talents in Hebei Province (No. A202001113), the Social Science Foundation of Hebei Province (No. HB20GL034), the Project of Social Science and Development in Hebei Province (No. 20210301056), the Science and Technology Research Program of Chongqing Municipal Educational Commission (No. KJQN202000518), and the Project of Top Young Talents in Handan City (No. 60000005).