Abstract

To address the problems of high feature dimension and large data redundancy in wind and solar power prediction, an improved prediction model is proposed that combines feature selection methods with the long- and short-term time-series network (LSTNet). The long short-term memory (LSTM) unit in the LSTNet model is replaced with a bidirectional long short-term memory (BiLSTM) unit, which enables recursive response training for the states of hidden layers at the start and end of the sequence. For feature selection, both feature screening and dimension reduction methods are considered, including random forest (RF), grey relational analysis (GRA), and principal component analysis (PCA). Finally, the effectiveness of the proposed methods is verified on wind and solar power data, where RF-LSTNet performs the best. For wind power prediction, the mean absolute percentage error (MAPE) is reduced by 29.7% and the root mean square error (RMSE) is reduced by 24.1% compared with the traditional LSTNet model; for solar power prediction, the MAPE is reduced by 12.9% and the RMSE is reduced by 3.8%.

1. Introduction

A high proportion of new energy access is a prominent feature of the new power system. Accurate and reliable wind and solar power forecasting can provide fundamental information support for the safe and economic operation of the power system [1, 2]. Wind and solar power forecasting can be divided into long-term, medium-term, short-term, and ultrashort-term forecasting according to the sampling interval. Short-term forecasting covers the next few days and can provide technical support for new energy consumption and real-time dispatch [3].

It is common to use historical data to predict future wind and solar power, where feature selection has a significant impact on the results. Feature selection can be divided into two categories, namely, feature screening and dimension reduction. The former extracts strongly correlated features, while the latter eliminates collinear features. In [4], principal component analysis (PCA) was used to handle high-dimensional data in wind power prediction. In [5], grey relational analysis (GRA) was used to obtain the power at times with similar weather, which was used as a label together with weather information and improved the wind power forecasting accuracy of conditional generative adversarial network (CGAN) models. In [6], the random forest (RF) was applied to screen out, from all the feature data in the training set, the features with the greatest impact on the output power. Although feature selection has been used to improve the performance of wind and solar power forecasting, most existing studies focus on a single feature selection method, so a comprehensive analysis of feature selection methods for wind and solar power forecasting is still lacking.

Due to the intermittent and irregular characteristics of wind and solar power, most existing power prediction methods are based on time-series data with nonlinear network prediction models [7–9]. The long- and short-term time-series network (LSTNet) uses the convolutional neural network (CNN) and the recurrent neural network (RNN) to extract short-term local dependency patterns among variables and discover long-term patterns for time-series trends. Furthermore, LSTNet adopts the traditional autoregressive (AR) model to tackle the scale-insensitive problem of the neural network model. LSTNet has been applied to wind power prediction [10], household load prediction [11], and temperature prediction [12]. In [10], the performance of models such as LSTNet, temporal pattern attention-based long short-term memory (TPA-LSTM), and dual-stage attention-based recurrent neural network (DA-RNN) was compared and analyzed for the wind power prediction problem. However, wind and solar power forecasting based on LSTNet has not been sufficiently studied. In addition, although feature selection methods have been combined with various types of prediction models to reduce the dimension of features, the combination of feature selection and LSTNet for wind and solar power forecasting has also not been sufficiently studied.

In this paper, the problems of high feature dimensions and large data redundancy of wind and solar power data are studied from two aspects, namely, feature selection and model structure improvement. For feature selection, three types of typical algorithms (PCA, GRA, and RF) from feature screening and dimension reduction methods are investigated. For model structure improvement, the long short-term memory (LSTM) unit in the LSTNet model is replaced with the bidirectional long short-term memory (BiLSTM), which enables recursive response training for the states of hidden layers at the start and end of the sequence. A wind and solar power forecasting method based on a combination of feature selection and an improved LSTNet is established in this paper. Finally, the effectiveness of the proposed combined feature selection and improved LSTNet model is verified with wind and solar power data.

2. Improved LSTNet Model

The LSTNet model consists of a convolutional component, a recurrent component, a recurrent-skip component, an autoregressive component, and a special cyclic skip module, as shown in Figure 1 [13]. The model can effectively extract the long-term feature information of time series and combines linear and nonlinear models to improve accuracy and robustness. The convolutional layer consists of multiple filters. The $k$th filter sweeping through the input matrix $X$ is shown as follows:

$$h_k = \mathrm{ReLU}(W_k * X + b_k) \tag{1}$$

where $*$ denotes the convolutional operation, $h_k$ is the output feature vector, the rectified linear unit (ReLU) function is $\mathrm{ReLU}(x) = \max(0, x)$, $W_k$ is the weight matrix connected to the convolutional kernel of the $k$th feature, and $b_k$ is the bias vector of the feature.
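As a minimal sketch of the filter operation described above, the following plain-Python example slides one kernel over the time axis of a small input matrix and applies the ReLU. The names `conv_filter`, `X`, `W_k`, and `b_k` are illustrative assumptions, the bias is taken as a scalar, only fully overlapping window positions are evaluated (no padding), and the toy numbers are not taken from the paper's data.

```python
def relu(x):
    # ReLU(x) = max(0, x)
    return max(0.0, x)

def conv_filter(X, W_k, b_k):
    """Slide one filter over the time axis of X (no padding).

    X   : list of T rows, each holding n variables
    W_k : list of w rows, each holding n weights (kernel height w, width n)
    b_k : scalar bias (assumed scalar for simplicity)
    """
    T, w = len(X), len(W_k)
    outputs = []
    for t in range(T - w + 1):
        s = b_k
        for r in range(w):
            for c in range(len(W_k[r])):
                s += W_k[r][c] * X[t + r][c]
        outputs.append(relu(s))
    return outputs

# Toy input: 4 time steps, 2 variables (made-up values).
X = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0], [-2.0, 0.5]]
W_k = [[1.0, 0.0], [0.0, 1.0]]
h_k = conv_filter(X, W_k, b_k=-0.5)
print(h_k)  # [0.0, 0.5, 3.0]
```

As is usual in deep learning code, the kernel is applied without flipping, so the operation is strictly a cross-correlation.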

The output of the convolutional layer is simultaneously fed into the recurrent component and the recurrent-skip component [14]. The recurrent component is a recurrent layer with the LSTM and uses the ReLU function as the hidden update activation function. In the recurrent-skip component, skip links are added between the current hidden cell and the hidden cells in the same phase in adjacent periods. The hidden state of the recurrent units at time $t$ is computed as [11]

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-p} + b_i) \tag{2}$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-p} + b_f) \tag{3}$$
$$g_t = \phi(W_{xg} x_t + W_{hg} h_{t-p} + b_g) \tag{4}$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-p} + b_o) \tag{5}$$
$$c_t = f_t \odot c_{t-p} + i_t \odot g_t \tag{6}$$
$$h_t = o_t \odot \phi(c_t) \tag{7}$$

where the input $x_t$ of this layer is the output of the convolutional layer, $p$ is the number of hidden cells skipped through, $\odot$ denotes element-wise multiplication, $\sigma$ is the sigmoid function, $\phi$ denotes the tanh function, $i_t$, $f_t$, $g_t$, $o_t$, $c_t$, and $h_t$ denote the input gate, forget gate, input node, output gate, memory unit, and hidden layer output, respectively, $W_{xi}$, $W_{hi}$, $W_{xf}$, $W_{hf}$, $W_{xg}$, $W_{hg}$, $W_{xo}$, and $W_{ho}$ are the weight matrices of the corresponding gates multiplied by the input $x_t$ and the output $h_{t-p}$, and $b_i$, $b_f$, $b_g$, and $b_o$ are the bias vectors of the corresponding gates.
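The gate updates with a skip of $p$ steps can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function names, the dictionary layout of the weights (`W` for input weights, `U` for hidden-state weights, `b` for biases, keyed by gate), the zero initial states, and the random toy inputs are all assumptions, and the standard tanh output activation is used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM update; h_prev and c_prev are the states from p steps earlier."""
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # input node
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate
    c = f * c_prev + i * g                                # memory unit
    h = o * np.tanh(c)                                    # hidden layer output
    return h, c

def run_lstm(xs, W, U, b, hidden_size, p=1):
    """Run over all time steps; p=1 is an ordinary LSTM, p>1 adds skip links."""
    zero = np.zeros(hidden_size)
    hs, cs = [], []
    for t, x_t in enumerate(xs):
        h_prev = hs[t - p] if t >= p else zero
        c_prev = cs[t - p] if t >= p else zero
        h, c = lstm_step(x_t, h_prev, c_prev, W, U, b)
        hs.append(h)
        cs.append(c)
    return np.stack(hs)

# Toy setup with random weights (illustrative values only).
rng = np.random.default_rng(0)
n_in, hidden_size = 3, 4
W = {k: rng.normal(scale=0.5, size=(hidden_size, n_in)) for k in "ifgo"}
U = {k: rng.normal(scale=0.5, size=(hidden_size, hidden_size)) for k in "ifgo"}
b = {k: np.zeros(hidden_size) for k in "ifgo"}
xs = rng.normal(size=(6, n_in))
H_skip = run_lstm(xs, W, U, b, hidden_size, p=2)
```

With `p=2`, the state at step 2 depends on the inputs at steps 0 and 2 only, which is the "same phase in adjacent periods" behaviour the skip links are meant to provide.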

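This paper uses the BiLSTM as the basic unit of the recurrent layer. The BiLSTM is chosen for its ability to perform forward and reverse bidirectional propagation based on the LSTM. The equation is as follows:

$$\overrightarrow{h}_t = \mathrm{LSTM}(x_t, \overrightarrow{h}_{t-1}), \quad \overleftarrow{h}_t = \mathrm{LSTM}(x_t, \overleftarrow{h}_{t+1}), \quad h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t] \tag{8}$$

where LSTM denotes the model when $p = 1$ in Equations (2)–(7).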

The difference between BiLSTM and LSTM is that the BiLSTM network has bidirectional characteristics, obtained by constructing a pair of LSTM layers in opposite directions. The bidirectional network enables recursive response training for the state of the hidden layer at the start and end of the sequence, so that each hidden state can also draw on later time steps within the input sequence. In addition, BiLSTM can alleviate the problem of long-term data dependency and improve the prediction accuracy of the model.
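The pair of opposite-direction LSTM layers can be sketched as below: one pass reads the window from start to end, the other from end to start, and the two hidden states are concatenated at each time step. This is a hedged NumPy sketch under stated assumptions: the names `lstm_pass` and `bilstm`, the stacked weight layout (gates in the order input, forget, output, input node), the zero initial states, and the random toy data are all illustrative and not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(xs, W, b, hidden_size):
    """Plain LSTM over xs (T x n); W stacks the i, f, o, g weights row-wise."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    states = []
    for x_t in xs:
        z = W @ np.concatenate([x_t, h]) + b
        i_pre, f_pre, o_pre, g_pre = np.split(z, 4)
        c = sigmoid(f_pre) * c + sigmoid(i_pre) * np.tanh(g_pre)
        h = sigmoid(o_pre) * np.tanh(c)
        states.append(h)
    return np.stack(states)

def bilstm(xs, fwd_params, bwd_params, hidden_size):
    """Concatenate forward and backward hidden states at each time step."""
    h_fwd = lstm_pass(xs, *fwd_params, hidden_size)
    # Reverse time, run the second LSTM, then restore the original order.
    h_bwd = lstm_pass(xs[::-1], *bwd_params, hidden_size)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)

# Toy setup with random weights (illustrative values only).
rng = np.random.default_rng(0)
n_in, hidden_size, T = 3, 4, 5

def make_params():
    W = rng.normal(scale=0.5, size=(4 * hidden_size, n_in + hidden_size))
    return W, np.zeros(4 * hidden_size)

fwd, bwd = make_params(), make_params()
xs = rng.normal(size=(T, n_in))
out = bilstm(xs, fwd, bwd, hidden_size)  # shape (T, 2 * hidden_size)
```

Changing the last input of the window leaves the forward half of the earlier states untouched but alters their backward half, which is how the backward layer carries information from the end of the sequence.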

LSTNet uses a fully connected layer to combine the outputs of the recurrent and recurrent-skip components [13]. The equation for calculating the output of the fully connected layer is

$$h_t^D = W^R h_t^R + \sum_{i=0}^{p-1} W_i^S h_{t-i}^S + b \tag{9}$$

where the input of the fully connected layer includes the hidden state of the recurrent component at time $t$, denoted by $h_t^R$, and the hidden states of the recurrent-skip component from time $t-p+1$ to $t$, denoted by $h_{t-p+1}^S, \cdots, h_t^S$; $h_t^D$ is the prediction result of the LSTM part at time $t$, $W^R$ and $W_i^S$ are the parameters of the model, and $b$ is the bias vector of the model.

In the LSTNet, an autoregressive (AR) model is adopted as the linear component [15]. The AR model can be represented as follows:

$$h_t^L = \sum_{k=0}^{q^{ar}-1} W_k^{ar} y_{t-k} + b^{ar} \tag{10}$$

where $h_t^L$ is the prediction result of the AR model, $W^{ar}$ and $b^{ar}$ are the parameters of the model, and $q^{ar}$ is the size of the input matrix.

Finally, the output of the fully connected layer and the output of the autoregressive layer are integrated to obtain the final prediction result $\hat{Y}_t$, which is as follows:

$$\hat{Y}_t = h_t^D + h_t^L \tag{11}$$
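The linear part and the final summation can be illustrated with a short plain-Python sketch. The names `ar_component` and `lstnet_output`, the weights, the history values, and the stand-in value for the nonlinear output are all hypothetical; in a trained model the AR weights would be learned jointly with the network.

```python
def ar_component(y_hist, w_ar, b_ar):
    """Linear AR prediction: sum_k w_ar[k] * y_{t-k} + b_ar, with y_hist[-1] = y_t."""
    q = len(w_ar)
    recent = y_hist[-q:][::-1]  # y_t, y_{t-1}, ..., y_{t-q+1}
    return sum(w * y for w, y in zip(w_ar, recent)) + b_ar

def lstnet_output(h_D, h_L):
    """Final prediction: nonlinear (fully connected) part plus linear AR part."""
    return h_D + h_L

# Hypothetical numbers for illustration only.
y_hist = [10.0, 12.0, 11.0, 13.0]
h_L = ar_component(y_hist, w_ar=[0.5, 0.3, 0.2], b_ar=0.0)  # about 12.2
y_hat = lstnet_output(h_D=0.4, h_L=h_L)                     # about 12.6
```

Because the AR term is a direct linear function of recent observations, it follows changes in the scale of the input, which is the scale-insensitivity issue of the purely neural part that the linear component is meant to offset.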

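The evaluation metrics of the model include the root mean square error (RMSE) and the mean absolute percentage error (MAPE) [16].
(1) RMSE: the root mean square error is expressed as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{12}$$

(2) MAPE: the mean absolute percentage error is expressed as

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\% \tag{13}$$

where $n$ is the number of samples, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.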
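Both metrics are simple to compute; a minimal plain-Python sketch is given below, with function names and toy values chosen for illustration. Note that MAPE is undefined when a true value is zero (for example, solar power at night); how such samples were handled is not stated in the text, so this sketch simply assumes nonzero true values.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero in y_true)."""
    n = len(y_true)
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / n

# Toy values for illustration only.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
print(round(rmse(y_true, y_pred), 2))  # 12.91
print(round(mape(y_true, y_pred), 2))  # 6.67
```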

3. Feature Selection Methods

Feature selection extracts effective features from original features, which can reduce the difficulty of model training. In this paper, three typical methods including random forest (RF), grey relational analysis (GRA), and principal component analysis (PCA) are used to analyze the influence of features on wind and solar power data from the perspective of dimension reduction and feature screening. Finally, the features of the prediction model are determined.

3.1. Feature Selection Methods Based on Screening

RF and GRA are two typical feature selection methods based on screening. RF, an ensemble-learning algorithm, is used for feature selection by evaluating the error on the out-of-bag data of the sampling process [17]. The steps to calculate the importance of a feature are as follows.

First, $n$ samples are drawn from the sample set by sampling with replacement (bootstrap) to form a training set, and a decision tree is generated from this training set. For each decision tree $t$, the number of classification errors on its out-of-bag data, $e_t$, is calculated. Then the values of feature $X_j$ are randomly perturbed in the out-of-bag data, and the number of classification errors, $e_t^{j}$, is recalculated. The importance of feature $X_j$ is then calculated using the following equation:

$$I(X_j) = \frac{1}{N}\sum_{t=1}^{N}\left(e_t^{j} - e_t\right) \tag{14}$$

where $N$ is the number of decision trees.

If the out-of-bag error increases significantly after the disturbance is added, the feature is important. The number of times the above steps are repeated equals the number of decision trees.
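The out-of-bag procedure can be sketched with the standard library alone. To stay short, this sketch swaps in one-split decision stumps for full decision trees, so it is a simplified stand-in for a random forest rather than the method used in the paper; the function names, the synthetic two-feature dataset, and the averaging over trees are illustrative assumptions.

```python
import random

def predict(stump, row):
    j, thr, above = stump
    return above if row[j] > thr else 1 - above

def count_errors(stump, X, y):
    return sum(predict(stump, row) != label for row, label in zip(X, y))

def fit_stump(X, y):
    """Best single-feature threshold rule: (feature, threshold, label_above)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            for above in (0, 1):
                stump = (j, (lo + hi) / 2, above)
                errors = count_errors(stump, X, y)
                if best is None or errors < best[0]:
                    best = (errors, stump)
    return best[1]

def oob_importance(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    importance = [0.0] * d
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        in_bag = set(bag)
        oob = [i for i in range(n) if i not in in_bag]  # out-of-bag rows
        if not oob:
            continue
        stump = fit_stump([X[i] for i in bag], [y[i] for i in bag])
        X_oob = [list(X[i]) for i in oob]
        y_oob = [y[i] for i in oob]
        base = count_errors(stump, X_oob, y_oob)
        for j in range(d):
            column = [row[j] for row in X_oob]
            rng.shuffle(column)                         # disturb feature j
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_oob, column)]
            importance[j] += (count_errors(stump, X_perm, y_oob) - base) / n_trees
    return importance

# Synthetic data: the label depends on feature 0 only; feature 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(100)]
y = [1 if row[0] > 0.5 else 0 for row in X]
imp = oob_importance(X, y)
```

On this synthetic data the disturbance of feature 0 raises the out-of-bag error sharply, while shuffling the noise feature changes nothing, so `imp[0]` is large and `imp[1]` is zero.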

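GRA obtains the relationship between features by calculating the grey relation matrix. The equation for calculating the correlation coefficient is as follows:

$$\xi_{ij}(k) = \frac{\min_j \min_k \Delta_{ij}(k) + \rho \max_j \max_k \Delta_{ij}(k)}{\Delta_{ij}(k) + \rho \max_j \max_k \Delta_{ij}(k)}, \quad \Delta_{ij}(k) = \left|x_i(k) - x_j(k)\right| \tag{15}$$

where $\Delta_{ij}(k)$ denotes the subtraction of data, and $\xi_{ij}(k)$ is the correlation coefficient between the $i$th reference sequence and the $k$th sample of the $j$th comparison sequence. The value of $\rho$ is in [0, 1]. The final correlation coefficient is as follows:

$$r_{ij} = \frac{1}{n}\sum_{k=1}^{n}\xi_{ij}(k) \tag{16}$$

where $n$ is the number of samples. A value of $r_{ij}$ greater than 0.7 indicates a strong correlation, and a value less than 0.3 indicates a weak correlation [18, 19].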
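A compact plain-Python sketch of the grey relational calculation is shown below. It assumes one reference sequence (for example, power) and several comparison sequences (features) that have already been scaled to a common range; the function name, the choice `rho=0.5`, and the toy sequences are assumptions rather than values reported in the paper.

```python
def grey_relational_grades(reference, comparisons, rho=0.5):
    """Grey relational grade of each comparison sequence against the reference."""
    deltas = [[abs(r - c) for r, c in zip(reference, seq)] for seq in comparisons]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0.0:
        # Every sequence coincides with the reference.
        return [1.0 for _ in comparisons]
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades

# Toy sequences, already scaled to [0, 1] (illustrative values only).
reference = [0.0, 0.5, 1.0]
comparisons = [
    [0.0, 0.5, 1.0],  # identical to the reference
    [1.0, 0.5, 0.0],  # mirrored
]
grades = grey_relational_grades(reference, comparisons)
print([round(g, 3) for g in grades])  # [1.0, 0.556]
```

With a lower limit of 0.7, the first toy sequence would be kept and the second discarded.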

3.2. Feature Selection Methods Based on Dimension Reduction

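PCA is a classic feature selection method based on dimension reduction, which transforms the original data into a new feature space through linear transformation and focuses on extracting the most important linear components from the data [20]. The algorithm steps are as follows:
(1) We calculate the standardization matrix of the original data.
(2) We calculate the Pearson correlation coefficient matrix of the standardization matrix as follows:

$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{17}$$

where $r_{xy}$ is the correlation coefficient between two features, $x_i$ and $y_i$ are the values of the two features, $\bar{x}$ and $\bar{y}$ are the means of the two features, and $n$ is the number of data.
(3) We calculate the eigenvalue ($\lambda$) matrix and the eigenvectors.
(4) We calculate the contribution rate and the cumulative contribution rate according to Equations (18) and (19), which are, respectively, represented as

$$\eta_i = \frac{\lambda_i}{\sum_{k=1}^{m}\lambda_k} \tag{18}$$

$$\eta_{\Sigma}(d) = \frac{\sum_{i=1}^{d}\lambda_i}{\sum_{k=1}^{m}\lambda_k} \tag{19}$$

where $m$ is the number of features and $d$ is the number of retained principal components.
(5) We obtain the dimension reduction data from the selected principal components and the corresponding eigenvectors.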
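The five steps above can be sketched in NumPy as follows. The function name `pca_reduce`, the default target of 0.8 for the cumulative contribution rate, and the synthetic three-feature dataset are illustrative assumptions; the sketch also assumes no feature is constant, since standardization divides by the standard deviation.

```python
import numpy as np

def pca_reduce(X, target=0.8):
    """PCA on the correlation matrix, keeping enough components to reach `target`."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # (1) standardize
    R = np.corrcoef(Z, rowvar=False)              # (2) Pearson correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)          # (3) eigen-decomposition
    order = np.argsort(eigvals)[::-1]             # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contrib = eigvals / eigvals.sum()             # (4) contribution rate
    cumulative = np.cumsum(contrib)               #     cumulative contribution rate
    d = min(int(np.searchsorted(cumulative, target)) + 1, len(eigvals))
    reduced = Z @ eigvecs[:, :d]                  # (5) project onto d components
    return reduced, contrib, cumulative

# Synthetic data: features 0 and 1 are almost collinear, feature 2 is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, a + 0.01 * rng.normal(size=200), rng.normal(size=200)])
reduced, contrib, cumulative = pca_reduce(X, target=0.8)
print(reduced.shape)  # (200, 2)
```

In this toy case the two collinear features collapse into one component, so two components already exceed the 0.8 cumulative contribution rate.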

4. Short-Term Wind Power Forecasting

To test the proposed wind power forecasting model, data from Spanish onshore wind power were used in the case study. The wind power data cover July 9, 2016, to July 11, 2018, with a sampling interval of 1 hour. Eighty percent of the data were taken as the training data for modeling, while twenty percent of the data were taken as the testing data for evaluation and validation.
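The 80/20 partition can be sketched as below. The text does not state whether the split is chronological or shuffled; this sketch assumes a chronological split (earlier samples for training, later samples for testing), which is a common choice for time series, and the function name and toy series are illustrative.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into a leading training part and a trailing test part."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

# Toy series of 10 time-ordered samples (illustrative only).
samples = list(range(10))
train, test = chronological_split(samples)
print(len(train), len(test))  # 8 2
```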

4.1. Feature Selection Results

The original feature numbers of wind power data used in this paper are shown in Table 1.

The feature selection results of RF, PCA, and GRA are shown in Figure 2, where the lower limit of RF important features is 0.1, the cumulative contribution of PCA is 0.8, and the lower limit of GRA important features is 0.7.
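How the three limits translate into a selection rule can be sketched as follows. All scores below are hypothetical placeholders, not the values plotted in Figure 2, and the sketch assumes that a feature is kept when its score is greater than or equal to the lower limit.

```python
def select_by_threshold(scores, lower_limit):
    """Return 1-based feature numbers whose score reaches the lower limit."""
    return [k for k, s in enumerate(scores, start=1) if s >= lower_limit]

def n_components_for(contributions, target=0.8):
    """Smallest number of leading components whose contributions sum to the target."""
    total = 0.0
    for d, c in enumerate(contributions, start=1):
        total += c
        if total >= target:
            return d
    return len(contributions)

# Hypothetical scores for four features (placeholders only).
rf_importance = [0.40, 0.05, 0.35, 0.20]
gra_grades = [0.90, 0.60, 0.75, 0.30]
pca_contrib = [0.50, 0.25, 0.15, 0.10]

print(select_by_threshold(rf_importance, 0.1))  # [1, 3, 4]
print(select_by_threshold(gra_grades, 0.7))     # [1, 3]
print(n_components_for(pca_contrib, 0.8))       # 3
```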

From Figure 2, one can see that all the features excluding feature 5 (humidity) are important in the RF feature selection results. The temperature, pressure, wind speed, and wind direction angle all have a greater impact on wind power than humidity, and the wind direction angle is the most influential feature. From Figure 2, for the PCA feature selection results, the cumulative contribution rate of the first three principal components has exceeded 80%, and hence, the first three principal components are selected as the input of the model. In Figure 2, for the GRA feature selection results, by calculating the grey correlation degree, it is found that except for feature 5 and feature 6, the rest of the features are highly correlated with wind power.

4.2. Short-Term Wind Power Forecasting with LSTNet

In this subsection, the feature selection methods RF, PCA, and GRA are combined with the LSTNet model for wind power forecasting. The prediction results are shown in Figure 3, where one can see that the RF-LSTNet performs the best. To verify the accuracy of the RF-LSTNet model, convolutional neural network and long short-term memory (CNN-LSTM), convolutional neural network and bidirectional long short-term memory (CNN-BiLSTM), CNN-LSTM-attention, and CNN-BiLSTM-attention were selected for comparison, where the number of filters of the CNN layer is 64, the convolution kernel size is 6, the number of hidden units of the LSTM is 64, and the activation function is sigmoid [21, 22]. The results are shown in Figure 4.

It can be seen from Figure 3 and Table 2 that the LSTNet model combined with feature selection gives better prediction results than the traditional LSTNet model, which indicates that the adopted feature selection methods select effective features and improve the prediction accuracy; the RF method performs the best. Compared with the traditional LSTNet model, the MAPE of RF-LSTNet was reduced by 29.7%, and the RMSE was reduced by 24.1%. A likely reason is that the RF method has a strong learning ability and obtains more effective features, thereby improving the prediction accuracy. From the results in Figure 4 and Table 2, it can be observed that in wind power prediction, the power predicted by the RF-LSTNet model is closer to the real data in the peak interval than that of the other models. The main reason is that the LSTNet model can extract short-term local dependency patterns among variables and discover long-term patterns for time-series trends.

5. Short-Term Solar Power Forecasting

To assess the performance of the proposed LSTNet-based prediction models, numerical experiments for solar power forecasting were conducted on an open-source dataset with a sampling interval of 15 minutes. Eighty percent of the data were taken as the training data for modeling, while twenty percent of the data were taken as the testing data for evaluation and validation.

5.1. Feature Selection Results

The original feature numbers of the solar data used are shown in Table 3.

The feature selection results of RF, PCA, and GRA are shown in Figure 5, where the lower limit of RF important features is 0.1, the cumulative contribution of PCA is 0.8, and the lower limit of GRA important features is 0.7.

From Figure 5, for the RF feature selection results, one can see that feature 5 has a weak correlation with solar power and temperature has the highest correlation, indicating that feature 1 has the greatest impact on solar power. From Figure 5, for the PCA feature selection results, the cumulative contribution rate of the first four principal components has exceeded 80%, and hence, the first four principal components are selected as the input of the model. In Figure 5, for the GRA feature selection results, by calculating the grey correlation degree, it is found that except for feature 2, feature 3, and feature 6, the rest of the features are highly correlated with solar power.

5.2. Short-Term Solar Power Forecasting with LSTNet

To verify the accuracy of the RF-LSTNet model, convolutional neural network and long short-term memory (CNN-LSTM), convolutional neural network and bidirectional long short-term memory (CNN-BiLSTM), CNN-LSTM-attention, and CNN-BiLSTM-attention were selected for comparison, where the number of filters of the CNN layer is 64, the convolution kernel size is 6, the number of hidden units of the LSTM is 64, and the activation function is sigmoid. The experimental results are as follows.

It can be seen from Figure 6 that the LSTNet model combined with feature selection gives better prediction results than the traditional LSTNet model, which indicates that the adopted feature selection methods select effective features and improve the prediction accuracy; the RF method performs the best. Compared with the traditional LSTNet model, the MAPE of RF-LSTNet was reduced by 12.9% and the RMSE was reduced by 3.8%. A likely reason is that the RF method has a strong learning ability and obtains more effective features, thereby improving the prediction accuracy. From the results in Figure 7 and Table 4, it can be seen that in solar power prediction, the power predicted by the RF-LSTNet model is closer to the real data in the peak interval than that of the other models. The main reason is that the LSTNet model can extract short-term local dependency patterns among variables and discover long-term patterns for time-series trends.

6. Conclusions

In this paper, a short-term wind and solar power forecasting method based on feature selection and an improved LSTNet is proposed. The conclusions are as follows:
(i) To address the high feature dimension and the large redundancy of wind and solar data, three typical feature selection methods, namely, PCA, GRA, and RF, were adopted for wind and solar power forecasting from the perspectives of feature dimension reduction and feature screening.
(ii) The LSTNet model can perceive the long-term trend and short-term changes in wind and solar power data and can therefore better fit their changing patterns. In the short-term peak intervals in particular, the LSTNet model is more accurate than the typical prediction models used for comparison. Furthermore, the BiLSTM employed in this study can further improve the prediction accuracy because the features of wind and solar data are extracted effectively.
(iii) A combination of feature selection and an improved LSTNet model was proposed. Numerical simulations indicated that, among the three methods, RF is the most suitable for feature selection of wind and solar power data.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Science and Technology Project of State Grid Hebei Electric Power Co., Ltd. (5204DY200002).