Abstract

Accurate power load forecasting is essential for power grid operation and dispatching. To further improve forecasting accuracy, this study proposes a new power load forecasting method. Firstly, the correlation coefficients of influential variables are calculated for feature selection. Secondly, the form of the input data is changed to adjust for autocorrelated errors. Thirdly, data features are extracted by a convolutional neural network (CNN) to construct feature vectors. Finally, the feature vectors are input into a long short-term memory (LSTM) network for training to obtain prediction results. Moreover, to address the difficulty of setting network hyperparameters, the simulated annealing particle swarm optimization (SAPSO) algorithm is used to optimize them. Experiments show that the prediction accuracy of the proposed model is higher than that of LSTM, CNN-LSTM, and other models.

1. Introduction

Accurate power load forecasting helps ensure the stable operation of the power system and reduces cost and emissions on the power generation side [1, 2]. In the context of “dual carbon,” refined power load forecasting is particularly important [3]. Short-term power load forecasting can be used for intraday real-time generation planning in power plants, and reasonable load planning can effectively ensure the stable operation of the power system, reduce the cost and carbon emissions of power generation equipment, and improve the efficiency of power grid dispatching. The power system load is affected by multiple factors and has strongly random and nonlinear characteristics [4]. Therefore, it is necessary to deeply explore the variation law of power load to improve prediction accuracy and provide a reliable basis for the operation and dispatch of regional power systems, energy conservation, emission reduction, and economic development [5].

At present, power load forecasting methods are mainly divided into two types: traditional mathematical and statistical methods and machine learning methods. Traditional statistical methods include the exponential smoothing method [6, 7], the multiple linear regression method [8, 9], the Kalman filter method [10, 11], and so on. Statistical methods based on traditional mathematical models have high computational efficiency and clear model structures but place higher requirements on the stationarity of time series data, so they have certain limitations in processing power load data with strong randomness and volatility. Machine learning methods have a good ability to deal with nonlinear and complicated problems [12] and are widely used in the field of load forecasting. They mainly include neural networks [13–15], support vector regression [16], random forests [17, 18], and so on. Support vector regression applies support vectors to regression tasks and has strong learning ability on small sample data. Random forest is an ensemble learning algorithm composed of multiple decision trees, whose results are averaged to obtain the final result. However, both support vector machines and random forests are better suited to classification tasks, whereas neural networks have strong nonlinear fitting ability, which makes them very suitable for load forecasting.

Among the many neural network models, the long short-term memory (LSTM) network, which has a strong ability to process time series data, and the convolutional neural network (CNN), which can effectively extract data features, have been widely used in power load forecasting [19–25]. The LSTM network is a variant of the recurrent neural network (RNN); compared with other neural networks, it is better able to mine the temporal structure of data [26]. The CNN can effectively extract features from time series data through convolution operations and capture the temporal patterns and seasonality in the data.

However, there are often two problems when using neural networks for time series prediction: (1) hyperparameters such as the number of neurons, the learning rate, and the number of iterations are difficult to determine, and (2) error autocorrelation. The setting of hyperparameters directly affects the prediction accuracy and training time of the neural network, and the prediction performance of network models trained with different hyperparameters varies greatly. If these hyperparameters are set manually by experience, the highest prediction accuracy cannot be achieved. Therefore, in this paper the hyperparameters of the network are treated as variables and optimized with simulated annealing particle swarm optimization (SAPSO). Error autocorrelation refers to the existence of autocorrelation between errors at different time steps during time series prediction. This violates an assumption of maximum likelihood estimation, so the network cannot be trained to optimal parameters and the prediction accuracy is reduced. There are many possible causes of error autocorrelation, such as missing influencing variables, measurement errors, and model misspecification [27]. In the field of econometrics, adjusting for autocorrelated errors in linear or nonlinear time series data has been widely studied [28–34]. Generalized least squares (GLS) is the most basic method [28]. It first estimates the residual autocorrelation in the errors, then transforms the series to weaken the autocorrelation, and fits ordinary least squares (OLS) to the transformed series. To obtain a more accurate autocorrelation coefficient, D. Cochrane and G. H. Orcutt proposed the Cochrane–Orcutt iterative method [30]. It obtains an accurate autocorrelation coefficient by iterating the above process until convergence, without knowing the specific form of the autocorrelated errors. The Prais–Winsten method [34] solves the problem that the first sample is discarded during the series transformation by retaining it with an appropriate transformation. In the field of machine learning, however, there is not much research on autocorrelated errors. Following the aforementioned work, Sun et al. [27] proposed an effective and simple method to adjust for autocorrelated errors in neural networks, which was verified by experiments to be effective.
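To make the Cochrane–Orcutt procedure concrete, the following minimal Python sketch iterates residual-based estimation of the autocorrelation coefficient and quasi-differencing until convergence. It assumes a simple linear model with a design matrix X fitted by numpy least squares; all names are illustrative, and this is not the authors' code.

```python
import numpy as np

def cochrane_orcutt(X, y, tol=1e-6, max_iter=100):
    """Minimal Cochrane-Orcutt sketch: estimate the first-order
    autocorrelation coefficient rho of the OLS residuals, quasi-difference
    the series, refit, and repeat until rho converges."""
    rho = 0.0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(max_iter):
        resid = y - X @ beta
        # Estimate rho by regressing e_t on e_{t-1}.
        rho_new = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
        # Quasi-difference the series to weaken the autocorrelation.
        X_star = X[1:] - rho_new * X[:-1]
        y_star = y[1:] - rho_new * y[:-1]
        beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
    return beta, rho
```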

To sum up, to fully mine the temporal characteristics of multi-dimensional power load data and improve the accuracy of power load forecasting, this paper proposes a SAPSO-CNN-LSTM model considering autocorrelated errors. Firstly, the characteristic variables with high correlation are selected as model inputs. Then the data are adjusted for autocorrelated errors, a CNN is used to extract features from the time series data, and an LSTM network is used for training. This study selects a power load data set from a certain area of Australia in 2006 for the experiment and compares the proposed model with LSTM, CNN-LSTM, PSO-CNN-LSTM, and SAPSO-CNN-LSTM. The experimental results show that the proposed model has higher prediction accuracy.

The remainder of this paper is divided into three parts. The first part covers the basic principles of the algorithms and models; it mainly clarifies the principles of the LSTM network, the CNN, and the SAPSO algorithm, as well as the methods of feature selection and autocorrelated errors adjustment. The second part presents the details of the proposed model, including the structure of the neural network, parameter settings, and the algorithm flow. The third part presents the experimental results and analysis.

2. Basic Principle of the Algorithm Model

2.1. Basic Neural Network Model

CNN is one of the classic algorithms in deep learning and is widely used. As shown in Figure 1, its structure generally consists of three parts: convolution layers, pooling layers, and fully connected layers. The convolution layer contains multiple convolution kernels of different sizes that perform convolution calculations on the data to extract feature information from the input. The pooling layer downsamples the feature map obtained after convolution to achieve dimensionality reduction. The fully connected layer is generally one or more layers of a backpropagation (BP) neural network, which integrates the features extracted by the previous layer and passes them to the output layer. In addition, the convolutional neural network has the characteristics of local connection and weight sharing, which results in less computation and more efficient extraction of data features.

To solve the problem that the recurrent neural network (RNN) is prone to vanishing or exploding gradients, the LSTM adds a cell and three gated units to the structure of the RNN: the forget gate, the input gate, and the output gate, as shown in Figure 2. The cell holds the cell state and is used to preserve long-term memory. The forget gate controls the cell to forget the historical cell state with a certain probability; the input gate is responsible for processing the input of the neuron at the current moment; and the output gate controls the output of the neuron at the current moment. Specifically, the input at the current moment and the hidden state at the last moment enter the cell via the forget gate and the input gate, which together constitute the cell state. The cell then transmits the cell state along the time axis in a channel with little attenuation of information, so the cell is good at retaining information with a long time span, that is, preserving long-term memory. The output gate controls the neuron to selectively output the contents of the cell at the current moment. The calculation formulas of the LSTM network are shown in the following equations:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad (1)
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad (2)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \quad (3)
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \quad (4)
o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad (5)
h_t = o_t \odot \tanh(C_t) \quad (6)

where W is the weight, b is the bias value, \sigma is the sigmoid function, C_{t-1} is the cell state at the last moment, h_{t-1} is the hidden state at the last moment, x_t is the input at the current moment, f_t is the forget gate output, i_t is the output of the input gate, \tilde{C}_t is the candidate cell state, o_t is the output of the output gate, C_t is the cell state at the current moment, and h_t is the hidden state at the current moment.
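As an illustration of equations (1)–(6), a minimal numpy sketch of a single LSTM step follows; the weight and bias containers are illustrative and not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following equations (1)-(6).
    W maps gate name -> weight matrix over [h_prev, x_t]; b maps gate -> bias."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate, eq. (1)
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate, eq. (2)
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate state, eq. (3)
    c_t = f_t * c_prev + i_t * c_tilde         # cell state, eq. (4)
    o_t = sigmoid(W["o"] @ z + b["o"])         # output gate, eq. (5)
    h_t = o_t * np.tanh(c_t)                   # hidden state, eq. (6)
    return h_t, c_t
```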

2.2. SAPSO Algorithm
2.2.1. SAPSO Algorithm Principle

Particle swarm optimization is an intelligent optimization algorithm proposed by Kennedy and Eberhart [35]. Its idea originates from the study and simulation of the behavior of bird flocks. Each particle in the particle swarm optimization algorithm represents a solution to the optimization problem, and the iterative formulas for the velocity and position of each particle are shown in the following equations:

v_i^{d+1} = \omega v_i^d + c_1 r_1 (p_i^d - x_i^d) + c_2 r_2 (p_g^d - x_i^d) \quad (7)
x_i^{d+1} = x_i^d + v_i^{d+1} \quad (8)

where c_1 is the individual learning factor of the particles, c_2 is the social learning factor of the particles, \omega is the inertia weight, p_i^d is the best position that the i-th particle has passed through up to the d-th iteration, p_g^d is the best position that all particles have passed through so far at the d-th iteration, and r_1 and r_2 are random numbers on [0, 1].

To solve the problem that the traditional particle swarm optimization algorithm easily falls into local optima, this paper adopts the SAPSO algorithm. The simulated annealing algorithm is derived from the principle of solid annealing, and its basic idea is to accept inferior solutions with a certain probability. Introducing this idea into the particle swarm algorithm gives each personal best p_i a certain probability of replacing the global best p_g. According to the idea of simulated annealing, a personal best with smaller fitness is a better solution and should have a larger jump probability, and the transition probability should decrease as the temperature decreases. Assuming there are m particles in total, the transition probability of each personal best is shown in the following equations:

P_i = \frac{\exp\left(-\frac{f(p_i) - f(p_g)}{T}\right)}{\sum_{j=1}^{m} \exp\left(-\frac{f(p_j) - f(p_g)}{T}\right)} \quad (9)
T_d = \alpha^d T_0 \quad (10)

where f is the fitness function, T is the current temperature, T_0 is the initial temperature, and \alpha is the annealing coefficient.

Denote the replaced global optimum as p_g'; then the particle velocity iteration formula of SAPSO is

v_i^{d+1} = \omega v_i^d + c_1 r_1 (p_i^d - x_i^d) + c_2 r_2 ({p_g'}^d - x_i^d) \quad (11)
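The following minimal numpy sketch (illustrative, not the authors' code) combines the jump probabilities of eq. (9) with the velocity update of eq. (11) into one SAPSO iteration; the inertia weight w is supplied externally, for example by the adaptive rule of the next subsection.

```python
import numpy as np

def sapso_step(pos, vel, pbest, pbest_fit, w, c1, c2, T):
    """One SAPSO iteration sketch: draw a replacement global best p_g' from
    the simulated-annealing jump probabilities of eq. (9), then update
    velocities and positions with eqs. (11) and (8)."""
    m, dim = pos.shape
    gbest_fit = pbest_fit.min()
    # Jump probability of each personal best, eq. (9).
    p = np.exp(-(pbest_fit - gbest_fit) / T)
    p /= p.sum()
    g_prime = pbest[np.random.choice(m, p=p)]  # replacement global best p_g'
    r1, r2 = np.random.rand(m, dim), np.random.rand(m, dim)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g_prime - pos)
    return pos + vel, vel
```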

2.2.2. Adaptive Inertia Weight

In the particle swarm optimization algorithm, the inertia weight is an important parameter that determines the search ability of the algorithm. A larger inertia weight gives the algorithm stronger global search ability, while a smaller inertia weight gives it stronger local search ability. In the basic particle swarm algorithm, the inertia weight is constant and does not change as the number of iterations increases. To improve the convergence speed and accuracy of the algorithm, this study uses an adaptive inertia weight. When the current fitness of a particle is less than the average fitness of all particles, the particle is currently near a better solution, and the inertia weight should be reduced to enhance its local search ability. When the current fitness of a particle is greater than the average fitness of all particles, the particle is currently near a poor solution, and the inertia weight should be increased to enhance its global search ability. The calculation formula is as follows:

\omega_i^d = \begin{cases} \omega_{\min} + (\omega_{\max} - \omega_{\min}) \dfrac{f_i^d - f_{avg}^d}{f_{max}^d - f_{avg}^d}, & f_i^d \geq f_{avg}^d \\ \omega_{\min}, & f_i^d < f_{avg}^d \end{cases} \quad (12)

where \omega_{\min} and \omega_{\max} are the set minimum and maximum inertia coefficients, respectively; f_{avg}^d is the average fitness of all particles at the d-th iteration; and f_{max}^d is the maximum fitness of all particles at the d-th iteration.
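A small Python sketch of eq. (12) follows; the bounds w_min = 0.4 and w_max = 0.9 are illustrative defaults, not values from the paper.

```python
import numpy as np

def adaptive_inertia(fit, w_min=0.4, w_max=0.9):
    """Per-particle inertia weight following eq. (12): particles better than
    the swarm average get w_min (local search); worse particles are scaled
    up toward w_max (global search)."""
    f_avg, f_max = fit.mean(), fit.max()
    w = np.full_like(fit, w_min, dtype=float)
    worse = fit >= f_avg
    if f_max > f_avg:  # avoid division by zero when all fitnesses are equal
        w[worse] = w_min + (w_max - w_min) * (fit[worse] - f_avg) / (f_max - f_avg)
    return w
```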

2.3. Autocorrelated Errors and Its Adjustment Method

Generally, the mathematical model for time series forecasting using neural networks is as follows:

X_t = f(X_{t-1}, X_{t-2}, \ldots, X_{t-W}; \theta) + e_t \quad (13)

where X_t is the sample observation value; \hat{X}_t = f(X_{t-1}, \ldots, X_{t-W}; \theta) is the predicted value at time t; e_t is the error at time t; \theta denotes the parameters of the neural network, including weights, thresholds, and so on; and W is the width of the sliding window.

When training a neural network via maximum likelihood estimation on time series data, there is a common assumption: the errors between different time steps are uncorrelated, as shown in the following equation:

\operatorname{Cov}(e_i, e_j) = 0, \quad \forall\, i \neq j \quad (14)

However, in practice, due to the omission of influencing variables, measurement errors, and so on, the errors of regression prediction are autocorrelated, as shown in the following equation:

\operatorname{Cov}(e_i, e_j) \neq 0, \quad \exists\, i \neq j \quad (15)

Autocorrelated errors violate the assumption that the errors are uncorrelated, which means that the Gauss–Markov theorem no longer applies. Specifically, the variance of the parameter estimates increases while the standard error of the estimates is underestimated, so the network parameters obtained by maximum likelihood training are not optimal and the accuracy of time series prediction is reduced.

The Durbin–Watson test, referred to as the DW test [36], can be used to test for first-order error autocorrelation in sequence data. The test statistic d is calculated as shown in the following equation:

d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \quad (16)

The d value ranges between 0 and 4. It is generally believed that the closer the d value is to 2, the greater the confidence that the errors are free of autocorrelation; the closer the d value is to 0, the stronger the positive autocorrelation of the errors; and the closer the d value is to 4, the stronger their negative autocorrelation.
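Eq. (16) translates directly into a few lines of numpy; this sketch is illustrative.

```python
import numpy as np

def durbin_watson(errors):
    """DW statistic of eq. (16): near 2 suggests no first-order
    autocorrelation; near 0, positive; near 4, negative."""
    e = np.asarray(errors, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```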

To adjust for autocorrelated errors, the input and output forms of the neural network should be adjusted. In order not to complicate the model too much, only the most significant first-order autocorrelation is usually considered:

e_t = \rho e_{t-1} + \varepsilon_t \quad (17)

where \rho is the first-order autocorrelation coefficient and \varepsilon_t is a white noise term.

Substituting the model (13) into (17), equation (17) can be rewritten as follows:

X_t - f(X_{t-1}, \ldots, X_{t-W}; \theta) = \rho \left[ X_{t-1} - f(X_{t-2}, \ldots, X_{t-W-1}; \theta) \right] + \varepsilon_t \quad (18)

Rearranging (18) gives the following equation:

X_t - \rho X_{t-1} = f(X_{t-1}, \ldots, X_{t-W}; \theta) - \rho f(X_{t-2}, \ldots, X_{t-W-1}; \theta) + \varepsilon_t \quad (19)

There is a problem: the target X_t - \rho X_{t-1} is related to the difference between two model outputs rather than the output of a single model, which complicates the optimization. Therefore, we approximate the right-hand side of (19) with just one model over the same set of inputs:

f(X_{t-1}, \ldots, X_{t-W}; \theta) - \rho f(X_{t-2}, \ldots, X_{t-W-1}; \theta) \approx f(X_{t-1}, \ldots, X_{t-W-1}; \theta) \quad (20)

Now, (19) becomes

X_t - \rho X_{t-1} = f(X_{t-1}, X_{t-2}, \ldots, X_{t-W-1}; \theta) + \varepsilon_t \quad (21)

To make the input series and the target series have the same form, we modify (21) into

X_t - \rho X_{t-1} = f(X_{t-1} - \rho X_{t-2}, X_{t-2} - \rho X_{t-3}, \ldots, X_{t-W} - \rho X_{t-W-1}; \theta) + \varepsilon_t \quad (22)

where \rho is the autocorrelation coefficient.

It is obvious that (22) has a similar form to the generalized least squares. This method can reduce or eliminate the autocorrelation of prediction errors by changing the form of regression, thereby obtaining better network parameters and improving prediction accuracy.
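The following sketch builds the transformed input and target pairs of eq. (22). For simplicity it assumes a single univariate load series x and a known rho, whereas the paper's inputs are multivariate; names are illustrative.

```python
import numpy as np

def adjust_series(x, rho, window):
    """Build training pairs per eq. (22): inputs X_{t-k} - rho * X_{t-k-1}
    and targets X_t - rho * X_{t-1}, using a sliding window of width W."""
    z = x[1:] - rho * x[:-1]            # quasi-differenced series
    inputs, targets = [], []
    for t in range(window, len(z)):
        inputs.append(z[t - window:t])  # transformed window
        targets.append(z[t])            # transformed target
    return np.array(inputs), np.array(targets)
```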

2.4. Feature Selection

The power load is often affected by a variety of factors, such as temperature, humidity, and electricity price, and inputting too many features greatly increases the amount of calculation and may even reduce prediction accuracy. It is therefore very important to screen out the features that benefit the prediction results. In this paper, the Pearson correlation coefficient is calculated, and the features whose correlation coefficients with the power load have large absolute values are selected as the final inputs of the model. The formula for the Pearson correlation coefficient is shown in the following equation:

\rho_{X,Y} = \frac{E[XY] - E[X]E[Y]}{\sqrt{E[X^2] - E^2[X]} \sqrt{E[Y^2] - E^2[Y]}} \quad (23)

where \rho_{X,Y} is the Pearson correlation coefficient of two variables X and Y and E is the mathematical expectation.
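With pandas, the selection step can be sketched as follows; the file name, the column name "load", and the threshold 0.4 are hypothetical.

```python
import pandas as pd

# Hypothetical file and column names; threshold 0.4 is illustrative.
df = pd.read_csv("load_data.csv")
corr = df.corr(method="pearson")["load"].drop("load")
selected = corr[corr.abs() >= 0.4].index.tolist()  # features kept as inputs
print(corr)
print("selected features:", selected)
```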

3. SAPSO-CNN-LSTM Model considering Autocorrelated Errors

3.1. CNN-LSTM Neural Network

The CNN-LSTM structure used in this paper is shown in Figure 3. Features are first extracted from the time series data by two successive pairs of convolution and pooling layers. The data are then flattened through a flatten layer and fed into a single LSTM layer for training. Finally, the result is output through a fully connected layer. The size of the convolution kernels is 3 × 3, and the remaining hyperparameters of the network are determined by the SAPSO optimization algorithm.
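A hedged Keras sketch of this architecture follows. It uses Conv1D with kernel size 3 as a one-dimensional analogue of the paper's 3 × 3 kernels for windowed multivariate series, and it keeps the time axis intact for the LSTM rather than literally flattening; the default hyperparameter values merely stand in for those found by SAPSO.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn_lstm(window=48, n_features=3, filters=32, lstm_units=64):
    """CNN-LSTM sketch following Figure 3: two conv+pool blocks, one LSTM
    layer, and a dense output. Values here are placeholders for the
    SAPSO-optimized hyperparameters."""
    model = keras.Sequential([
        keras.Input(shape=(window, n_features)),
        layers.Conv1D(filters, 3, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(filters, 3, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.LSTM(lstm_units),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mae")
    return model
```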

3.2. SAPSO Algorithm Optimization

The SAPSO algorithm is used to optimize the hyperparameters in the CNN-LSTM network. The flow chart of the SAPSO-CNN-LSTM model is shown in Figure 4.

The steps of the SAPSO-CNN-LSTM algorithm are given in Algorithm 1.

(1)Perform data preprocessing, fill in vacancies, and remove outliers.
(2)Define the fitness fit as the mean absolute error of the network prediction, as in

fit = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \quad (24)

where y_i is the observed value and \hat{y}_i is the predicted value.
(3)Initialize the particle swarm.
(4)Construct multiple CNN-LSTM networks with the location information of particles as the hyperparameters of the network.
(5)Train all networks to obtain the fitness of each particle, find the personal best and global best, and record their position information.
(6)Perform simulated annealing and select the personal best with the largest jump probability to replace the global best. The jump probability of each personal best is shown in (9) and (10).
(7)Update the velocity and position of particles with adaptive inertia weights.
(8)Decrease the annealing temperature and increase the number of iterations by one.
(9)Judge whether the maximum number of iterations is reached. If the maximum number of iterations is reached, the optimization ends, and the process goes to step (10); otherwise, it returns to step (4).
(10)Find the personal best with the smallest fitness and use its location information as a hyperparameter to retrain the network.
(11)Use the trained network to make predictions.
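A compact driver loop for steps (3) to (10), reusing the sapso_step and adaptive_inertia sketches above, might look as follows. Here fitness_fn is a user-supplied function that builds and trains a CNN-LSTM from a particle's position vector and returns its validation MAE per eq. (24); the parameter defaults (c1, c2, T0, alpha) are illustrative.

```python
import numpy as np

def sapso_optimize(bounds, fitness_fn, n_particles=30, n_iter=20,
                   c1=2.0, c2=2.0, T0=100.0, alpha=0.9):
    """Sketch of steps (3)-(10): initialize particles over the hyperparameter
    ranges, iterate SAPSO updates, anneal the temperature, and return the
    best hyperparameter vector found."""
    bounds = np.asarray(bounds, dtype=float)   # shape (dim, 2): [min, max]
    dim = len(bounds)
    pos = np.random.uniform(bounds[:, 0], bounds[:, 1], (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness_fn(p) for p in pos])
    fit, T = pbest_fit.copy(), T0
    for _ in range(n_iter):
        w = adaptive_inertia(fit)[:, None]     # eq. (12), per particle
        pos, vel = sapso_step(pos, vel, pbest, pbest_fit, w, c1, c2, T)
        pos = np.clip(pos, bounds[:, 0], bounds[:, 1])
        fit = np.array([fitness_fn(p) for p in pos])
        improved = fit < pbest_fit             # update personal bests
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        T *= alpha                             # decrease annealing temperature
    return pbest[pbest_fit.argmin()]           # step (10): best hyperparameters
```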

The hyperparameters to be optimized and their value ranges are shown in Table 1. In the SAPSO algorithm, the number of particles is 30 and the maximum number of iterations is 20.

3.3. Adjusting for Autocorrelated Errors

According to the method of adjusting for autocorrelated errors in Section 2.3, when predicting the power load at time t, it is only necessary to convert the input data X_{t-1}, X_{t-2}, \ldots, X_{t-W} into X_{t-1} - \rho X_{t-2}, \ldots, X_{t-W} - \rho X_{t-W-1}. For an X_{t-W-1} that may not be present in the sample data, the sample mean is used instead. At the same time, since the input and output of the model have the same form, the model output \hat{X}_t - \rho X_{t-1} needs to be converted back into the predicted value \hat{X}_t. The autocorrelation coefficient \rho is determined by the grid search method.
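A sketch of the grid search for rho follows, reusing adjust_series from the Section 2.3 sketch; train_and_score is a hypothetical helper that trains the network on the transformed pairs and returns a validation error, and the grid itself is illustrative.

```python
import numpy as np

def grid_search_rho(x, window, train_and_score, grid=np.arange(0.0, 1.0, 0.05)):
    """For each candidate rho, build the adjusted series of eq. (22) and
    keep the rho with the lowest validation error."""
    scores = []
    for rho in grid:
        inputs, targets = adjust_series(x, rho, window)
        scores.append(train_and_score(inputs, targets))
    return grid[int(np.argmin(scores))]
```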

4. Simulation Experiment and Analysis

To verify the feasibility and effectiveness of the proposed model, this paper uses the 2006 load data set from an area of Australia to conduct experiments and compares the four models LSTM, CNN-LSTM, PSO-CNN-LSTM, and SAPSO-CNN-LSTM with the SAPSO-CNN-LSTM model considering autocorrelated errors proposed in this paper.

4.1. Data Preprocessing and Evaluation Indicator

After removing outliers and filling in vacancies, the data are normalized with the following formula:

x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \quad (25)

where x' is the normalized value and x_{\max} and x_{\min} are the maximum and minimum values in the data, respectively.

In this paper, the root-mean-square error (RMSE) and the mean absolute percentage error (MAPE) are used as the evaluation indicators of the experimental results:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2} \quad (26)

\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \quad (27)

where \hat{y}_i is the predicted value and y_i is the observed value. The lower the MAPE and RMSE, the better the prediction effect.
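The two indicators translate directly into numpy; a minimal sketch:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error, eq. (26)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, eq. (27), in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs((y_pred - y_true) / y_true))
```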

4.2. Load Data Set Experiment in a Certain Area of Australia

This paper uses a multi-dimensional power load data set from July 1 to December 31, 2006, in an area of Australia, including six dimensions: historical load, electricity price, dew point temperature, dry-bulb temperature, wet-bulb temperature, and humidity. Data were sampled every 30 minutes, giving 48 load points per day. The first 80% of the data set is taken as the training set and the last 20% as the test set. For prediction, the data are processed by a sliding window with a length of one day; that is, the sliding window width is 48.
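A minimal sketch of the sliding-window construction, assuming (for illustration) that the load is the first column of the preprocessed data array:

```python
import numpy as np

def make_windows(data, width=48):
    """Slide a one-day window (48 half-hourly points) over the series:
    each sample uses the previous `width` steps of all features to predict
    the next load value (assumed to be column 0)."""
    X, y = [], []
    for t in range(width, len(data)):
        X.append(data[t - width:t])   # shape (width, n_features)
        y.append(data[t, 0])          # next-step load
    return np.array(X), np.array(y)
```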

The correlation coefficient between each feature and the historical load is calculated separately, and the results are shown in Table 2. Humidity and electricity price, which have the larger absolute correlation coefficients, are selected as input features; that is, the input data has three dimensions: historical load, humidity, and electricity price.

The hyperparameters optimized by the SAPSO algorithm and autocorrelation coefficient results obtained by grid search are shown in Table 3. Using the optimized hyperparameters to train the model, the power load of a day (December 31, 2006) and the power load of a week (from December 25 to 31, 2006) were predicted. The prediction results are shown in Table 4.

In Table 4, RMSE(1d) and MAPE(1d) denote the root-mean-square error and mean absolute percentage error of forecasting one day, and RMSE(7d) and MAPE(7d) denote those of forecasting one week.

As can be seen from Table 4, when predicting the power load of one day, the RMSE of the model proposed in this paper is lower than that of the other models by 102.59, 55.30, 21.98, and 9.71, respectively, and the MAPE is lower by 1.29%, 0.58%, 0.28%, and 0.10%, respectively. When predicting the power load for a week, the RMSE of the proposed model is lower than that of the other models by 93.89, 60.55, 16.47, and 12.83, respectively, and the MAPE is lower by 1.22%, 0.56%, 0.27%, and 0.14%, respectively. The model in this paper therefore has higher accuracy for both one-day and one-week predictions.

Figure 5 compares the actual power load on December 31, 2006, with the predicted values of each model. It can be seen from Figure 5(a) that the CNN-LSTM network has a better prediction effect than the LSTM network, especially during the peak period of the day, when it more accurately captures the trend of the data. This shows that the convolutional layers effectively extract the features in the time series data and improve the prediction accuracy. As can be seen from Figure 5(b), compared with the PSO algorithm, the SAPSO algorithm finds better network hyperparameters and improves the performance of the model during the trough period of the day. This shows that SAPSO alleviates, to a certain extent, the tendency of the traditional particle swarm optimization algorithm to fall into local optima. It can be seen from Figure 5(c) that the SAPSO-CNN-LSTM model improves the prediction at some time points after the autocorrelated errors adjustment, further improving the prediction accuracy.

The DW test results are shown in Figure 6. From Figure 6, it can be seen that the adjustment of autocorrelated errors in this paper significantly reduces the autocorrelation of errors. This further illustrates that this adjustment method improves the training effect of the model by reducing the autocorrelation of errors, thereby improving the prediction accuracy.

The computational complexity of the proposed model is discussed next. In this experiment, it takes 28 s and 41 s to train the LSTM network and the CNN-LSTM network, respectively, showing that the training time does not increase significantly even though the network structure is more complex. This is because the CNN greatly reduces the number of parameters in the training process through the local connection of neurons and the weight sharing of convolution kernels, thus improving the training speed. The autocorrelated errors adjustment simply changes the form of the input data without increasing the computational complexity; it only needs a little time to determine the autocorrelation coefficient through grid search. However, both the PSO algorithm and the SAPSO algorithm inevitably need a lot of time to search for optimal hyperparameters; in this experiment, the whole process takes about 150 minutes. Nevertheless, in the field of machine learning, the adjustment of hyperparameters has always been a difficult problem, so we think it is worthwhile to spend a certain amount of time obtaining ideal hyperparameters.

5. Conclusion

This paper proposes a SAPSO-CNN-LSTM model that considers autocorrelated errors. Firstly, the input features are filtered according to the calculated correlation coefficients. Then we adjust for autocorrelated errors by changing the form of the input data, use the CNN to extract features from the time series data, and use the LSTM network for training to obtain the output. Finally, the form of the output data is converted back to obtain the predicted value. Taking the multi-dimensional power load data of a certain region in Australia and the single-dimensional power load data of an electrical company as the experimental data sets, the proposed model has the following advantages:
(1)The data sets contain multi-dimensional influencing factors, which give full play to the CNN's ability to extract potential connections in multi-dimensional data.
(2)The LSTM network can fully mine the temporal structure of the data, and combining it with the CNN unites the advantages of both.
(3)The error autocorrelation in time series prediction is considered and adjusted for. The experimental results show that the autocorrelated errors adjustment can improve the prediction accuracy to a certain extent.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.