Abstract
Short-term wind speed forecasting plays an increasingly important role in the security, scheduling, and optimization of power systems. As wind speed signals are usually nonlinear and nonstationary, how to accurately forecast future states is a challenge for existing methods. In this paper, for highly complex wind speed signals, we propose a multiple kernel learning- (MKL-) based method to adaptively assign the weights of multiple prediction functions, which extends conventional wind speed forecasting methods using a support vector machine. First, empirical mode decomposition (EMD) is used to decompose complex signals into several intrinsic mode function component signals with different time scales. Then, for each channel, one multiple kernel model is constructed for forecasting the current sequence signal. Finally, several experiments are carried out on different New Zealand wind farm data, and the relevant prediction accuracy indexes and confidence intervals are evaluated. Extensive experimental results show that, compared with existing machine learning methods, the EMD-MKL model proposed in this paper has better performance in terms of the prediction accuracy evaluation indexes and confidence intervals and shows a better ability to generalize.
1. Introduction
Energy is crucial to social and economic development all over the world. With the acceleration of economic globalization, energy demands are growing rapidly [1, 2]. However, with the depletion of fossil fuels and the destruction of the environment, pollution-free and renewable energy sources must be sought [3]. As an inexhaustible and cleaner energy source, wind energy is regarded as an ideal substitute for conventional resources. Wind energy is the main force of this development and has become the main pillar of power supply in the world. According to a Global Wind Energy Council (GWEC) report, by the end of 2019, the total installed capacity of wind power reached 651 million kilowatts, which is 1.3 times higher than the 283.4 million kilowatts in 2012 and 5.93 times higher than the 94 million kilowatts in 2007. All installed wind turbines can cover 6% of global electricity demand [4]. However, due to the intrinsic properties of wind such as instability, mutability, etc., wind energy sometimes seems to be uncontrollable.
Wind energy has the characteristics of volatility, intermittency, and low energy density, which makes the wind speed and output power of wind farms also uncertain [5]. This uncertainty brings huge challenges to the operation and dispatch of power systems. Short-term wind speed prediction is considered to be one of the most effective and economic means to improve the peak load regulation ability of power grids, enhance the ability of power grids to accept wind power generation, and improve the security and economy of power system operation [6, 7]. Constructing a model to accurately forecast wind speed is very difficult due to the intermittency and fluctuation of wind speed [8].
In past decades, various algorithms such as statistical approaches, machine learning, and deep learning have been applied to the field of wind speed forecasting [6, 9–13]. Each method employs different techniques and achieves good performance on different datasets and prediction horizons [14–16]. Statistical learning prediction models include autoregressive integrated moving average (ARIMA) [17, 18], exponential smoothing (ES) [19], grey forecasting method (GFM) [20], hidden Markov models (HMMs) [21], etc. In addition, artificial neural networks (ANNs) and machine learning models are also widely used in wind speed forecasting [22–29]. However, these models have some shortcomings, such as local minima or overfitting for ANNs and parameter sensitivity for genetic programming (GP) [30].
As one of the most popular machine learning methods, support vector machines (SVMs) are often used for regression with small samples and obtain state-of-the-art performance in wind forecasting. In [31], a hybrid prediction model based on an SVM is proposed where an autoregressive model with time-delay coordinates is used for wind speed data feature selection and a genetic algorithm is used to optimize the parameters. A combined forecasting model based on Markov models and SVMs is proposed in [32]. A finite state Markov model is used to simulate wind fluctuation and seasonality. A prediction modeling method for shale shaker efficiency is proposed in [33], in which principal component analysis (PCA) is used to extract the nonlinear principal components of signals, and then LSSVMs (least squares support vector machines) are used to reconstruct the regression model. In [34], a novel model called IACO-LSSVM is proposed for wind speed forecasting. An improved ant colony optimization algorithm (IACO) is used to optimize the main parameters of LSSVM.
As wind speed data are usually intermittent and unsystematic with extreme nonlinear characteristics, an SVM is very sensitive to the selection of base kernels, especially on small sample data. To address this problem, the adaptive choice of optimal kernels is introduced according to the current testing case, which refers to linear combination techniques on base kernels, called multiple kernel learning (MKL) [35]. Given an input testing sample, the optimal linear combination of base kernels across different support samples is obtained using the training data to get the current testing value. This means, in different kernel domains, one testing sample may adaptively absorb those components that contributed to its intrinsic representation, which is naturally more robust than that with a fixed single kernel domain. The reason is that different kernel domains have different metric spaces, which may contribute to the final prediction score with different confidences. By using MKL, the metrics based on kernel domains can be adaptively combined such that a more robust predictor is obtained for wind forecasting.
Moreover, since wind speed is usually collected as a one-dimensional signal varying with time, directly using raw signals to predict the next state is not trivial, especially when noise or irregular information accompanies the wind speed signal. To address this problem, we employ empirical mode decomposition (EMD) [36] to decompose the raw signal into several intrinsic mode functions (IMFs) with different frequency ranges and then feed these into the MKL predictor. Extensive experiments on public wind speed prediction data show that our proposed MKL-based wind speed forecasting method is more effective and competitive than current methods.
The remainder of this paper is organized as follows. Section 2 discusses the proposed method and the hybrid model in detail. Section 3 describes the dataset, the forecasting performance evaluation criteria, and the experiments. The last section concludes the paper.
2. The Proposed Method
2.1. Overview
Empirical mode decomposition is used to decompose complex signals into several intrinsic mode function component signals with different time scales.
Figure 1 shows the proposed wind speed forecasting process based on EMD and MKL. The steps are as follows:
(1) EMD is used to decompose the raw wind speed data into a number of IMFs. Since the raw wind speed signals are usually nonlinear and nonstationary, EMD can break down the signal without leaving the time domain, which is useful for analyzing the original signal in the next steps.
(2) After EMD decomposition, multiple time-domain signals with different frequency bands are obtained. To reduce the scale variations of each signal, we normalize the EMD signals to the same scale space for the kernel learning that follows. The generated signals are scaled into the range [−1, 1] using translation and scaling transformations.
(3) One MKL model is constructed for each EMD-decomposed signal. As prediction is highly complex, multiple kernel matrices on this signal data are adaptively chosen and combined to form the final prediction model.
(4) The prediction results for all EMD signals are combined. To recover the predicted result of each MKL model in the original scale space, the scaling transform of step (2) is inverted.
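The scaling in step (2) and its inversion in step (4) can be illustrated with a minimal sketch. It assumes a min-max mapping whose bounds are taken from the training portion of one IMF; the function names and the sample values are hypothetical and are not taken from the original implementation.

```python
def fit_scaler(values):
    """Return (lo, hi) of the training values, used to map them into [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be scaled")
    return lo, hi


def scale(values, lo, hi):
    """Translate and scale so that lo maps to -1 and hi maps to +1."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def unscale(values, lo, hi):
    """Invert scale(): map values in [-1, 1] back to the original range."""
    return [(v + 1.0) * (hi - lo) / 2.0 + lo for v in values]


imf = [0.4, -1.2, 2.0, 0.1]      # hypothetical IMF samples
lo, hi = fit_scaler(imf)
scaled = scale(imf, lo, hi)      # input to the kernel model
restored = unscale(scaled, lo, hi)
```

Storing `lo` and `hi` per IMF is what allows each channel's prediction to be mapped back to its own scale before the channels are summed.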

In engineering applications, single-step forecasting often fails to meet actual needs, and it is necessary to use multistep forecasting to predict more long-term points in the future. Therefore, it is necessary to select an appropriate prediction strategy to reduce the prediction error. For this, a recursive model is chosen. Three-step predictions are conducted as are often done in the wind speed prediction literature. The schematic diagram of the recursive multistep prediction model is given in Figure 2. Recursive multistep prediction uses the same prediction model and uses the predicted value of the previous step as a new input to the next step of prediction.

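Let $x(t)$ denote the truth value and $\hat{x}(t)$ the predicted value, let the size of the history window be 5, and let the current time be $t$. Then the predicted value at time $t+1$ is obtained by $\hat{x}(t+1) = f(x(t-4), x(t-3), x(t-2), x(t-1), x(t))$. The predicted value at time $t+2$ is obtained by $\hat{x}(t+2) = f(x(t-3), x(t-2), x(t-1), x(t), \hat{x}(t+1))$. The predicted value at time $t+3$ is obtained by $\hat{x}(t+3) = f(x(t-2), x(t-1), x(t), \hat{x}(t+1), \hat{x}(t+2))$.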
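The recursive strategy can be sketched in a few lines. The one-step predictor is passed in as a callable; `mean_model` below is only a hypothetical stand-in for a trained model and is not the predictor used in this paper.

```python
def recursive_forecast(model, history, steps=3, window=5):
    """Predict `steps` values ahead by feeding predictions back as inputs."""
    buf = list(history[-window:])
    if len(buf) < window:
        raise ValueError("need at least `window` observations")
    preds = []
    for _ in range(steps):
        y_hat = model(buf)          # one-step-ahead prediction
        preds.append(y_hat)
        buf = buf[1:] + [y_hat]     # drop the oldest value, append the prediction
    return preds


def mean_model(window_values):
    """Hypothetical stand-in predictor: the mean of the current window."""
    return sum(window_values) / len(window_values)


forecast = recursive_forecast(mean_model, [1.0, 2.0, 3.0, 4.0, 5.0])
```

Because each later step consumes earlier predictions, errors can accumulate with the horizon, which is why the 2-step and 3-step results are reported separately.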
2.2. Wind Speed Signal Decomposition via EMD
EMD is a new method for time-frequency analysis. The basic idea is to decompose the original time series data into several subsequences with different frequency characteristics [36, 37]. EMD can decompose complex input signals into many intrinsic mode functions, which reflect the local characteristic signals of different time scales [38].
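According to the explanation of IMF in [39], all functions involving time series data can be decomposed into the following:
$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t),$$
where $c_i(t)$ (i = 1, 2, ⋯, n) are the different IMFs and $r_n(t)$ is the residue. The decomposition proceeds as follows:
Step 1: identify the local maxima and minima of the series $\{x(t)\}$, which yield the corresponding upper envelope $\{u(t)\}$ and lower envelope $\{v(t)\}$ by means of a cubic spline interpolation of all the local extrema.
Step 2: compute the mean envelope $\{m(t)\}$:
$$m(t) = \frac{u(t) + v(t)}{2}.$$
Let $h(t) = x(t) - m(t)$.
Step 3: check whether $\{h(t)\}$ is an IMF. If $\{h(t)\}$ is an IMF, then set $c_i(t) = h(t)$ and replace $\{x(t)\}$ with the residual $r(t) = x(t) - c_i(t)$; if $\{h(t)\}$ is not an IMF, replace $\{x(t)\}$ with $\{h(t)\}$ and repeat steps 1–3 until the termination criterion is reached. The stopping condition of the iterative process is as follows:
$$\sum_{t=1}^{l} \frac{\left[h_{k-1}(t) - h_k(t)\right]^2}{\left[h_{k-1}(t)\right]^2} \le \delta,$$
where l is the length of the original series $\{x(t)\}$, k is the number of iterations, $h_k(t)$ is the candidate component after the k-th iteration, and δ is the termination threshold.
Step 4: repeat steps 1–3 on the residual to obtain all IMFs.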
In short, we can use EMD to decompose the original wind speed data into different subsequences. These IMFs have different frequency characteristics, which is conducive to the implementation of the prediction model. In this paper, the MKL algorithm is used to model each subsequence.
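To make the sifting procedure concrete, the following is a deliberately simplified sketch rather than the implementation used in the experiments. It makes three assumptions that differ from the description above: envelopes are piecewise linear instead of cubic splines, the envelopes are pinned to the first and last samples, and the stopping statistic is the aggregated ratio of squared change to squared signal energy. All function names are hypothetical.

```python
import math


def _extrema(x):
    """Indices of strict interior local maxima and minima."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    return maxima, minima


def _envelope(x, idx):
    """Piecewise-linear envelope through the points idx, pinned to both ends."""
    knots = [0] + idx + [len(x) - 1]
    env = []
    for a, b in zip(knots, knots[1:]):
        for t in range(a, b):
            env.append(x[a] + (x[b] - x[a]) * (t - a) / (b - a))
    env.append(x[-1])
    return env


def sift(x, delta=0.3, max_iter=50):
    """Extract one IMF candidate by repeatedly removing the mean envelope."""
    h = list(x)
    for _ in range(max_iter):
        maxima, minima = _extrema(h)
        if not maxima or not minima:
            break
        upper, lower = _envelope(h, maxima), _envelope(h, minima)
        new_h = [hv - (u + v) / 2.0 for hv, u, v in zip(h, upper, lower)]
        # Aggregated stopping statistic (a variant of the pointwise criterion).
        change = sum((a - b) ** 2 for a, b in zip(h, new_h))
        sd = change / max(sum(a * a for a in h), 1e-12)
        h = new_h
        if sd < delta:
            break
    return h


def emd(x, max_imfs=8, delta=0.3):
    """Return (imfs, residue); the IMFs plus the residue sum back to x."""
    residue, imfs = list(x), []
    for _ in range(max_imfs):
        maxima, minima = _extrema(residue)
        if len(maxima) + len(minima) < 3:   # residue is (nearly) monotone
            break
        imf = sift(residue, delta)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue


# Hypothetical two-tone test signal.
signal = [math.sin(2 * math.pi * t / 10) + 0.5 * math.sin(2 * math.pi * t / 50)
          for t in range(200)]
imfs, residue = emd(signal)
```

Whatever the envelope method, the decomposition is additive by construction, so per-component forecasts can simply be summed to forecast the original series.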
2.3. Signal Prediction Model Based on MKL
MKL is used to learn the optimal kernel from a set of predefined base kernels and has been applied in many fields, such as image classification, face recognition, and so on. As far as we know, MKL is rarely used in wind speed prediction models.
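Based on the definition of the kernel slack variable for each base kernel, we now build the MKL model for wind speed prediction. An L2-norm MKL is used in our approach because of its effectiveness in many applications; it was originally used for classification as described in [40], and we modify it here for the case of regression. Let $(\mathbf{x}_i, y_i)$ (i = 1, ⋯, n) denote the training pairs of the historic values and the corresponding next prediction values; we consider a linear support vector machine of the form
$$f(\mathbf{x}) = \mathbf{w}^{\top}\phi(\mathbf{x}) + b,$$
where the nonlinear mapping $\phi$ maps the original signals to a Hilbert space and $f(\mathbf{x})$ is the predicted result based on the input signals $\mathbf{x}$. To deal with the complexity of wind speed forecasting, we may employ M different feature mappings, i.e., $\phi_m$, where m = 1, ⋯, M. Then, the prediction model with multiple mappings is extended to the following equation:
$$f(\mathbf{x}) = \sum_{m=1}^{M} \sqrt{\theta_m}\,\mathbf{w}_m^{\top}\phi_m(\mathbf{x}) + b,$$
where $\theta_m \ge 0$ is the weight of the m-th submodel. Finally, this primal MKL optimization model can be written as follows:
$$\min_{\{\mathbf{w}_m\},\, b,\, \boldsymbol{\theta} \ge 0} \; C\sum_{i=1}^{n} V\big(f(\mathbf{x}_i), y_i\big) + \frac{1}{2}\sum_{m=1}^{M} \|\mathbf{w}_m\|^{2} \quad \text{s.t.} \quad \|\boldsymbol{\theta}\|_p \le 1.$$
In the above model, the constraint on $\|\boldsymbol{\theta}\|_p$ is used to regularize the weight of each model and V is a convex loss function. With the substitution $\tilde{\mathbf{w}}_m = \sqrt{\theta_m}\,\mathbf{w}_m$, we may solve this objective function by iteratively optimizing $\boldsymbol{\theta}$ and $\{\tilde{\mathbf{w}}_1, …, \tilde{\mathbf{w}}_M\}$. For the weight factor $\boldsymbol{\theta}$, by fixing $\{\tilde{\mathbf{w}}_1, …, \tilde{\mathbf{w}}_M\}$, we can derive the following solution according to the proposition in [41]:
$$\theta_m = \frac{\|\tilde{\mathbf{w}}_m\|^{2/(p+1)}}{\left(\sum_{m'=1}^{M} \|\tilde{\mathbf{w}}_{m'}\|^{2p/(p+1)}\right)^{1/p}},$$
where p = 2. If $\boldsymbol{\theta}$ is fixed, we can derive the dual model by using dual theory, i.e.,
$$\max_{\boldsymbol{\alpha}:\, \mathbf{1}^{\top}\boldsymbol{\alpha} = 0} \; -C\sum_{i=1}^{n} V^{*}\!\left(-\frac{\alpha_i}{C},\, y_i\right) - \frac{1}{2}\,\boldsymbol{\alpha}^{\top}\!\left(\sum_{m=1}^{M} \theta_m K_m\right)\!\boldsymbol{\alpha},$$
where $K_m$ is the m-th kernel matrix, $\boldsymbol{\alpha}$ is the vector of dual variables, and $V^{*}$ is the convex conjugate of V with respect to its first argument. This model is regarded as an SVM model with the combined kernel $\sum_{m} \theta_m K_m$, and thus solutions are easily achievable via public tools, such as LibSVM [42]. Finally, the weighted kernel matrices contribute to the prediction output, where $\boldsymbol{\theta}$ is adaptively learned in the training data while keeping the max margin in the regression process.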
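The closed-form weight step can be checked numerically. The sketch below assumes the update commonly stated in the ℓp-norm MKL literature, in which each weight is the submodel norm raised to the power 2/(p+1), divided by the (1/p)-th power of the sum of the norms raised to 2p/(p+1). The function name and the example norms are hypothetical.

```python
def update_theta(w_norms, p=2.0):
    """Closed-form kernel weights for fixed submodel norms ||w_m||."""
    if not any(w_norms):
        raise ValueError("at least one norm must be nonzero")
    numer = [a ** (2.0 / (p + 1.0)) for a in w_norms]
    denom = sum(a ** (2.0 * p / (p + 1.0)) for a in w_norms) ** (1.0 / p)
    return [v / denom for v in numer]


theta = update_theta([0.5, 1.0, 2.0])          # hypothetical submodel norms
lp_norm = sum(t ** 2.0 for t in theta) ** 0.5  # l2 norm of the weights
```

Under this assumed form, the resulting weight vector has unit ℓp norm, and submodels with larger norms receive larger weights.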
It has been clearly observed that the structural risk function is well controlled by introducing the penalty parameter θ, and the solution of the MKL problem can be changed by varying this parameter, giving a novel perspective to such problems. For the kernel design, we may use different types of kernel, such as the RBF kernel, linear kernel, polynomial kernel, and so on. Moreover, different parameters for these kernels can be employed to increase the quantity of kernel matrices.
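As an illustration of the kernel design, the sketch below builds a small bank of RBF kernels with different widths and combines their Gram matrices with fixed weights. The specific widths, weights, and input windows are illustrative assumptions, and the function names are hypothetical.

```python
import math


def rbf(u, v, gamma):
    """RBF base kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def combined_kernel(X, gammas, theta):
    """Gram matrix of sum_m theta_m * K_m, with one RBF kernel per gamma."""
    n = len(X)
    return [[sum(t * rbf(X[i], X[j], g) for g, t in zip(gammas, theta))
             for j in range(n)] for i in range(n)]


gammas = [0.9, 1.0, 1.1]            # illustrative kernel widths
theta = [3 ** -0.5] * 3             # uniform weights with unit l2 norm
X = [[5.1, 5.3, 5.0, 4.8, 5.2],     # hypothetical 5-sample input windows
     [6.0, 6.2, 5.9, 6.1, 6.3]]
K = combined_kernel(X, gammas, theta)
```

A matrix of this kind can be supplied to any SVM solver that accepts precomputed kernels, so the weight learning and the SVM training remain separate steps.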
3. Experiments
In this section, we verify our method using real wind speed data. First, we introduce the used dataset, then illustrate the evaluation criteria for wind speed forecasting, and finally conduct the experiments and compare the results with other methods.
3.1. Dataset
Open wind speed data from a wind farm in New Zealand are used to verify the effectiveness and reliability of the proposed hybrid EMD-MKL approach for short-term wind speed forecasting. The data are from an organization called NIWA, published in the public report in [43].
We used data collected at the hub height of 85 meters in the STH1 wind farm and selected 5000 10-minute wind speed data samples in 2003 as the experimental data. Of these, 4500 samples are used to train the EMD-MKL model, and the remaining 500 samples are used for testing.
In our simulation, the size of the moving window is set to 5 samples for the 10-minute wind speed series. Then, we slide the windows containing the sample data to train the EMD-MKL model. 500 test samples can slide 495 times, which means that our model can obtain 495 prediction values.
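The sliding-window construction can be sketched as follows. The series used here is a placeholder for the 500 test samples, and the function name is hypothetical.

```python
def make_windows(series, window=5):
    """Return (inputs, targets): each target is the value following its window."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


test_series = [float(i) for i in range(500)]   # placeholder for 500 test samples
X, y = make_windows(test_series, window=5)
print(len(X))   # 495
```

With a window of 5, the first prediction target is the sixth sample, which is why 500 samples yield 495 input-target pairs.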
3.2. Evaluation Criteria
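In the evaluation of wind speed prediction models, the following error evaluation index functions are typically used: the mean absolute error (MAE), the root mean square error (RMSE), the mean absolute percentage error (MAPE), the mean squared prediction error (MSPE), and the maximum absolute percentage error (MaxAPE) [43–46]. These error evaluation index functions are defined as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |e(i)|,$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e(i)^{2}},$$
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \left|\frac{e(i)}{x(i)}\right| \times 100\%,$$
$$\mathrm{MSPE} = \frac{1}{n}\sum_{i=1}^{n} e(i)^{2},$$
$$\mathrm{MaxAPE} = \max_{1 \le i \le n} \left|\frac{e(i)}{x(i)}\right| \times 100\%,$$
where e (i) = x (i) − y (i), n is the sample size, and x (i) and y (i) are the actual and forecasted values at time period i, respectively.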
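The indexes can be computed directly from paired actual and forecasted values. In the sketch below, MSPE is taken to be the mean of the squared errors, which is an assumption based on the name of the index; the function name and the sample values are hypothetical.

```python
import math


def forecast_errors(actual, forecast):
    """Return MAE, RMSE, MAPE, MSPE, and MaxAPE for paired series."""
    e = [x - y for x, y in zip(actual, forecast)]
    n = len(e)
    # Percentage errors require nonzero actual values.
    ape = [abs(ei / x) * 100.0 for ei, x in zip(e, actual)]
    mse = sum(ei ** 2 for ei in e) / n
    return {
        "MAE": sum(abs(ei) for ei in e) / n,
        "RMSE": math.sqrt(mse),
        "MAPE": sum(ape) / n,
        "MSPE": mse,            # assumed to be the mean squared error
        "MaxAPE": max(ape),
    }


scores = forecast_errors([4.0, 5.0, 8.0], [5.0, 5.0, 6.0])
```

Note that the percentage-based indexes become unstable when the actual wind speed is close to zero, so calm periods may need separate handling.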
3.3. Confidence Interval
Based on the probability density function of the wind speed prediction error, the confidence interval under a certain confidence probability is calculated to establish the reliability of the forecasted value of the wind speed point.
The error of wind speed forecasting is defined as the deviation between the actual measured value of the wind speed at a certain point in time, $x(t)$, and the predicted value $y(t)$, i.e.,
$$e(t) = x(t) - y(t). \tag{10}$$
The fluctuation of the prediction error due to the different wind speed values is quite different. Therefore, this paper divides the wind speed prediction into multiple intervals and establishes the wind speed prediction error distribution of different wind speed intervals.
The maximum number of zones is and , respectively. The partition Di is
Some of the aforementioned intervals may contain too few sample points to reflect the actual distribution of the prediction errors. A second division is therefore needed: intervals with few points are merged with adjacent intervals until the number of sample points in each merged interval meets the requirement.
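The second division can be sketched as a greedy merge of sparse intervals with their neighbours. The merge order (right neighbour first, left neighbour for the last interval), the minimum count, and the example values are assumptions made for illustration; the function name is hypothetical.

```python
def merge_sparse_bins(edges, counts, min_count):
    """Merge adjacent bins until every bin holds at least min_count samples."""
    edges, counts = list(edges), list(counts)
    i = 0
    while i < len(counts) and len(counts) > 1:
        if counts[i] >= min_count:
            i += 1
            continue
        # Merge with the right neighbour, or with the left one for the last bin.
        j = i if i + 1 < len(counts) else i - 1
        counts[j:j + 2] = [counts[j] + counts[j + 1]]
        del edges[j + 1]            # remove the boundary between the two bins
        i = j                       # re-check the merged bin
    return edges, counts


# Hypothetical wind speed intervals (m/s) and error sample counts.
new_edges, new_counts = merge_sparse_bins(
    [0, 2, 4, 6, 8, 10], [50, 3, 40, 2, 1], min_count=5)
```

In this example the five initial intervals collapse to two, [0, 2) and [2, 10], each with enough samples for density estimation.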
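For the prediction error samples in a given wind speed range, the probability density function is as follows:
$$\hat{f}(e) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{e - e_i}{h}\right), \tag{12}$$
where $e_i$ is the wind speed prediction error sample, n is the number of samples in the range, h is the bandwidth, and $K(\cdot)$ is the kernel function.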
The probability density function of the prediction error is obtained by kernel density estimation and curve fitting, and then the cumulative probability distribution function is obtained by integration. Assuming that the cumulative probability distribution function of the prediction error is F (ξ) (ξ is a random variable of the prediction error), the confidence interval of the true value of the wind speed satisfying a confidence probability of 1 − α is as follows:
$$\left[\, y + F^{-1}(\alpha_1),\; y + F^{-1}(\alpha_2) \,\right], \tag{13}$$
where y is the predicted value and $F^{-1}$ is the inverse of the cumulative probability distribution function F. In this paper, we take a symmetric probability interval, that is, $\alpha_1 = \alpha/2$ and $\alpha_2 = 1 - \alpha/2$.
According to the wind speed prediction error probability density curve and the wind speed point prediction value, the confidence interval can be obtained at a certain confidence probability. The specific steps are as follows:
Step 1: for a certain predicted value, determine the wind speed interval to which it belongs and then find the prediction error probability density curve corresponding to that wind speed interval.
Step 2: apply cubic spline interpolation to the fitted curve to find the α/2 and 1 − α/2 quantiles of the wind speed prediction error.
Step 3: use (13) to calculate the confidence interval of the wind speed forecast.
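The steps above can be sketched for a single wind speed interval. This sketch assumes a Gaussian kernel with a fixed bandwidth and finds the quantiles by bisection on the estimated cumulative distribution, which replaces the cubic spline interpolation of step 2. The error samples, bandwidth, and function names are hypothetical.

```python
import math


def kde_cdf(e, samples, h):
    """CDF of a Gaussian kernel density estimate, evaluated at e."""
    return sum(0.5 * (1.0 + math.erf((e - s) / (h * math.sqrt(2.0))))
               for s in samples) / len(samples)


def kde_quantile(q, samples, h, tol=1e-9):
    """Invert kde_cdf by bisection to approximate the q-quantile."""
    lo, hi = min(samples) - 10.0 * h, max(samples) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, h) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def confidence_interval(y_hat, samples, h, alpha=0.10):
    """Symmetric-probability interval around the point forecast y_hat."""
    return (y_hat + kde_quantile(alpha / 2.0, samples, h),
            y_hat + kde_quantile(1.0 - alpha / 2.0, samples, h))


# Hypothetical prediction errors (actual minus predicted) for one interval.
errors = [-0.6, -0.3, -0.1, 0.0, 0.1, 0.3, 0.6]
lower, upper = confidence_interval(8.0, errors, h=0.2, alpha=0.10)
```

Since the quantiles depend only on the error samples of the interval, every forecast that falls into the same wind speed interval receives a confidence interval of the same width.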
3.4. Simulation
In this section, in order to verify the effectiveness of the EMD-MKL model, we use two different hybrid models (EMD-MKL and EMD-SVM) to compare the results of multistep wind speed forecasting. We implement 1-step to 3-step prediction for the 10-minute wind speed using these two models. The CPU used in the experiment is an Intel Core i7-6700K (8 MB cache, 4.0 GHz), the memory is 64 GB, and the operating system is Windows 10 (64 bit).
First, we use EMD to decompose the selected wind speed data and decompose the complex wind speed signals into several subsequences with different frequency characteristics. We set δ = 0.3 as the default value for EMD; there are 16 IMF subsequences and one residue in this experiment. The results of partial decomposition subsequence of wind speed data are shown in Figure 3.

Then, we use the SVM and MKL to train the eight subsequences. Through parameter optimization, the parameters of the SVM and MKL are set as follows:
SVM: cParam = 1 and gParam = 1.1953
MKL: cParam = 1 and gParam = [0.9, 0.925, 0.95, 0.975, 1, 1.025, 1.05, 1.075, 1.1]
Finally, the model reconstructs and combines the prediction results of subsequences to get the final prediction results. The prediction results of 1–3 steps of the two hybrid models are shown in Figures 4–6.



3.5. Comparisons
From these figures, it can be seen that in multistep prediction, compared with EMD-SVM, EMD-MKL has better results regarding the phase shift and amplitude fluctuation and has better performance in following trend change and predicting mutation points.
In Figures 7–11, we compare the proposed method with recent state-of-the-art methods, including EMD-SVM, EMD-RBF (radial basis function), EMD-ELMAN (Elman neural network), EMD-GMDH (group method of data handling), and EMD-MAD (multiple attribute decision) [33]. As observed from these figures, our proposed method is more competitive overall: EMD-MKL has worse results for 1-step forecasting but has high precision for 2-step and 3-step forecasting.





When comparing these models, it is seen that the performance of EMD-MKL is better than that of EMD-SVM. Compared with EMD-SVM, EMD-MKL improved the RMSE index of 2-step and 3-step forecasting by 3.73% and 10.33%, respectively. EMD-MKL improved the MAE index of 2-step and 3-step forecasting by 5.44% and 11.56%, respectively. Thus, we can benefit from the use of multiple kernels, which enhance the model fitting capacity while keeping the generalization ability with the max margin operation. The reason for these improvements is that for highly complex wind speed signals, the multiple kernel learning method can adaptively assign the weights of multiple prediction functions, which extends those of conventional wind speed forecasting methods using support vector machines.
3.6. More Results
To further illustrate the advantages of the proposed models, this study provides four different wind speed series CKS1, STH3, CKS2, and MWT2. The above examples of the forecasted results are listed in Tables 1–4. From these tables, the same conclusion can be drawn as in Section 3.5.
3.7. Result Analysis with Confidence Intervals
Using equations (10)–(13), the confidence intervals can be calculated. The results are shown in Figures 12 and 13. As can be seen, most of the truth values fall within the confidence intervals and only individual points are outside them, which is consistent with the confidence level of 90%. The larger the confidence level, the wider the confidence interval. The envelope of the confidence interval reflects the range of the true value of the short-term wind speed from a probabilistic point of view and better reflects the wind speed.


The training values are divided into seven intervals; each prediction value belongs to one of these seven intervals. The predicted values in the same interval have the same $F^{-1}(\alpha/2)$ and $F^{-1}(1 - \alpha/2)$; thus, the length of the confidence intervals of the predicted values in the same interval is equal, that is, the value of $F^{-1}(1 - \alpha/2) - F^{-1}(\alpha/2)$ is equal.
Tables 5 and 6 show that, at a confidence level of 90%, the confidence interval width of the values predicted by the MKL method is smaller than that of the values predicted by the SVM method. At the same confidence level, a narrower confidence interval indicates a more accurate forecast, so the MKL method has better accuracy than the SVM method.
4. Conclusion
This paper presented a hybrid short-term wind speed prediction method using EMD and MKL. For complex wind speed signals, multiple kernel learning methods can adaptively allocate the weights of multiple prediction functions, while EMD generates more subsequences with different frequency characteristics and enhances the stationarity of signal features, using multichannel signals to increase the robustness of the signal features. Through multistep wind speed prediction, it was demonstrated that, compared with EMD-SVM, EMD-MKL has better results in terms of phase offset and amplitude fluctuation and has better performance in following trend changes and forecasting mutation points, having a higher prediction accuracy and stability. The results showed that the prediction accuracy of 10-minute wind speed of EMD-MKL under different error criteria was higher than that of EMD-SVM, and the EMD-MKL method also showed high stability. Therefore, the EMD-MKL model has high potential for wind speed prediction.
This paper selected data from several wind farms in New Zealand for testing and verified that EMD-MKL can generalize well. In addition, based on the probability density curve of wind speed prediction error and the prediction value of mutation points, we conclude that the fluctuation width of EMD-MKL in the confidence interval is small and has good prediction performance. Therefore, the EMD-MKL hybrid model has great potential in wind speed prediction. In recent years, with the extensive application of deep learning in various fields, we expect to combine deep learning with wind power forecasting to produce more research results.
Data Availability
The data used to support the findings of this study are available from the NIWA project on generating synthetic wind data (https://www.niwa.co.nz/environmental-information/research-projects/synthetic-wind-data).
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the National Key R&D Program of China (2017YFA60700602 and 2018YFC0809200) and the Ministry of Industry and Information Technology of China under project “Industrial Internet Platform Test Bed for Optimizing the Operation of Motor and Driving Equipment” (2018-06-46).