Abstract
To address the low accuracy, long prediction times, and poor overall performance of current methods for predicting wind turbine generation power, a prediction method for offshore wind turbine generation power based on cascaded deep learning is proposed. The method draws on deep belief networks, stacked autoencoding networks, and long short-term memory networks. Multiple feature extractors extract high-level features and fuse them into a unified feature with richer information, which is then used to predict the power generation sequence of offshore wind turbines. Following the modeling strategy and port design strategy, a cascaded deep learning model for offshore wind turbine generation power prediction is built with the stacked autoencoding network as the basic unit. Input variable selection identifies the variables most strongly correlated with wind power. The network is pretrained from bottom to top with a greedy layer-by-layer algorithm, and supervised learning then fine-tunes the network parameters from top to bottom. Experimental results show that the proposed method predicts the power generation of offshore wind turbines effectively, improving prediction accuracy and shortening prediction time.
1. Introduction
The heavy exploitation of nonrenewable energy sources such as coal and oil has left today's society facing environmental pollution, the greenhouse effect, and the depletion of those same resources [1]. As global fossil energy is gradually depleted and the Earth's overall environment deteriorates, developing and using renewable energy such as solar and wind energy has become a worldwide consensus, and these sources are becoming increasingly important strategic objects of energy development. Wind energy is pollution-free, abundant worldwide, and technically mature, and its generation cost keeps falling. Its share of the global power supply has therefore increased year by year, and its use can effectively alleviate the shortage of fossil fuels [2–4]. However, as installed wind power capacity continues to grow, the impact of large-scale wind power integration on the power grid is becoming more and more pronounced. Accurate and effective wind power prediction can reduce the adverse impact of wind power integration on the grid and help optimize grid dispatching [5]. Research on wind power prediction therefore has important practical significance.
Scholars in related fields have studied wind power prediction and obtained some theoretical results. Peng et al. [6] proposed a wind power prediction method for wind farms based on multifeature similarity matching. They optimized and analyzed the key parameters of the method, examining the influence of each key parameter on the prediction error and the applicability of the method at different regional scales. The method and the optimization analysis were verified on a wind farm group made up of multiple wind farms, and the authors report that it is effective for wind farm power prediction and has potential for industrial application. Sun et al. [7] proposed an artificial neural network method for wind turbine power modeling and optimization based on experimental wind farm data. An artificial neural network that accounts for the wake effect estimates the total power generation of the wind turbines under a given wind speed, wind direction, and yaw angle, and the yaw angle is then optimized to minimize the overall impact of the wake on the wind turbines. The model was trained and evaluated using experimental data from five wind turbines operating on a wind farm and was shown to improve the total power of the turbines in all wind directions. However, these methods still suffer from low prediction accuracy, long prediction times, and poor overall prediction performance.
To solve these problems, a power generation prediction method for offshore wind turbines based on cascaded deep learning is proposed [8–10]. Multiple feature extractors extract and fuse high-order features, which are used to predict the generation power sequence of offshore wind turbines. Following the modeling strategy and port design strategy, the stacked autoencoding network is used as the basic unit to build a cascaded deep learning model for offshore wind turbine generation power prediction. Input variable selection identifies the variables that are most strongly correlated with wind power, and these variables are used to predict the power generation of offshore wind turbines. The proposed method predicts the generating power of offshore wind turbines well, improving prediction accuracy and shortening prediction time.
2. Cascaded Deep Learning Network
2.1. Deep Belief Network
A deep belief network (DBN) is formed by stacking several Restricted Boltzmann Machines (RBM) with powerful unsupervised learning capabilities [11]. The structure of the deep belief network is shown in Figure 1.

In Figure 1, each RBM contains a visible layer and a hidden layer, and a BP neural network is stacked on top of the last RBM. Training a DBN consists of two processes: unsupervised pretraining of the RBMs and supervised fine-tuning with the BP algorithm. Suppose that the network input is $x$, the label is $y$, the hidden layer of an RBM is $h$ with neurons $h_j$, and the visible layer is $v$ with neurons $v_i$. The training process is as follows:
2.1.1. Unsupervised Pretraining
Pretraining feeds the original data into the visible layer of the lowest RBM and trains RBM1. After training, the hidden layer of RBM1 serves as the visible layer of RBM2, and RBM2 is trained; the hidden layer of RBM2 then serves as the visible layer of RBM3, and so on until all RBMs are trained. An RBM is an energy-based model rooted in thermodynamics. Its energy function is defined as $E(v,h\mid\theta)$, and the joint probability distribution of the hidden layer and the visible layer is $P(v,h\mid\theta)$; then,
$$E(v,h\mid\theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n}\sum_{j=1}^{m} v_i w_{ij} h_j \tag{1}$$
In Formula (1), $w_{ij}$ represents the weight between neuron $j$ in the hidden layer and neuron $i$ in the visible layer, $a_i$ represents the threshold of neuron $i$ in the visible layer, $b_j$ represents the threshold of neuron $j$ in the hidden layer, $n$ represents the number of visible-layer neurons, and $m$ represents the number of hidden-layer neurons. From Formula (1), the joint probability of $(v,h)$ can be obtained as follows:
$$P(v,h\mid\theta) = \frac{e^{-E(v,h\mid\theta)}}{Z(\theta)}, \qquad Z(\theta) = \sum_{v,h} e^{-E(v,h\mid\theta)} \tag{2}$$
In Formula (2), $Z(\theta)$ represents the normalization factor. Because an RBM has connections between layers but none inside a layer, the hidden neurons are conditionally independent of one another once the states of the visible neurons are given. When the state of the visible layer is known, the activation probability of neuron $j$ in the hidden layer is given as follows:
$$P(h_j = 1\mid v,\theta) = \sigma\Big(b_j + \sum_{i=1}^{n} v_i w_{ij}\Big) \tag{3}$$
When the state of the hidden layer is known, the activation probability of neuron $i$ in the visible layer is given as follows:
$$P(v_i = 1\mid h,\theta) = \sigma\Big(a_i + \sum_{j=1}^{m} w_{ij} h_j\Big) \tag{4}$$
In Formula (4), $\sigma$ is the sigmoid activation function. The training process of an RBM is unsupervised. With the parameter set $\theta = \{w_{ij}, a_i, b_j\}$, training an RBM means finding the value of $\theta$, which can be obtained by maximizing the log-likelihood function of the RBM on the training set. Let $v^{(t)}$, $t = 1, \dots, T$, be the known input samples, then,
$$\theta^{*} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \sum_{t=1}^{T} \ln P\big(v^{(t)}\mid\theta\big) \tag{5}$$
To obtain the optimal $\theta^{*}$, gradient ascent is used to find the maximum of $L(\theta)$ [12]. The gradient of the log-likelihood function with respect to $\theta$ is given as follows:
$$\frac{\partial L(\theta)}{\partial\theta} = \sum_{t=1}^{T}\left( \left\langle \frac{\partial\big(-E(v^{(t)},h\mid\theta)\big)}{\partial\theta} \right\rangle_{P(h\mid v^{(t)},\theta)} - \left\langle \frac{\partial\big(-E(v,h\mid\theta)\big)}{\partial\theta} \right\rangle_{P(v,h\mid\theta)} \right) \tag{6}$$
In Formula (6), $\langle\cdot\rangle_{P}$ represents the mathematical expectation with respect to the distribution $P$, and $P(h\mid v^{(t)},\theta)$ represents the probability distribution of the hidden layer when the visible neurons are clamped to a known training sample $v^{(t)}$. The second expectation is taken over the full joint distribution and is expensive to compute exactly. Because the structure of the RBM is symmetrical and the neuron states are conditionally independent, the contrastive divergence (CD) algorithm is used to approximate it.
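The contrastive divergence update described above can be sketched in NumPy as follows. This is a minimal illustration of one-step CD (CD-1) for a binary RBM, not the authors' implementation: the function name `cd1_step`, the learning rate, the layer sizes, and the random toy data are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr=0.1, rng=None):
    """One CD-1 parameter update for a binary RBM (illustrative sketch).

    v0: (batch, n_visible) binary data, W: (n_visible, n_hidden),
    a: visible thresholds, b: hidden thresholds.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: hidden activation probabilities given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and up again,
    # using probabilities instead of samples for the reconstruction.
    pv1 = sigmoid(h0 @ W.T + a)
    ph1 = sigmoid(pv1 @ W + b)
    # Approximate gradient: data statistics minus reconstruction statistics.
    n = v0.shape[0]
    W = W + lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    a = a + lr * (v0 - pv1).mean(axis=0)
    b = b + lr * (ph0 - ph1).mean(axis=0)
    return W, a, b

# Toy data: 64 random binary vectors with 6 visible units, 4 hidden units.
rng = np.random.default_rng(0)
data = (rng.random((64, 6)) < 0.5).astype(float)
W = 0.01 * rng.standard_normal((6, 4))
a = np.zeros(6)
b = np.zeros(4)
for _ in range(10):
    W, a, b = cd1_step(data, W, a, b, rng=rng)
```

Stacking RBMs then amounts to feeding the hidden probabilities of one trained RBM to the next as its visible data.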
2.1.2. Supervised Fine-Tuning
A BP network is stacked on the last layer of the DBN and takes the output feature vector of the top RBM as its input. RBM training only ensures that the weights of each layer are optimal for that layer's own feature mapping, not for the DBN as a whole. The BP network is therefore used to propagate the error signal from top to bottom through each RBM layer and fine-tune the parameters of the whole DBN.
RBM pretraining can be regarded as initializing the weights of a deep BP network. This gives the DBN a strong feature learning ability while avoiding the BP network's tendency to fall into local optima and its slow training [13–15].
2.2. Stacked Autoencoding Network
The stacked autoencoder (SAE) is one of the most commonly used deep learning methods at present. It is composed of several stacked autoencoders (AEs) that abstract information features step by step. Similar to DBN, it is a generative model [16]. The structure of the stacked autoencoding network is shown in Figure 2.

According to Figure 2, the SAE network is left-right symmetrical. The side on which the number of neurons decreases layer by layer from the leftmost input layer is called the encoding side. Mirroring these layers forms the other side, which is called the decoding side. The middle layer is called the bottleneck of the SAE and holds the compressed feature representation of the data. SAE has been widely used in image classification, data analysis, audio analysis, and other fields because of its flexible structure, simple training, and strong feature extraction ability. First, a brief introduction to the autoencoder is given.
2.2.1. Autoencoder
The structure and principle of the autoencoder are shown in Figure 3.

Figure 3: (a) structure; (b) principle.
According to Figure 3(a), a simple AE is a three-layer symmetric MLP in which the hidden layer has fewer neurons than the input and output layers. The mapping from the input layer to the hidden layer is called the encoding process, and the mapping from the hidden layer to the output layer is called the decoding process. The purpose of training an AE is to minimize the reconstruction error of the input data. Once trained, the output of the hidden layer is another representation of the input data with a lower dimension than the original data, which removes redundancy and extracts the characteristics of the original data. Its function is similar to principal component analysis (PCA).
According to Figure 3(b), assuming that $X = \{x_1, x_2, \dots, x_N\}$ is the training data set, which contains $N$ data vectors, and $J$ is the reconstruction error, the encoding process and decoding process of the AE can be expressed as follows:
$$h = f(Wx + b), \qquad \hat{x} = g(W'h + b') \tag{7}$$
In Formula (7), $W$ represents the coding matrix, $W'$ represents the decoding matrix, $b$ represents the coding threshold vector, $b'$ represents the decoding threshold vector, and $f$ and $g$ represent activation functions. The essence of training an AE is to find the parameter $\theta = \{W, W', b, b'\}$ that minimizes $J$ on the data set $X$. Denoting the optimal parameter by $\theta^{*}$, this can be expressed as follows:
$$\theta^{*} = \arg\min_{\theta} J(X;\theta) \tag{8}$$
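The encoding and decoding steps can be illustrated with a short NumPy sketch. The sigmoid encoder, the linear decoder, the mean squared reconstruction error, and all array sizes are assumptions chosen for the example, since the text does not fix the activation functions or the exact form of the error. Samples are stored as rows, so the products appear as `x @ W` rather than `Wx`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(x, W_enc, b_enc):
    """Encoding process: input layer -> hidden layer."""
    return sigmoid(x @ W_enc + b_enc)

def decode(h, W_dec, b_dec):
    """Decoding process: hidden layer -> output layer (linear, by assumption)."""
    return h @ W_dec + b_dec

def reconstruction_error(x, x_hat):
    """Mean squared reconstruction error over the batch (assumed form)."""
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))

rng = np.random.default_rng(0)
x = rng.random((5, 8))                       # 5 samples, 8 input features
W_enc = 0.1 * rng.standard_normal((8, 3))    # hidden layer narrower than input
b_enc = np.zeros(3)
W_dec = 0.1 * rng.standard_normal((3, 8))
b_dec = np.zeros(8)

h = encode(x, W_enc, b_enc)
x_hat = decode(h, W_dec, b_dec)
error = reconstruction_error(x, x_hat)
```

Training would adjust the four parameter arrays to reduce `error`; the hidden output `h` is the lower-dimensional representation.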
2.2.2. Stacked Autoencoding Network Training Process
The layer-by-layer training method of stacked autoencoding is shown in Figure 4.

The training process of SAE is similar to that of DBN and also consists of pretraining and fine-tuning. In the pretraining stage, each AE is trained in turn to reconstruct its own input, with the hidden output of one AE serving as the input of the next, until the whole network has been trained. The BP algorithm is then used for supervised fine-tuning, adjusting all network parameters from top to bottom.
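The greedy layer-by-layer pretraining stage can be sketched as below. This is a simplified illustration under stated assumptions: each AE uses a sigmoid encoder and a linear decoder, is trained by plain batch gradient descent on the squared reconstruction error, and the helper names `train_autoencoder` and `pretrain_stack`, the layer sizes, and the hyperparameters are hypothetical. Supervised fine-tuning is not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(x, n_hidden, epochs=200, lr=0.1, seed=0):
    """Train one AE to reconstruct x; return its encoder weights and bias."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    W1 = 0.1 * rng.standard_normal((d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = 0.1 * rng.standard_normal((n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        h = sigmoid(x @ W1 + b1)            # encoding
        x_hat = h @ W2 + b2                 # decoding
        d_out = (x_hat - x) / n             # gradient of 0.5 * mean squared error
        d_hid = (d_out @ W2.T) * h * (1.0 - h)
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (x.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1

def pretrain_stack(x, layer_sizes):
    """Greedy pretraining: each AE is trained on the previous AE's hidden output."""
    params, layer_input = [], x
    for size in layer_sizes:
        W, b = train_autoencoder(layer_input, size)
        params.append((W, b))
        layer_input = sigmoid(layer_input @ W + b)
    return params, layer_input

rng = np.random.default_rng(1)
x = rng.random((50, 8))
params, features = pretrain_stack(x, [5, 3])
```

The returned encoder parameters would initialize the deep network before top-down fine-tuning with labeled data.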
2.3. Long Short-Term Memory Network
The long short-term memory network (LSTM) is an improved form of the recurrent neural network (RNN). It retains most of the strengths of the RNN model while solving the vanishing gradient problem, which makes it difficult for an ordinary RNN to keep improving its accuracy during training [17–19]. Just as other neural networks are built by connecting independent neurons, an LSTM network is built by connecting independent LSTM blocks. Each block contains three gates (input, forget, and output), a block input, a block output, a memory cell, an output activation function, and peephole connections, and the block output is fed back recurrently to the block input and to all the gates. Training an LSTM network involves the forward propagation of information and the back propagation of error through time.
2.3.1. Information Forwarding
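Suppose the input vector at time $t$ is $x^{t}$, the number of LSTM blocks is $N$, and the number of inputs is $M$. The weights of an LSTM network layer are the input weights $W_z, W_i, W_f, W_o \in \mathbb{R}^{N\times M}$, the recurrent weights $R_z, R_i, R_f, R_o \in \mathbb{R}^{N\times N}$, the peephole connection weights $p_i, p_f, p_o \in \mathbb{R}^{N}$, and the thresholds $b_z, b_i, b_f, b_o \in \mathbb{R}^{N}$. In the formulas below, $\sigma$ is the logistic sigmoid, $g$ and $h$ are the input and output activation functions, $\odot$ is the elementwise product, and a bar marks a pre-activation value. The calculation formula for each vector is as follows:
The block input is given as follows:
$$\bar{z}^{t} = W_z x^{t} + R_z y^{t-1} + b_z, \qquad z^{t} = g(\bar{z}^{t}) \tag{9}$$
The input gate is given as follows:
$$\bar{i}^{t} = W_i x^{t} + R_i y^{t-1} + p_i \odot c^{t-1} + b_i, \qquad i^{t} = \sigma(\bar{i}^{t}) \tag{10}$$
The forget gate is given as follows:
$$\bar{f}^{t} = W_f x^{t} + R_f y^{t-1} + p_f \odot c^{t-1} + b_f, \qquad f^{t} = \sigma(\bar{f}^{t}) \tag{11}$$
The cell state is given as follows:
$$c^{t} = z^{t} \odot i^{t} + c^{t-1} \odot f^{t} \tag{12}$$
The output gate is given as follows:
$$\bar{o}^{t} = W_o x^{t} + R_o y^{t-1} + p_o \odot c^{t} + b_o, \qquad o^{t} = \sigma(\bar{o}^{t}) \tag{13}$$
The block output is given as follows:
$$y^{t} = h(c^{t}) \odot o^{t} \tag{14}$$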
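A single forward step of a peephole LSTM layer, following the block input, gates, cell state, and block output listed above, might look like the NumPy sketch below. The use of tanh for both the input and output activation functions, the dictionary layout of the parameters, and the sizes (3 blocks, 2 inputs) are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_blocks, n_inputs, rng):
    """Random input (W), recurrent (R), peephole (p) weights and thresholds (b)."""
    p = {}
    for gate in "zifo":
        p["W" + gate] = 0.1 * rng.standard_normal((n_blocks, n_inputs))
        p["R" + gate] = 0.1 * rng.standard_normal((n_blocks, n_blocks))
        p["b" + gate] = np.zeros(n_blocks)
    for gate in "ifo":
        p["p" + gate] = 0.1 * rng.standard_normal(n_blocks)
    return p

def lstm_step(x, y_prev, c_prev, p):
    """One time step: returns the new block output and cell state."""
    z = np.tanh(p["Wz"] @ x + p["Rz"] @ y_prev + p["bz"])                       # block input
    i = sigmoid(p["Wi"] @ x + p["Ri"] @ y_prev + p["pi"] * c_prev + p["bi"])    # input gate
    f = sigmoid(p["Wf"] @ x + p["Rf"] @ y_prev + p["pf"] * c_prev + p["bf"])    # forget gate
    c = z * i + c_prev * f                                                      # cell state
    o = sigmoid(p["Wo"] @ x + p["Ro"] @ y_prev + p["po"] * c + p["bo"])         # output gate
    y = np.tanh(c) * o                                                          # block output
    return y, c

rng = np.random.default_rng(0)
params = init_params(n_blocks=3, n_inputs=2, rng=rng)
y, c = np.zeros(3), np.zeros(3)
for x in rng.random((5, 2)):      # a toy sequence of 5 time steps
    y, c = lstm_step(x, y, c, params)
```

Note that the input and forget gates peek at the previous cell state, while the output gate peeks at the freshly updated one.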
2.3.2. The Error Propagates Back through Time
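The increments (deltas) inside the LSTM block are calculated as follows, where $\odot$ is the elementwise product, a bar marks the pre-activation value of the block input or of a gate, $R$ and $p$ are the recurrent and peephole weights, and $\sigma$, $g$, and $h$ are the gate, input, and output activation functions:
$$\begin{aligned}
\delta y^{t} &= \Delta^{t} + R_z^{\top}\delta\bar{z}^{t+1} + R_i^{\top}\delta\bar{i}^{t+1} + R_f^{\top}\delta\bar{f}^{t+1} + R_o^{\top}\delta\bar{o}^{t+1},\\
\delta\bar{o}^{t} &= \delta y^{t} \odot h(c^{t}) \odot \sigma'(\bar{o}^{t}),\\
\delta c^{t} &= \delta y^{t} \odot o^{t} \odot h'(c^{t}) + p_o \odot \delta\bar{o}^{t} + p_i \odot \delta\bar{i}^{t+1} + p_f \odot \delta\bar{f}^{t+1} + \delta c^{t+1} \odot f^{t+1},\\
\delta\bar{f}^{t} &= \delta c^{t} \odot c^{t-1} \odot \sigma'(\bar{f}^{t}),\\
\delta\bar{i}^{t} &= \delta c^{t} \odot z^{t} \odot \sigma'(\bar{i}^{t}),\\
\delta\bar{z}^{t} &= \delta c^{t} \odot i^{t} \odot g'(\bar{z}^{t}).
\end{aligned} \tag{15}$$
In Formula (15), $\Delta^{t}$ represents the vector of increments passed down from the layer above. If $E$ is the loss function, $\Delta^{t}$ corresponds to $\partial E / \partial y^{t}$, but without the recurrent dependencies. The input increment only needs to be calculated when there is a layer below that needs training. The calculation formula is as follows:
$$\delta x^{t} = W_z^{\top}\delta\bar{z}^{t} + W_i^{\top}\delta\bar{i}^{t} + W_f^{\top}\delta\bar{f}^{t} + W_o^{\top}\delta\bar{o}^{t} \tag{16}$$
Finally, the gradient of each weight is calculated as follows:
$$\begin{aligned}
\delta W_{\star} &= \sum_{t=0}^{T}\big\langle \delta\star^{t}, x^{t}\big\rangle, &
\delta R_{\star} &= \sum_{t=0}^{T-1}\big\langle \delta\star^{t+1}, y^{t}\big\rangle, &
\delta b_{\star} &= \sum_{t=0}^{T}\delta\star^{t},\\
\delta p_i &= \sum_{t=0}^{T-1} c^{t} \odot \delta\bar{i}^{t+1}, &
\delta p_f &= \sum_{t=0}^{T-1} c^{t} \odot \delta\bar{f}^{t+1}, &
\delta p_o &= \sum_{t=0}^{T} c^{t} \odot \delta\bar{o}^{t}.
\end{aligned} \tag{17}$$
In Formula (17), $\star$ represents any one of $\{\bar{z}, \bar{i}, \bar{f}, \bar{o}\}$, and $\langle\cdot,\cdot\rangle$ denotes the outer product of two vectors. Compared with an MLP, the biggest difference in training an LSTM is that predicting a value at time $t$ requires propagating the previous $k$ samples through the network, where the number of time steps $k$ is fixed when the network is set up. The memory cells store transient information according to their trained state and output a predicted value $\hat{y}^{t}$.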
3. Prediction Method for Power Generation of Offshore Wind Turbines
To predict the generating power of offshore wind turbines from a variety of heterogeneous data, a cascaded deep learning prediction model is proposed that builds on the stacked autoencoding network and the long short-term memory network and draws on multimodal learning and multitask learning strategies. The model is a comprehensive prediction framework composed of multiple feature extractors, a feature fusion layer, and a prediction terminal. Each feature extractor automatically extracts features from one variable and sends them to the feature fusion layer for data fusion, and the prediction terminal then gives the prediction result.
3.1. Power Generation Prediction of Multimodal Offshore Wind Turbines
3.1.1. Problem Description
From the perspective of pattern recognition, the power generation prediction of offshore wind turbines can be viewed as a mapping between multiple objects, namely $Y = f(X)$. Specifically, $Y$ is the power generation sequence of the offshore wind turbine to be predicted, $X$ is the input variable, and $f$ is the implicit function of the prediction model. At present, most offshore wind turbine generation power prediction models are univariate, that is, they use only the historical measurements of the generation power for prediction [20]. In actual engineering, however, a wind farm often records multiple measurements, and these data may also be useful for forecasting the power generation of offshore wind turbines. From a physical point of view, the power generated by offshore wind turbines reflects the kinetic energy of the air, which is directly or indirectly related to a variety of factors, including wind turbine parameters, geographic conditions, and meteorological information. These quantitative measurements are typical multisource heterogeneous data: they come from different sensors and have different physical meanings and dimensions. The machine learning community refers to them as multimodal data, and the learning task for them is called multimodal learning [21, 22]. Essentially, these measurements describe different attributes of the kinetic energy of the air, and all of them contain knowledge required for the power generation prediction of offshore wind turbines. They are different aspects of the same thing: distinct from one another, but internally unified.
3.1.2. Mathematical Modeling
The multistep prediction of the power generation of offshore wind turbines is of great significance to the power system and is widely used in equipment maintenance, energy storage management, and power market operations [23–25]. Multistep offshore wind power generation forecasting mainly includes two methods, namely direct forecasting and iterative forecasting. In this paper, direct prediction is used to reduce the cumulative error more effectively, so as to achieve a more accurate forecast of the power generation of offshore wind turbines.
Given $M$ types of measurement data, the task is to predict, at time $t$, the time series of the offshore wind turbine generating power over a future period, $y_t = [p_{t+h_{\min}}, \dots, p_{t+h_{\max}}]^{\top}$. Here $p_{t+h}$ represents the power generation value of the offshore wind turbine at time $t+h$, and $h_{\min}$ and $h_{\max}$ represent the minimum and maximum prediction steps, respectively. The prediction problem is a typical sequence-to-sequence prediction problem, which can be expressed as follows:
$$y_t = f(x_t;\theta), \qquad x_t = \big[x_t^{(1)}; x_t^{(2)}; \dots; x_t^{(M)}\big] \tag{18}$$
In Formula (18), $x_t$ represents the input vector, $\theta$ and $f$ represent the parameter vector and the implicit function of the prediction model, respectively, and $x_t^{(m)}$ represents the subvector corresponding to measurement $m$ at time $t$. Generally, $x_t^{(m)}$ can be defined as follows:
$$x_t^{(m)} = \big[s^{(m)}_{t-d_m+1}, \dots, s^{(m)}_{t-1}, s^{(m)}_{t}\big]^{\top} \tag{19}$$
In Formula (19), $d_m$ represents the dimension of $x_t^{(m)}$, and $s^{(m)}_{t}$ represents the value of measurement $m$ at time $t$. Naturally, the dimension of $x_t$ is $\sum_{m=1}^{M} d_m$. Training the prediction model is a supervised learning problem. Given the training set $D$, which contains $N$ input/output pairs, namely $D = \{(x_i, y_i)\}_{i=1}^{N}$, the optimal parameter $\theta^{*}$ of the prediction model can be obtained by minimizing the loss function, namely,
$$\theta^{*} = \arg\min_{\theta} \frac{1}{N}\sum_{i=1}^{N} \big\| y_i - f(x_i;\theta) \big\|_F^{2} \tag{20}$$
In Formula (20), $\|\cdot\|_F$ represents the F norm. It is worth noting that when $M = 1$, the prediction problem degenerates into a univariate prediction problem, that is, only the historical sequence of offshore wind power generation is used for prediction. When $h_{\min} = h_{\max}$, the prediction problem degenerates into a single-step offshore wind turbine generating power prediction problem. Therefore, compared with single-step and univariate prediction, multistep offshore wind turbine generation power prediction using multimodal information is the more general problem.
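The construction of input/output pairs for direct multistep prediction can be sketched as follows. The sketch assumes that every measurement uses the same history length (`window`), that power is one of the measurements, and that the data are regularly sampled; the function name `make_direct_pairs` and the toy series are hypothetical.

```python
import numpy as np

def make_direct_pairs(measurements, power, window, h_min, h_max):
    """Build training pairs for direct multistep prediction.

    measurements: (T, M) array, one column per measured quantity.
    power: (T,) array of generated power.
    Each input concatenates the last `window` values of every measurement;
    each target holds the power from step t+h_min to t+h_max.
    """
    T = len(power)
    inputs, targets = [], []
    for t in range(window - 1, T - h_max):
        history = measurements[t - window + 1 : t + 1]   # (window, M)
        inputs.append(history.T.ravel())                 # measurement by measurement
        targets.append(power[t + h_min : t + h_max + 1])
    return np.array(inputs), np.array(targets)

# Toy example: 20 time steps, 3 measurements, the first being power itself.
power = np.arange(20.0)
measurements = np.column_stack([power, 2.0 * power, power + 100.0])
X, Y = make_direct_pairs(measurements, power, window=4, h_min=1, h_max=3)
```

Setting `h_min == h_max` yields single-step targets, and passing a single column yields the univariate case.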
3.2. Constructing an Offshore Wind Turbine Generating Power Prediction Model
3.2.1. Modeling Strategy
A variety of heterogeneous data is used to predict the power generation of offshore wind turbines, and multiple feature extractors extract high-level features from the individual measurements. The high-level features from the different measurements are fused into a unified feature with richer information, and the power generation sequence of the offshore wind turbines is predicted from this unified feature. In this way, the power generation prediction problem for offshore wind turbines can be described as follows:
$$h^{(m)} = f_m\big(x^{(m)}\big),\ m = 1, \dots, M; \qquad u = g\big(h^{(1)}, \dots, h^{(M)}\big); \qquad y = \phi(u) \tag{21}$$
In Formula (21), $u$ represents the unified feature, $h^{(m)}$ represents the high-order feature corresponding to measurement $m$, $f_m$ represents the implicit function of the first-stage feature extractor corresponding to measurement $m$, and $g$ and $\phi$ represent the implicit functions of the second and third stages, respectively. With this strategy, the various measurements are integrated under one learning framework, which has the potential to provide richer information for offshore wind turbine generation power prediction.
3.2.2. Model Structure
Following the modeling strategy and port design strategy, a cascaded deep learning model for offshore wind turbine generation power prediction is proposed. The model is built with the SAE as its basic unit, and its structure is shown in Figure 5.

In Figure 5, $E_m$ represents the feature extractor corresponding to measurement $m$, $L_m$ represents the number of layers of $E_m$, $n_m^{(l)}$ represents the dimension of layer $l$ in $E_m$, and $n_F$ and $n_P$ represent the dimensions of the fusion layer and the prediction terminal layer, respectively. The model consists of $M$ feature extractors, a feature fusion layer, and a prediction terminal layer. Each feature extractor is an SAE, the feature fusion layer is an ordinary fully connected network, and the prediction terminal layer is a regression layer.
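The data flow through the cascaded structure (feature extractors, fusion layer, regression layer) can be sketched as a forward pass in NumPy. The weights here are random rather than trained, only the encoder halves of the SAEs are modeled, and the layer sizes, the sigmoid activations, and the helper names are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_layers(sizes, rng):
    """Random (weight, bias) pairs for consecutive layer sizes."""
    return [(0.1 * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def encode(x, layers):
    """Pass x through a stack of sigmoid layers."""
    for W, b in layers:
        x = sigmoid(x @ W + b)
    return x

def cascade_forward(inputs, extractors, fusion, head):
    # Stage 1: one feature extractor per measurement.
    features = [encode(x, layers) for x, layers in zip(inputs, extractors)]
    # Stage 2: concatenate the high-level features and fuse them.
    unified = encode(np.concatenate(features, axis=1), fusion)
    # Stage 3: linear regression layer producing the power sequence.
    W, b = head
    return unified @ W + b

rng = np.random.default_rng(0)
batch = 10
input_dims = [6, 4, 5]                        # history length per measurement
inputs = [rng.random((batch, d)) for d in input_dims]
extractors = [init_layers([d, 3, 2], rng) for d in input_dims]
fusion = init_layers([2 * len(input_dims), 4], rng)
head = init_layers([4, 3], rng)[0]            # 3 prediction steps
prediction = cascade_forward(inputs, extractors, fusion, head)
```

Because each measurement has its own extractor, inputs with different dimensions and physical meanings can be combined without being forced into a common scale first.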
3.2.3. Input Variable Selection
Input variable selection chooses a subset of the candidate variables for model building, and it has a strong influence on the performance of the model. In theory, introducing more input variables provides richer information for wind power forecasting. In practice, however, it can also introduce noise and make the model excessively large [26]. Input variables must therefore be selected purposefully in order to control the scale of the model, reduce the computational complexity, and improve model performance. For the cascaded deep learning model of offshore wind turbine generation power prediction, input variable selection retains the candidate variables that have a greater correlation with wind power.
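One simple way to rank candidate variables by their correlation with wind power is sketched below. The text does not state which correlation measure is used, so the absolute Pearson coefficient is an assumption, as are the function name `select_inputs`, the synthetic data, and the variable names.

```python
import numpy as np

def select_inputs(candidates, power, names, k):
    """Return the names of the k candidates most correlated with power.

    candidates: (T, K) array, one column per candidate variable.
    Correlation is measured by the absolute Pearson coefficient (assumption).
    """
    scores = [abs(np.corrcoef(candidates[:, j], power)[0, 1])
              for j in range(candidates.shape[1])]
    order = np.argsort(scores)[::-1][:k]
    return [names[j] for j in order]

# Synthetic example: power depends on wind speed only.
rng = np.random.default_rng(0)
wind_speed = rng.uniform(3.0, 12.0, 500)
temperature = rng.uniform(5.0, 25.0, 500)
humidity = rng.uniform(40.0, 90.0, 500)
power = wind_speed ** 3 + rng.normal(0.0, 20.0, 500)

candidates = np.column_stack([temperature, wind_speed, humidity])
names = ["temperature", "wind_speed", "humidity"]
selected = select_inputs(candidates, power, names, k=1)
```

Pearson correlation only captures linear dependence, so a rank-based or information-theoretic measure may be preferable when the relationship with power is strongly nonlinear.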
3.2.4. Training Algorithm
The cascaded deep learning model for power generation prediction of offshore wind turbines is trained as follows. The SAEs at the bottom are trained from bottom to top with the greedy layer-by-layer algorithm [27–29]. To enhance the feature extraction capability of the SAE, sparsity constraints are added to the hidden layer units. At the same time, to reduce overfitting, L2 regularization is applied to the network. For each AE, the loss function is as follows:
$$J = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{2}\big\| \hat{x}_i - x_i \big\|^{2} + \beta \sum_{j=1}^{s} \mathrm{KL}\big(\rho \,\|\, \hat{\rho}_j\big) + \frac{\lambda}{2} \big\| W \big\|_F^{2} \tag{22}$$
In Formula (22), the first term is the reconstruction error over the $N$ training samples, the second term is the KL divergence between the target sparsity $\rho$ and the average activation $\hat{\rho}_j$ of hidden unit $j$, summed over the $s$ hidden units and weighted by $\beta$, and the third term is the L2 regular term with coefficient $\lambda$. The top layers of all SAEs together with the feature fusion layer are then regarded as a new AE, whose input is the concatenation of the top-layer outputs $h^{(m)}$ of the $M$ SAEs, namely,
$$x_F = \big[h^{(1)}; h^{(2)}; \dots; h^{(M)}\big] \tag{23}$$
After this step, all layers except the top layer are initialized. Supervised learning is then used to adjust the parameters of the network from top to bottom [30]. After fine-tuning, the entire network achieves better prediction performance. Through the above steps, the power generation prediction for offshore wind turbines is realized.
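A loss with a reconstruction term, a KL sparsity penalty, and an L2 term can be computed as in the sketch below. The target sparsity `rho`, the penalty weight `beta`, and the regularization coefficient `lam` are hypothetical hyperparameters with assumed default values; the text does not report the values actually used.

```python
import numpy as np

def sparse_ae_loss(x, x_hat, h, weights, rho=0.05, beta=3.0, lam=1e-4):
    """Reconstruction error + KL sparsity penalty + L2 regularization.

    x, x_hat: (N, d) inputs and reconstructions; h: (N, s) hidden activations
    in (0, 1); weights: list of weight matrices to regularize.
    """
    n = x.shape[0]
    reconstruction = 0.5 * np.sum((x_hat - x) ** 2) / n
    # Average activation of each hidden unit, clipped to keep the logs finite.
    rho_hat = np.clip(h.mean(axis=0), 1e-8, 1.0 - 1e-8)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    l2 = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
    return float(reconstruction + beta * kl + l2)

rng = np.random.default_rng(0)
x = rng.random((20, 6))
W = np.zeros((6, 4))
sparse_h = np.full((20, 4), 0.05)   # activations already at the target sparsity
dense_h = np.full((20, 4), 0.5)     # activations far from the target sparsity

zero_loss = sparse_ae_loss(x, x, sparse_h, [W])
dense_loss = sparse_ae_loss(x, x, dense_h, [W])
```

With a perfect reconstruction and zero weights, only the sparsity penalty separates the two cases, which shows how the KL term pushes the average hidden activation toward `rho`.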
4. Experimental Analysis
4.1. Experimental Environment and Dataset
To verify the effectiveness of the proposed generation power prediction method for offshore wind turbines based on cascaded deep learning, offshore wind turbines with an installed capacity of 160 MW are used as the research object. The hub height of the wind turbines is 80 m, the average altitude is 5.8 m, and the annual average wind speed is 7.2 m/s. Power generation data of the offshore wind turbines were recorded from January 2021 to December 2021, forming a wind dataset of more than 126,000 records. The dataset is divided chronologically into two sub-datasets: a training set containing the first 60% of the data and a validation set containing the last 40%. The training set is used for model training, and the validation set is used for model selection and overfitting prevention. For testing, the generated power of the offshore wind turbines in January 2022 is predicted and compared with the power actually observed in January 2022 to determine the prediction performance of the method. To ensure that the many-to-many mapping structure converges, the learning rate is halved each time the loss begins to decrease slowly. The model parameters of the stacked denoising autoencoder are set as shown in Table 1.
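The chronological split and the learning-rate halving rule can be sketched as follows. The text does not define when the loss counts as decreasing slowly, so the `patience` and `min_delta` thresholds and both function names are assumptions made for illustration.

```python
import numpy as np

def chronological_split(data, train_fraction=0.6):
    """Split time-ordered data into the first 60% and the last 40%."""
    cut = int(round(len(data) * train_fraction))
    return data[:cut], data[cut:]

def halve_on_plateau(lr, losses, patience=3, min_delta=1e-4):
    """Halve the learning rate when the last `patience` epochs brought no
    improvement of at least `min_delta` over the best earlier loss."""
    if len(losses) > patience:
        recent_best = min(losses[-patience:])
        earlier_best = min(losses[:-patience])
        if recent_best > earlier_best - min_delta:
            return lr / 2.0
    return lr

series = np.arange(10)
train, validation = chronological_split(series)

stalled = [1.0, 0.5, 0.4, 0.4, 0.4, 0.4]
improving = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1]
lr_after_stall = halve_on_plateau(0.01, stalled)
lr_after_progress = halve_on_plateau(0.01, improving)
```

Splitting by time rather than at random keeps future observations out of the training set, which matters for forecasting tasks.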
4.2. Power Prediction Evaluation Index
The root mean square error (RMSE) and the mean absolute error (MAE) are used as evaluation indexes. MAE is the average of the absolute errors, and RMSE reflects the dispersion of the predicted values around the actual values. The smaller these values are, the higher the power prediction accuracy is. They are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(y_i - \hat{y}_i\big)^{2}}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\big| y_i - \hat{y}_i \big| \tag{24}$$
In Formula (24), $y_i$ and $\hat{y}_i$ are the actual and predicted values of wind power, respectively, and $n$ is the number of predicted data.
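The two evaluation indexes can be computed with a few lines of NumPy. The sketch uses the standard unnormalized definitions of RMSE and MAE, and the sample values are made up for illustration.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between actual and predicted power."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean absolute error between actual and predicted power."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))

actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 18.0]
rmse_value = rmse(actual, predicted)   # sqrt(1.5)
mae_value = mae(actual, predicted)     # 1.0
```

RMSE weights large errors more heavily than MAE, so the two can rank methods differently when a method makes occasional large misses.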
4.3. Comparison of Power Generation Prediction Effects of Offshore Wind Turbines
To verify the prediction performance of the proposed method, it is compared with the methods of Peng et al. [6] and Sun et al. [7]. The power generation prediction results of the different methods are shown in Figure 6.

It can be seen from Figure 6 that, under different data sample collection intervals, the power predicted by the method of Peng et al. [6] is relatively high and deviates from the actual value to a certain extent. The power predicted by the method of Sun et al. [7] is relatively low, and its deviation from the actual value is the largest. The predictions of the proposed method are basically consistent with the actual power generation fluctuations of the offshore wind turbines. Compared with the methods of Peng et al. [6] and Sun et al. [7], the proposed method therefore predicts the power generation of offshore wind turbines more closely.
4.4. Comparison of Power Generation Forecast Time of Offshore Wind Turbines
To further evaluate the prediction time of the proposed method, it is compared with the methods of Peng et al. [6] and Sun et al. [7]. The power generation prediction times of the different methods are shown in Figure 7.

It can be seen from Figure 7 that the prediction time of every method increases as the training set grows. When the training set contains 1000 samples, the prediction time of the method of Peng et al. [6] is 49.9 s and that of the method of Sun et al. [7] is 47.5 s, whereas the prediction time of the proposed method is only 25 s. Compared with the methods of Peng et al. [6] and Sun et al. [7], the proposed method therefore takes less time to predict the power generation of offshore wind turbines.
4.5. Comparison of Power Generation Prediction Accuracy of Offshore Wind Turbines
On this basis, the prediction accuracy of the proposed method is further evaluated by comparing it with the methods of Peng et al. [6] and Sun et al. [7]. The power generation prediction errors of the different methods are listed in Table 2.
According to the data in Table 2, the prediction error of every method increases as the training set grows. When the training set contains 1000 samples, the RMSE and MAE values of the method of Peng et al. [6] are 31.5% and 31.9%, respectively, and those of the method of Sun et al. [7] are 17.8% and 29.5%, respectively. The RMSE and MAE values of the proposed method are 13.3% and 26.9%, respectively. Compared with the methods of Peng et al. [6] and Sun et al. [7], the proposed method thus has smaller RMSE and MAE values, which means it reduces the power generation prediction error of offshore wind turbines and improves the prediction accuracy.
5. Conclusion
The generation power prediction method for offshore wind turbines based on cascaded deep learning proposed in this paper makes full use of the advantages of deep learning algorithms. In the experiments, it achieves higher prediction accuracy and shorter prediction time than the compared methods, and its predictions follow the actual generation power closely. However, because of limited data acquisition channels, prediction for wind farm groups has not been considered in this study. Future research should therefore extend the historical power generation data from individual wind turbines to wind farm groups, so as to support wide-area wind power prediction based on big data.
Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare that there are no conflicts of interest.