Abstract

For efficient energy distribution, microgrids (MGs) provide significant assistance to main grids and act as a bridge between power generation and consumption. Renewable energy resources, particularly photovoltaics (PVs), are considered a clean source of energy, but their output is highly complex, volatile, and intermittent, which makes forecasting challenging. A reliable, optimized, and robust forecasting method deployed at the MG addresses these challenges by providing accurate renewable energy production forecasts and enabling precise matching of power generation and consumption at the MG. Furthermore, it ensures effective planning, operation, and energy exchange with the main grid in cases of surplus or deficit. Therefore, in this work, we develop an end-to-end hybrid network for automatic PV power forecasting, comprising three basic steps. Firstly, data preprocessing is performed to normalize the data, remove outliers, and handle missing values. Next, temporal features are extracted using deep sequential modelling schemes, followed by the extraction of spatial features via convolutional neural networks. These features are then fed to fully connected layers for PV power forecasting. In the third step, the proposed model is evaluated on publicly available PV power generation datasets, where it achieves lower error rates than state-of-the-art methods.

1. Introduction

Photovoltaic (PV) power generation is one of the most accessible, low-cost, and promising sources of renewable energy. As energy demand rises in developing countries, PV power generation increases annually, which helps mitigate the global energy and climate change crisis [1]. According to the Global Future Report, PV generation capacity will reach 8000 GW by 2050 [2]. However, atmospheric variables such as temperature, solar irradiance, humidity, and cloud properties cause significant uncertainty when integrating PVs into a microgrid (MG) [3–7]. An effective PV power forecasting model, in contrast, greatly improves solar power utilization [8–10]. Efficient forecasting models in the utility grid therefore allow the power grid to be operated economically and the required energy to be transferred to end users [11, 12]. Over the years, MGs have played an important role in efficient energy management and distribution by ensuring reliability, two-way power flow, self-healing, and demand response [6]. Although MGs offer several advantages, the volatile and intermittent nature of PV power means that integrating a larger portion of renewable energy into existing power generating systems creates several challenges, such as load and demand mismatch, poor scheduling and operation, penalties enforced by customers, and fluctuations in the load connected to the power systems. Integrating an intelligent forecasting model into the MG greatly reduces these problems.

PV power forecasting is a time series (TS) forecasting problem, which is divided into univariate and multivariate forecasting [13]. Based on the time horizon, forecasting methods are divided into three types: long-term, medium-term, and short-term power forecasting [14, 15]. Each type serves different scheduling and planning purposes. Long-term forecasting supports long-term planning and decision-making over horizons such as a month or a year. Medium-term forecasting is used for medium-term scheduling, such as looking ahead one week or less. Finally, short-term forecasting is the most challenging since the target is to look ahead for a short period of time, such as hours, but it is the most reliable and accurate method for PV forecasting. Forecasting models are also divided into three types: physical, statistical, and deep learning models [12]. A physical model does not need historical data; instead, it relies on solar radiation and the physical laws governing its interaction [16], and it further consists of three submodules: numerical weather prediction [17], total sky image [18], and satellite image [19]. The modelling techniques of the physical model can be divided into regression model [20], autoregressive [21], grey theory [22], Markov chain [23], and fuzzy theory [24]. However, physical models perform poorly in ultra-short-term forecasting because they take a long time to run and only produce six hours of meteorological data [16]. Their results show large deviations and low precision; therefore, it is impractical to use them for PV forecasting [17] in the MG. Statistical forecasting establishes a mapping between the historical data and the target forecasting data in order to predict future PV power [16]. 
It is easy to use and possesses strong interregional versatility, but due to the complex and volatile nature of PV power generation, its TS is complex and nonperiodic [25]. Traditional statistical forecasting models provide limited performance on large-scale historical data because of the long-range, complex temporal information involved. Furthermore, because their processing is shallow and simple, nonlinear PV power patterns strongly degrade their predictions. Researchers therefore investigated ANN-based approaches, which significantly improved PV power forecasting thanks to their ability to learn the variational pattern of PV [26]. However, because of different atmospheric variables and complex weather patterns, these approaches are unable to extract the corresponding deep nonlinear characteristics and TS dynamics of PV power [27, 28]. The task of nonlinear mapping and feature extraction is extremely challenging; therefore, the best way to tackle these challenges is to employ deep learning models with the ability to extract discriminative features end-to-end [29, 30]. In recent years, deep learning models have achieved significant improvements in image classification [31, 32], video classification [33–37], and power forecasting in TS data [38–42]. For instance, Khan et al. [43] proposed a hybrid model for electricity forecasting in residential and commercial buildings. They used a CNN model for spatial feature extraction and then applied a bidirectional LSTM (Bi-LSTM) network for temporal feature extraction. Li et al. [44] proposed a hybrid model that integrated wavelet transform with CNN for PV power prediction over various horizons. Similarly, in [45], the authors predicted the day-ahead weather forecast data from the solar irradiance using LSTM and then established a mathematical model between irradiance and PV power to analyze the forecasting. 
Yona [46] proposed a novel method that uses atmospheric data and a deep neural network to forecast the next day’s PV generation.

To accurately forecast PV power, numerous researchers have investigated different techniques to map the association between the historical data and the target attributes. Their methods mainly focus on either spatial or temporal features alone, without examining different discriminative feature extraction strategies that preserve the long-range temporal dependencies among complex PV power patterns. Therefore, in this paper, we explore different feature extraction mechanisms and propose a hybrid model that prioritizes temporal features first, followed by spatial features, for PV power forecasting. Our proposed model was evaluated on four publicly available PV power generation datasets for hour-ahead forecasting. The experiments showed that the proposed feature extraction mechanism achieved the lowest error rates when compared with state-of-the-art techniques. The contributions of the proposed model are summarized as follows:
(1) A novel framework is proposed for the MG to accurately forecast hour-ahead power generation and thereby effectively manage the energy distribution between consumers and suppliers. A comparative study is conducted over different deep learning models for efficient feature extraction mechanisms, and finally, a hybrid GRU-CNN network is proposed.
(2) The mainstream methods first learn the spatial and then the temporal features, which degrades the overall performance for complex nonlinear PV power patterns. Herein, the temporal features are prioritized over spatial features to efficiently learn the long-range complex nonlinear PV power patterns for hour-ahead PV power forecasting. The proposed model learns temporal dependencies using a multilayered GRU sequential deep model and spatial patterns using convolutional features, thus making it robust and generalized for hour-ahead PV power forecasting.
(3) To validate the performance of the proposed model, standard TS performance metrics, namely mean square error, mean absolute error, root mean square error, and mean bias error, are used to compare it with existing state-of-the-art methods over benchmark datasets. Our experimental results achieve the lowest error rates compared to other state-of-the-art methods.

Researchers have used a variety of techniques for efficient PV forecasting. In the early literature, shallow ANNs were used and achieved promising results when compared with traditional techniques. For instance, Almonacid et al. [47] used a multilayer perceptron (MLP) to predict PV power generation. Similarly, Dahmani et al. [48] used a forward propagation MLP model for global solar radiation forecasting at a certain tilted angle at five-minute resolution. Another group of researchers [49] proposed a neural network with one hidden layer (extreme learning machine) for intermittent prediction. The authors claim that a large number of hidden layers in the network creates problems such as overfitting and vanishing gradients [50]. To solve these problems, researchers developed different techniques and, in 2006, introduced the Deep Belief Network (DBN) [51]. With the recent improvement of deep learning techniques in PV power forecasting, Kuremoto et al. [52] used a DBN with restricted Boltzmann machines (RBMs) for TS forecasting. Similarly, Dalto et al. [53] investigated the performance of deep and shallow networks for ultra-short-term wind prediction. The authors claimed that the computational complexity of the model is reduced by carefully selecting the input variables using their proposed variable selection algorithm. Wan et al. [54] used a DBN with RBMs for day-ahead wind speed prediction, with 144 input and 144 output nodes in their regression model. Their experimental results showed that the model outperformed support vector regression, single-hidden-layer ANN, and three-layer ANN. However, the performance of wind and PV power forecasting is affected by many variables; training a DBN layer by layer therefore requires extensive training, and the model can get stuck in a local minimum. 
To tackle these problems, researchers introduced CNN architectures, which share features locally and globally to reduce the computational complexity and extract meaningful patterns from complex TS data. Different techniques in this direction are reported in the literature. For instance, Diaz-Vico et al. [55] used CNN for wind and solar irradiation prediction. Wang et al. [56] used ensemble techniques for wind power forecasting. Similarly, Wang et al. [29] used CNN for PV power forecasting. Sezer and Ozbayoglu [57] used the CNN model and changed the input format from 2D to 1D for TS data. CNN is usually suitable for extracting and learning spatial features from the input data; however, temporal features also play a key role in TS PV power prediction. Therefore, researchers used the LSTM model to capture long-range temporal dependencies. For example, Qing and Niu [58] used meteorological and weather data as input to the LSTM model for solar irradiance prediction. Recently, researchers concluded that integrating CNN with the LSTM model overcomes the shortcomings of a single model, as it utilizes the advantages of multiple models to jointly learn the spatial and temporal information for accurate and complex PV forecasting. Hybrid models have also been introduced in the TS prediction domain. For example, Liu et al. [59] used wavelet transform followed by CNN to extract low-frequency information, while LSTM is used for high-frequency information extraction. Qin et al. [60] used the CNN model for spatial feature extraction, while the temporal features were extracted by the LSTM model.

To reduce energy crises and limit the harm caused by climate change, researchers have proposed the techniques mentioned above to integrate PV power forecasting into existing power generation systems. The existing traditional methods rely on structural and parameter adjustments of the forecasting model and perform well on traditional forecasting tasks. However, due to the extremely unsteady nature of PV power, especially on cloudy and rainy days [61, 62], their performance degrades severely. In the literature, most researchers claim that both spatial and temporal features are important for accurate PV power forecasting [63, 64]. Existing standalone deep learning networks are only capable of exploring either spatial or temporal features. To address these challenges, researchers are developing hybrid networks that have the potential to learn spatial and temporal features at the same time. However, in the context of PV power forecasting, hybrid networks have been developed without examining how the ordering of spatial and temporal feature extraction affects the discriminative features. Therefore, in this paper, we comprehensively analyze different feature extraction mechanisms using a hybrid model. Our experiments showed that learning temporal features with GRU followed by spatial features with CNN yields a more efficient and effective pattern representation, thereby achieving the highest accuracy and greatly reducing the error rates compared with state-of-the-art methods.

3. Proposed Methodology

This section briefly discusses the overall flow of the proposed framework, where power from the main grid flows through the MG towards the end users, as visualized in Figure 1. In this research, we have developed an intelligent and robust hybrid deep learning model, which mainly consists of three steps: preprocessing, model training, and evaluation. In the preprocessing step, outliers and abnormalities are removed from the data, while in the second step, a training procedure is applied to various machine learning and deep learning models. In the third step, the final PV forecast is computed and evaluated using different error metrics. All these steps of the proposed method are discussed in the subsequent sections.

3.1. Preprocessing

A recent study shows that the performance of a deep learning model depends heavily on the input data [45]. Therefore, the PV power data is refined by filling missing values, removing outliers, and applying standardization and normalization, so that the proposed deep learning model can extract meaningful patterns more effectively. The PV power data obtained from the solar panels is in a raw format that is incomplete and unorganized [42]. It contains abnormalities caused by sensor faults, bad weather conditions, and variable customer consumption. Feeding such data directly to a deep learning model degrades the overall prediction [40]. Therefore, the input data is passed through a preprocessing stage in which each missing value is filled with the mean of the previous and next values. The data is then normalized using the min-max method, and outliers are removed using the standard deviation method.
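The preprocessing steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the three-standard-deviation threshold is an assumed cut-off, flagged outliers are refilled from their neighbours rather than dropped, and outliers are handled before min-max scaling so that extreme values do not compress the normalized range.

```python
import numpy as np

def fill_missing(series):
    """Replace each NaN with the mean of the nearest valid previous and next values."""
    x = np.asarray(series, dtype=float).copy()
    valid = np.flatnonzero(~np.isnan(x))
    if valid.size == 0:
        raise ValueError("series contains no valid samples")
    for i in np.flatnonzero(np.isnan(x)):
        neighbours = []
        prev_idx = valid[valid < i]
        next_idx = valid[valid > i]
        if prev_idx.size:
            neighbours.append(x[prev_idx[-1]])
        if next_idx.size:
            neighbours.append(x[next_idx[0]])
        # At the series boundary only one neighbour exists, so it is used alone.
        x[i] = np.mean(neighbours)
    return x

def remove_outliers(series, k=3.0):
    """Flag samples farther than k standard deviations from the mean, then refill them.

    The threshold k=3 is an assumption; the source only names the
    standard deviation method.
    """
    x = np.asarray(series, dtype=float).copy()
    mean, std = x.mean(), x.std()
    if std == 0:
        return x
    x[np.abs(x - mean) > k * std] = np.nan
    return fill_missing(x)

def min_max_normalize(series):
    """Scale values linearly into [0, 1]."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)

def preprocess(series):
    """Fill gaps, treat outliers, then normalize."""
    return min_max_normalize(remove_outliers(fill_missing(series)))
```

For a multivariate dataset, `preprocess` would be applied to each attribute column separately, and the minimum and maximum from the training split would normally be reused for the validation and test splits.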

3.2. Temporal Feature Extraction

To capture long-range temporal dependencies in complex PV power forecasting data, most researchers use a recurrent neural network (RNN), which learns weights across the hidden layers of the network for long-range dependencies in TS data [65]. The intermediate layers of the RNN preserve meaningful information from the previous state. The internal structure of the RNN is shown in Figure 2(a), where the input and output at time $t$ are represented by $x_t$ and $y_t$; similarly, the output of the single hidden layer at time $t$ is represented by $h_t$, and $W$ represents the weight matrices. Figure 2(a) can be expressed mathematically as in equation (1):

$$h_t = f\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right), \qquad y_t = f\left(W_{hy} h_t + b_y\right) \tag{1}$$

In equation (1), the terms $f$, $b_h$, and $b_y$ represent the nonlinear activation and the bias terms, while $W$ refers to the learned weights that capture temporal dependency in PV power forecasting. An RNN suffers from the vanishing gradient problem when the time interval to the target output is long. A special variant called the GRU resolves this problem using two gating mechanisms, the reset gate and the update gate. It is less complex than the LSTM model because it has fewer gates and requires fewer parameters during training [66]. Its structure is shown in Figure 2(b).

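The mathematical representation of the GRU is given in equations (2) to (5), where the update and reset gates are represented by $z_t$ and $r_t$; similarly, the candidate activation and the bias vectors are represented by $\tilde{h}_t$ and $b_z$, $b_r$, $b_h$, respectively. $h_t$ is the output of the current unit, which is connected to the input of the next unit. Furthermore, $h_{t-1}$ is the input of the current unit, which is also the output of the previous unit. $\sigma$ and $\tanh$ represent the activation functions, $W$ and $U$ denote the input and recurrent weight matrices, and $\odot$ is the element-wise product, while the input of the training data and the corresponding output at time stamp $t$ are represented by $x_t$ and $y_t$.

$$z_t = \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \tag{2}$$

$$r_t = \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \tag{3}$$

$$\tilde{h}_t = \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \tag{4}$$

$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{5}$$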
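The gating behaviour described above can be sketched as a single GRU step in NumPy. This is a minimal illustration of one common GRU formulation, not the authors' implementation: the parameter names (`Wz`, `Uz`, `bz`, and so on), the random initialization, and the convention that the update gate weights the candidate state are all assumptions, and some libraries use the opposite convention for the update gate.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(n_in, n_hidden, seed=0):
    """Create small random weights for the update (z), reset (r), and candidate (h) paths."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in "zrh":
        params["W" + gate] = rng.normal(0.0, 0.1, (n_hidden, n_in))      # input weights
        params["U" + gate] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # recurrent weights
        params["b" + gate] = np.zeros(n_hidden)                          # bias vector
    return params

def gru_step(x_t, h_prev, p):
    """Advance the hidden state by one time step."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])            # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])            # reset gate
    h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate state
    # Blend the previous state and the candidate according to the update gate.
    return (1.0 - z) * h_prev + z * h_cand

def gru_forward(xs, p):
    """Run the cell over a sequence of shape (T, n_in); return all hidden states (T, n_hidden)."""
    h = np.zeros(p["bz"].shape[0])
    states = []
    for x_t in xs:
        h = gru_step(x_t, h, p)
        states.append(h)
    return np.stack(states)
```

Returning every hidden state rather than only the last one is what allows a convolutional module to be stacked on top of the recurrent output, since the convolution still needs a time axis to slide over.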

3.3. Spatial Features Extraction

CNN has two main properties, local connection and weight sharing, which allow it to process high-dimensional data and extract meaningful discriminative features. A CNN mainly consists of convolutional layers, pooling layers, and fully connected layers. Convolutional layers are the core layers responsible for extracting local features. The features extracted by the previous layer are convolved with the convolutional kernel to form the output feature map $x_j^l$. Each output map combines convolutions over multiple input feature maps, as given in equation (6):

$$x_j^l = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\right) \tag{6}$$

Here, $M_j$ denotes the set of input feature maps, $l-1$ and $l$ index the input and current convolutional layers, and $b_j^l$, $k_{ij}^l$, and $x_j^l$ represent the bias, kernel, and output of the convolutional layer, respectively. A ReLU activation function is used throughout the network, as given in equation (7):

$$f(x) = \max(0, x) \tag{7}$$

The pooling layer, also known as the downsampling layer, is mainly responsible for reducing the dimensions of the features. It has several variants, such as average pooling and max pooling.
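The convolution, ReLU, and pooling operations described in this subsection can be illustrated with a small NumPy sketch for 1D sequences. The function names, the unpadded stride-1 convolution, and the non-overlapping pooling window are assumptions chosen for clarity; like most deep learning frameworks, the sketch computes a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative values are set to zero."""
    return np.maximum(0.0, x)

def conv1d(x, kernels, bias):
    """1D convolution followed by ReLU.

    x:       input feature maps, shape (T, C_in)
    kernels: shape (K, C_in, C_out)
    bias:    shape (C_out,)
    Returns output feature maps of shape (T - K + 1, C_out).
    """
    T, _ = x.shape
    K, _, c_out = kernels.shape
    out = np.empty((T - K + 1, c_out))
    for t in range(T - K + 1):
        window = x[t:t + K]  # local receptive field, shape (K, C_in)
        # Sum over kernel positions and input maps; the same kernels are
        # reused at every t, which is the weight sharing property.
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return relu(out)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the time axis; trailing samples are dropped."""
    T, c = x.shape
    n = T // size
    return x[:n * size].reshape(n, size, c).max(axis=1)
```

With a kernel size of 3, each convolutional layer shortens the time axis by two steps unless padding is applied.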

3.4. Network Architecture

The GRU module captures long-range dependencies, so it is capable of learning useful information from TS data through its gated hidden state. Nonsalient information is discarded by its gating mechanism, namely the reset and update gates. The output of the GRU module is directly connected to the CNN module. In the proposed hybrid model, the GRU module consists of two layers with 32 and 64 cells, respectively, followed by a two-layer CNN module with a kernel size of 3 and 64 filters in each layer. A ReLU activation is used for nonlinearity. A detailed summary of the proposed model is given in Table 1. The output features are then flattened and passed to a fully connected layer with 16 neurons. MSE is used as the loss function during training; once the model is trained, it is evaluated on the test data.
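As a rough sanity check on the layer sizes above, the following plain-Python sketch traces tensor shapes through the stack. The input window of 12 steps with 5 features, the unpadded convolutions, the absence of pooling, and the single-neuron output layer are assumptions made for illustration only; Table 1 remains the authoritative summary of the model.

```python
def trace_shapes(timesteps, n_features, kernel_size=3):
    """Return (layer name, output shape) pairs for a GRU(32)-GRU(64)-Conv-Conv-Dense stack."""
    shapes = [("input", (timesteps, n_features))]
    # Both GRU layers are assumed to return the full sequence so that the
    # convolutional layers still have a time axis to slide over.
    shapes.append(("gru_32", (timesteps, 32)))
    shapes.append(("gru_64", (timesteps, 64)))
    steps = timesteps
    for i in (1, 2):
        # An unpadded convolution shortens the sequence by kernel_size - 1.
        steps = steps - kernel_size + 1
        if steps < 1:
            raise ValueError("input window is too short for two unpadded convolutions")
        shapes.append((f"conv1d_{i}_64", (steps, 64)))
    shapes.append(("flatten", (steps * 64,)))
    shapes.append(("dense_16", (16,)))
    shapes.append(("dense_out", (1,)))  # assumed single-value forecast head
    return shapes

if __name__ == "__main__":
    for name, shape in trace_shapes(timesteps=12, n_features=5):
        print(f"{name:<14}{shape}")
```

Under these assumptions, a 12-step window leaves 8 time steps after the two convolutions, so 512 values are flattened into the 16-neuron layer.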

4. Experimental Results and Discussion

In this section, we describe the PV power datasets, the evaluation metrics, and the comparative analysis with state-of-the-art methods. The proposed model is implemented in Python using the Keras (2.3.1) deep learning framework with TensorFlow (1.14.0). Training is run on the Windows 10 operating system with a GeForce RTX 2070 SUPER graphics card to speed up the process; the complete details are given in Table 2.

4.1. Datasets Description

To assess the proposed method’s performance, we use four publicly available real-world PV power datasets, namely DKASC-AS-1A, DKASC-AS-1B, DKASC-AS-2Eco, and DKASC-Yulara-SITE-3A, gathered at DKASC, Alice Springs (AS), Australia [67–69]. The DKASC-AS-1A dataset is taken from the 1A plant, which generates 10.5 kW from 2 × 30 solar panels; its installation was completed on Thursday, January 8, 2009. Similarly, the DKASC-AS-1B dataset was collected from the 1B plant, which generates 23.4 kW from 4 × 30 panels; its installation was also completed on Thursday, January 8, 2009. The overall details of each plant and the collected data are given in Table 3. All these datasets are recorded from active solar power generation plants at five-minute resolution with different power generation capabilities. They consist of different attributes, for example, power generation and meteorological elements such as wind speed and weather temperature. Each dataset is divided into 70% for training, 20% for validation, and 10% for testing.
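The 70/20/10 division and the construction of forecasting samples can be sketched as follows. The chronological (unshuffled) split, the function names, and the windowing parameters are assumptions; the source states only the split ratios, the five-minute resolution, and the hour-ahead target.

```python
import numpy as np

def chronological_split(data, train_pct=70, val_pct=20):
    """Split a time-ordered array into train, validation, and test parts without shuffling."""
    n = len(data)
    # Integer arithmetic avoids floating-point rounding at the split boundaries.
    i = n * train_pct // 100
    j = n * (train_pct + val_pct) // 100
    return data[:i], data[i:j], data[j:]

def make_windows(series, lookback, horizon):
    """Build supervised pairs from a 1D series.

    Each row of X holds `lookback` consecutive samples, and the matching
    entry of y is the value `horizon` steps after the end of that window.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series is too short for the requested window and horizon")
    X = np.stack([series[k:k + lookback] for k in range(n)])
    offset = lookback + horizon - 1
    y = series[offset:offset + n]
    return X, y
```

At five-minute resolution, one hour corresponds to 12 samples, so `make_windows(power, lookback=12, horizon=12)` would be one way to frame hour-ahead forecasting if the data is used at its native resolution; whether the series is resampled first is not stated in the source.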

4.2. Evaluation Metrics

The performance of the proposed model is evaluated on the four widely used forecasting metrics such as MSE, MAE, RMSE, and MBE, which are mathematically expressed in equations (8) to (11).
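The four metrics can be computed as in the short NumPy sketch below. The function name is hypothetical, and the sign convention for MBE (forecast minus actual, so a positive value indicates over-forecasting) is an assumption, since the opposite convention also appears in the literature.

```python
import numpy as np

def forecast_errors(y_true, y_pred):
    """Return MSE, MAE, RMSE, and MBE for a pair of equally sized 1D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true  # assumed convention: forecast minus actual
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(mse)),
        "MBE": float(np.mean(err)),  # signed, so positive and negative errors cancel
    }
```

Because RMSE is the square root of MSE, the two values can be cross-checked against each other when reading a results table.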

5. Experimental Results and Discussions

The performance of the proposed GRU-CNN model is compared against several deep learning models, namely LSTM, GRU, CNN-LSTM, CNN-GRU, and LSTM-CNN.

5.1. Detailed Comparative Analysis

To analyze the performance of the proposed model, we have used four real-world PV power datasets, whose details are given in Table 3. In the literature, there are two types of feature extraction: the first extracts only spatial or only temporal features, and the second is a hybrid model in which either the spatial or the temporal features are prioritized. Table 4 shows the one-hour-ahead PV power forecasting results of the different standalone and hybrid models. Here, the error rates (MSE, MAE, RMSE, and MBE) of the proposed hybrid model are lower than those of the standalone models. A graphical comparison of the forecasting results of the naïve (SVR), state-of-the-art (LSTM-CNN), and proposed models is given in Figure 3, while the forecasts of the proposed model on each dataset are visualized in Figure 4. The results reveal that the performance of naïve forecasting methods is much worse than that of the state-of-the-art and our proposed method. As shown in Figure 4, there is a narrow gap between the actual values and the values forecasted by the proposed model. This gap is larger for state-of-the-art models and much larger for naïve forecasting models.

To summarize the experiments in Table 4, effective feature extraction in TS PV power forecasting correlates strongly with the forecasting performance of deep learning models. In our case, the temporal features are extracted first, and their dimensionality is then reduced. Using a 1D-CNN to extract spatial features is an effective approach for modelling complex PV power forecasting patterns. However, extracting temporal features using an LSTM model is less effective because the LSTM uses three gating structures. Therefore, due to the high-dimensional features, the final layers of the LSTM are not able to recognize the complex patterns of PV power. In contrast, the GRU uses only two gates, so its feature space is smaller than that of the LSTM; the GRU therefore requires fewer computations and achieves the highest accuracy. The performance of the GRU-CNN model on the four datasets indicates that the proposed model is more suitable for deployment in real-world PV power forecasting at the MG.

5.2. Quantitative Evaluation

In this section, the experimental results are discussed to compare the performance of our model with existing deep learning models. Table 5 compares the performance of the proposed model with existing state-of-the-art models; the first part shows the results on the DKASC-AS-1B dataset. For instance, Wang et al. [45] used the 1D-CNN model and achieved 0.304 and 0.822 for MAE and RMSE, respectively.

Similarly, they used a hybrid approach in which the spatial features are extracted with the help of CNN and then LSTM is used to learn the temporal information, achieving 0.294 and 0.693 for MAE and RMSE, respectively. Furthermore, when they first extracted temporal information via LSTM, followed by spatial information, the model achieved 0.221 and 0.621 for MAE and RMSE, respectively. In this direction, we further explored different feature extraction mechanisms, and our proposed model achieved 0.1727, 0.0298, 0.0923, and 0.0235 for RMSE, MSE, MAE, and MBE, respectively. The second row of Table 5 shows the performance on the DKASC-AS-2Eco dataset compared with existing techniques. In the baseline research [44], the authors’ experiments with a multilayer perceptron (MLP) achieved 1.0861 and 0.1995 for RMSE and MBE, respectively. They also used an RNN and reported an RMSE of 1.0581 and an MBE of −0.1442. LSTM and GRU networks were also used for PV forecasting; the LSTM achieved 1.0382 and −0.084, and the GRU achieved 1.0351 and 0.1206, for RMSE and MBE, respectively. In the last model [44], the authors decomposed the power series into subseries using wavelet packet decomposition and then used the LSTM model, achieving 0.2357 and 0.0067 for RMSE and MBE, respectively. Our proposed model achieved 0.1646, 0.0271, 0.1157, and −0.0641 for RMSE, MSE, MAE, and MBE, respectively, giving a lower RMSE than the existing models. Finally, the performance of the proposed model is evaluated on the DKASC-Yulara-SITE-3A [70] dataset against state-of-the-art methods. Chen et al. [71] proposed a radiation coordinate classification method called RCC-LSTM for solar forecasting. Their method achieved 0.94 and 0.587 for RMSE and MAE on the DKASC-Yulara-SITE-3A dataset, respectively. The proposed method achieved 0.1715, 0.0294, 0.1126, and 0.0099 for RMSE, MSE, MAE, and MBE, respectively.

6. Conclusion

Accurate PV power forecasting plays an important role in avoiding penalties enforced by customers on production companies, building trust in the energy markets, and scheduling energy generation. Mainstream traditional and deep learning methods rely on simple features and consider only spatial or only temporal features when modelling the inherent nonlinear patterns of PV power series. In the proposed framework, we investigated different feature extraction mechanisms and showed experimentally that the proposed temporal-then-spatial feature extraction outperforms the existing state-of-the-art methods. Our proposed framework mainly consists of three steps. In the first step, preprocessing is applied to the input data to fill in the missing values and normalize the data. After normalization, the data is fed to the GRU-CNN model to first learn the temporal and then the spatial features. Finally, the performance of the proposed model is evaluated against its rivals, showing better prediction ability with the lowest error rates and better generalization potential. In the future, we plan to deploy the proposed model on resource-constrained home appliance devices for energy management.

Data Availability

The code and related materials can be downloaded from https://github.com/Altaf-hucn/Hybrid-Deep-Learning-Network-for-Photovoltaic-Power-Forecasting.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIT) (No. 2019M3F2A1073179).