Abstract
Accurate photovoltaic (PV) power forecasting is essential for the stable and reliable operation of PV power generation systems. Recently, various deep learning- (DL-) based forecasting models have been proposed for accurate forecasting, but newly built systems cannot benefit from them due to the absence of PV power data. Although zero-shot methods based on a single site can be used for PV power forecasting, they suffer from performance degradation problems when the characteristics of the source data and target data are different. To address this issue, we propose a novel zero-shot PV power forecasting scheme that leverages historical data from multiple PV generation systems at different sites. The proposed scheme constructs an individual forecasting model using historical data from each PV generation system. Then, two correlation coefficients are calculated for each forecasting model: one based on the correlation between the input variables of the source data and target data and the other on the correlation between the input variables and output variables of the source data. Lastly, the final forecasting value is calculated as a weighted sum of the predicted values of the constructed forecasting models for the input variables of the target data. In extensive experiments with diverse DL models for forecasting, correlation coefficient types for weights, and data time intervals, the combination of recurrent neural network, Pearson’s correlation coefficient, and solar-noon time yielded the best prediction performance, with an improvement of up to 34.47% in mean absolute error and up to 15.94% in root mean square error compared to the best single-site zero-shot prediction. In addition, in experiments on PV power data from 9 cities in Korea using this combination, the proposed scheme achieved the best predictive performance in almost all cases and the second-best performance with a very narrow margin only in a few cases.
1. Introduction
As the global population increases from 7.8 billion in 2020 to 9.9 billion by 2050, global energy demand is expected to increase by about 50% [1]. If such energy demand is met by thermal power generation using fossil fuels, the problem of environmental pollution can become even more severe [2]. To address this issue, countries worldwide have signed the Paris Climate Agreement, aiming to keep the rise in the average global temperature well below 2°C above preindustrial levels and preferably limit the increase to 1.5°C. Since then, many efforts have been made to increase the share of renewable energy in power generation worldwide [3].
Renewable energy derived from natural sources replenished at a higher rate than consumed includes solar, wind, hydro, geothermal, and biomass energy [4]. These energy types have low environmental influence and reduce dependence on fossil fuels, leading to a more sustainable future [5]. Among them, solar energy has gained more attention and popularity for several reasons, such as no risk of energy source depletion, fewer restrictions on solar panel installation, and various panel capacities [6]. Thus far, many photovoltaic (PV) generation systems have been built, with an average annual growth rate of 48% [7]. Moreover, the recent price reduction of PV panels has made PV generation more cost-effective and accessible to more individuals and businesses [8].
However, weather dependence remains a significant challenge for PV generation. The performance of the PV panel and output of the PV generation system is greatly affected by weather conditions, such as cloud cover, rain, snow, and dust, reducing the stability and reliability of the PV generation system [9]. Accurate PV power forecasting is necessary to solve this problem [10].
Recently, various forecasting models based on machine learning (ML) and deep learning (DL) have performed better than traditional statistical models, such as autoregressive and moving average models [11, 12]. However, these models require vast historical data for training [13]. If the data are insufficient for training, the model may become too sensitive to noise in the training data, leading to overfitting. This cold-start problem can significantly degrade the forecasting performance [14], especially in newly built PV generation systems [15].
Zero-shot learning has been proposed to produce results, even without historical data [16]. However, there are very few cases where zero-shot learning has been applied to forecasting problems, especially in PV power forecasting [17]. Single-site-based zero-shot PV power forecasting (SZF) trains a model using source data and then makes predictions on target data without any fine-tuning. Here, source data and target data refer to a sufficient amount of data collected from an existing PV power generation system and data collected from a newly built PV power generation system, respectively. The most serious problem with this approach is that if the data distribution, pattern, and trend of the source data and the target data are different, the predictive performance may decrease significantly. To solve this problem, we propose a zero-shot PV power forecasting scheme that leverages historical data from multiple PV generation systems at different sites. In this scheme, an individual forecasting model is constructed using historical data from each PV generation system. Then, weights for each forecasting model are determined based on the correlation between the input variables of the source data and target data and the input variables and output variables of the source data. Lastly, the final prediction value is calculated as a weighted sum of the predicted values of the source forecasting models for the input variables of the target data.
In our experiments, we first tested different combinations of DL models, correlation coefficients, and data time options to find the best one and then applied it to multiple datasets to evaluate the effectiveness of the proposed scheme. Experimental results showed that the combination of the recurrent neural network (RNN) model, Pearson’s correlation coefficient (PCC), and solar-noon time option performed best. Figure 1 illustrates the overall process of the proposed scheme.

The main contributions of this paper are as follows:
(1) We proposed a novel zero-shot PV power forecasting scheme based on the DL model and correlation coefficient. To the best of our knowledge, this is the first effort to forecast the PV generation of newly built PV systems without historical PV power data.
(2) We analyzed the effects of one-dimensional (1D) and two-dimensional (2D) forms of time data to determine the most effective form for PV power forecasting.
(3) We conducted extensive experiments to find the most effective combination of the DL model, correlation coefficient, and time options for zero-shot PV power prediction.
(4) We showed that the proposed scheme can guarantee good prediction performance, unlike the SZF model. This can contribute to the effective and accurate evaluation of the feasibility of PV generation projects or construction.
The paper is organized as follows. Section 2 introduces several studies on the ML- and DL-based PV power forecasting models. Section 3 details the proposed scheme, and Section 4 describes input variable configurations for constructing the forecasting model. Next, Section 5 discusses the experimental results of evaluating the proposed scheme’s performance. Finally, Section 6 summarizes this study and presents the conclusions.
2. Related Works
So far, many studies have been conducted to perform PV power forecasting efficiently. In particular, various ML and DL algorithms have been proposed recently to build more accurate PV power forecasting models. For instance, Pan et al. [18] proposed a forecasting model based on the support vector machine (SVM). In particular, they optimized the selection policies for the SVM hyperparameters by incorporating max–min ant colony optimization (ACO), differential evolutionary algorithm, and adaptive learning factor into the ACO algorithm. In addition, an ensemble filter algorithm was employed to remove abnormal data, and L2-norm regularization was applied to address standardization issues in the data. They demonstrated that the proposed model is significantly better than the comparative models and exhibits significantly improved performance in nighttime and peak power prediction. Meng and Song [19] proposed a daily PV power forecasting model based on random forest (RF). They constructed the RF model using data collected from the Zhonghe PV station in North China’s central region. They used PM 2.5 as an additional input variable due to the severe winter air pollution in North China. The proposed model categorizes winter days into three types based on climate characteristics and creates an RF model for each type. They demonstrated that the proposed model outperforms the SVM, elastic net, and gradient-boosting decision tree for nearly all error evaluation metrics. Li et al. [20] proposed a very short-term PV power forecasting model based on the RNN. The PV power data were divided into inter- and intraday data, and the RNN was used to discover nonlinear features and invariant structures. The proposed model was extensively compared with various ML and DL models using PV power data collected in Flanders, Belgium. The results indicated that the proposed model outperforms the comparative models (e.g., persistence and SVM) in 15 and 30 min forecasting horizons. 
Hossain and Mahmood [21] proposed a day-ahead PV power forecasting model based on long short-term memory (LSTM). They integrated the statistical knowledge from historical irradiance data and the sky forecast to create a synthetic weather forecast to use as input for the LSTM. The performance of the LSTM model was analyzed using a synthetic weather forecast and real sky forecast as input variables. They revealed that synthetic weather forecasts significantly improve accuracy by more than 30% compared to original weather forecasts. Liu et al. [22] constructed an ensemble model using SVM, multilayer perceptron (MLP), and multivariate adaptive regression spline (MARS) as stand-alone models to improve the weights of sub-stand-alone models through a recursive arithmetic average process. The proposed ensemble model generally performed better than stand-alone models using historical data from the Australian technology demonstration facility. Li et al. [23] proposed a hybrid DL approach for PV power forecasting that combines a convolutional neural network (CNN) with an LSTM model. The CNN structure extracts nonlinear features and invariant patterns in past PV output data, and the LSTM structure models the temporal changes in the recent PV power data to make predictions for the next time step. In the comprehensive evaluation using PV power data from Limburg, Belgium, the proposed model performed better than other comparative models with significantly less prediction error.
ML and DL models exhibit good performance in PV power forecasting. Table 1 summarizes some of the related studies on PV power forecasting using ML and DL models. However, these models cannot be used for newly built PV generation systems because they require data covering at least one year (12 months) for training. One possible way to solve this problem is to train a model using data collected from an existing PV power system and use the trained model on a newly built PV power system. However, if the two sites have different characteristics for PV generation, the predictive performance may be poor. This can be solved by effectively utilizing data collected from PV power systems at various sites.
3. Proposed Scheme
In this section, we first briefly explain the theoretical background of the DL model and correlation coefficient and then present the details of the proposed scheme. In general, state-of-the-art models such as transformer and temporal convolutional network-based models perform well in long-term time-series forecasting because they capture long-term dependencies well. However, in PV power forecasting, short-term dependencies play a more important role than long-term dependencies. Moreover, these models typically take a long time to train due to their complex structures or have low convergence stability. Hence, in this paper, we consider five popular artificial neural network (ANN) models: shallow neural network (SNN), deep neural network (DNN), RNN, LSTM, and gated recurrent units (GRU).
The correlation coefficient indicates the strength of the relationship between two variables. If the product of two correlation coefficients, one between the input and output variables of the source data and the other between the input variables of the source data and those of the target data, is large, the model trained using the source data is more likely to be effective on the target data as well. In this study, the weights of forecasting models trained using different source data are determined based on the correlation coefficients so that the predictions of forecasting models with higher correlations contribute more to the final prediction value. Among various correlation coefficients, PCC best expresses the relationship between two continuous variables. However, PCC can only detect linear relationships. Therefore, in addition to PCC, we also consider the distance correlation coefficient (DCC) to detect nonlinear relationships.
3.1. Deep Learning Models
Among the aforementioned models, the SNN usually refers to an ANN with only one hidden layer [24]. It is called shallow because it does not have multiple layers to learn representations from the input data. In contrast, the DNN refers to an ANN with multiple hidden layers, usually more than two [25]. DNNs are designed to learn complex representations from the input data by leveraging the depth of the network, with each layer learning to recognize different features of the input data. Due to its simplicity and computational efficiency, the SNN is primarily used to solve simple problems, while the DNN solves more complex problems that the SNN cannot solve.
Unlike general ANN structures, the RNN, LSTM, and GRU are specifically designed to handle sequence data, such as time series, text, and speech. The RNN has a loop structure that allows it to retain information from previous steps [26]. This feature enables the RNN to process data sequences by considering the context from previous time steps. The RNN is used for various applications, including sentiment analysis, language translation, and speech recognition. However, the RNN has difficulty retaining information over long sequences due to the vanishing gradient problem. To address this problem, the LSTM model was introduced as a variant of the RNN [27]. The LSTM has a unique structure called memory cells that enable it to retain information over long sequences. The memory cells are controlled by gates that determine when information should be stored, forgotten, or used for predictions. This structure enables the LSTM to handle sequence data effectively. However, the complexity of its structure and the number of parameters can make it computationally inefficient. The GRU was introduced as a simpler alternative to the LSTM. The GRU has a structure similar to the LSTM but has fewer parameters and, therefore, is computationally more efficient [28]. The GRU uses two gates, the update and reset gates, to control the flow of information in the network. The update gate determines how much past information should be retained, whereas the reset gate determines how much past information should be forgotten. Figure 2 illustrates a simplified structure of the ANN models.
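The loop structure of the RNN can be made concrete with a minimal sketch of a vanilla (Elman) recurrence, in which each hidden state is computed from the current input and the previous hidden state. This is only an illustration under stated assumptions: the layer sizes, the random weights, and the function name `rnn_forward` are hypothetical and are not the configuration used in the experiments.

```python
import numpy as np

def rnn_forward(inputs, w_x, w_h, b):
    """Run a vanilla (Elman) RNN over a sequence and return every hidden state."""
    h = np.zeros(w_h.shape[0])              # initial hidden state
    states = []
    for x_t in inputs:                      # process one time step at a time
        # The loop: the new state depends on the input AND the previous state.
        h = np.tanh(w_x @ x_t + w_h @ h + b)
        states.append(h)
    return np.stack(states)

# Illustrative sizes only (assumed): 18 inputs, 8 hidden units, 24 time steps.
rng = np.random.default_rng(0)
n_features, n_hidden, n_steps = 18, 8, 24
w_x = rng.normal(scale=0.1, size=(n_hidden, n_features))
w_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

sequence = rng.normal(size=(n_steps, n_features))   # made-up input sequence
hidden_states = rnn_forward(sequence, w_x, w_h, b)
print(hidden_states.shape)
```

The LSTM and GRU replace the single `tanh` update with gated updates; they are omitted here to keep the sketch short.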

3.2. Correlation Coefficient
An input variable, also known as an independent variable, is used to predict the value of an output variable. Understanding the relationship between input and output variables is essential for accurate forecasting models, enabling accurate forecasting of the future by correctly selecting the appropriate input variables and model structure. Correlation coefficients can be used to determine the relationship between an input and output variable. The most commonly used correlation coefficient is PCC [29]. It measures the strength and direction of the linear association between two variables, with values ranging from -1 to 1. The PCC is defined by Equation (1), where $n$ is the number of observations, $x_t$ and $y_t$ denote the values at time $t$, and $\bar{x}$ and $\bar{y}$ represent the mean values of $x$ and $y$, respectively.

$$r_{xy} = \frac{\sum_{t=1}^{n}(x_t-\bar{x})(y_t-\bar{y})}{\sqrt{\sum_{t=1}^{n}(x_t-\bar{x})^2}\,\sqrt{\sum_{t=1}^{n}(y_t-\bar{y})^2}} \tag{1}$$
The PCC is calculated as the covariance of the two variables divided by the product of their standard deviations. Covariance measures how two variables change together, whereas the standard deviation measures the spread of a single variable. A positive PCC indicates that as the value of the input variable increases, the value of the output variable also increases. A negative PCC indicates that as the value of the input variable increases, the value of the output variable decreases. A PCC of 0 indicates no linear relationship between the two variables. The PCC can detect a linear relationship between two variables well, but many nonlinear relationships exist in actual data.
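As a minimal sketch of the definition above, the PCC can be computed directly from deviations about the mean. The function name `pearson_cc` and all numeric values are illustrative assumptions, not data from this study; the last example shows a purely nonlinear (symmetric) dependence for which the PCC is zero.

```python
import numpy as np

def pearson_cc(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()                       # deviations from the mean of x
    dy = y - y.mean()                       # deviations from the mean of y
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Toy, made-up values (not taken from the KODP or KMA data).
radiation = [0.0, 0.5, 1.0, 1.5, 2.0]
power = [0.0, 0.1, 0.2, 0.3, 0.4]           # increases linearly with radiation
humidity = [90.0, 80.0, 70.0, 60.0, 50.0]   # decreases as power increases

r_pos = pearson_cc(radiation, power)        # positive linear relationship
r_neg = pearson_cc(humidity, power)         # negative linear relationship
r_quad = pearson_cc([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x^2, no linear trend
print(r_pos, r_neg, r_quad)
```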
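Unlike the PCC, the distance correlation coefficient (DCC) can capture nonlinear relationships between variables [30]. The DCC is obtained by normalizing the distance covariance between two variables by their distance variances. The distance covariance measures the combined variability of two variables, considering both the linear and nonlinear dependence between them. The DCC ranges from 0 to 1, with values close to 0 indicating independence between the two variables and values close to 1 indicating strong dependence. The DCC is defined by the following equations.

$$\mathrm{DCC}(X,Y)=\frac{\mathrm{dCov}(X,Y)}{\sqrt{\mathrm{dVar}(X)\,\mathrm{dVar}(Y)}} \tag{2}$$

$$\mathrm{dCov}^2(X,Y)=\frac{1}{n^2}\sum_{j=1}^{n}\sum_{k=1}^{n}A_{jk}B_{jk} \tag{3}$$

$$\mathrm{dVar}^2(X)=\mathrm{dCov}^2(X,X)=\frac{1}{n^2}\sum_{j=1}^{n}\sum_{k=1}^{n}A_{jk}^2 \tag{4}$$

In the equations, $\mathrm{dCov}$ denotes the distance covariance and $\mathrm{dVar}$ denotes the distance variance. $A_{jk}=a_{jk}-\bar{a}_{j\cdot}-\bar{a}_{\cdot k}+\bar{a}_{\cdot\cdot}$ is the doubly centered form of $a_{jk}=\lVert x_j-x_k\rVert$, which represents the Euclidean distance between $x_j$ and $x_k$, and $B_{jk}$ is defined in the same way from $b_{jk}=\lVert y_j-y_k\rVert$.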
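The following is a minimal sketch of the standard sample distance correlation, assuming the usual construction from doubly centered pairwise Euclidean distance matrices. The function names and toy values are hypothetical; the example contrasts a linear relationship with the quadratic relationship that the PCC cannot detect.

```python
import numpy as np

def _centered_distances(v):
    """Pairwise Euclidean distances, doubly centered (row, column, grand mean)."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_cc(x, y):
    """Sample distance correlation between two equally long samples."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()                  # squared distance covariance
    dvar2_x = (a * a).mean()                # squared distance variance of x
    dvar2_y = (b * b).mean()                # squared distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:                        # a constant sample carries no dependence
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
d_lin = distance_cc(x, [2.0 * v + 1.0 for v in x])   # linear dependence
d_quad = distance_cc(x, [v ** 2 for v in x])         # nonlinear dependence
print(d_lin, d_quad)
```

For the linear case the value is 1, and for the quadratic case it is clearly above zero even though the PCC of the same pair is zero.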
The PCC and DCC have strengths and weaknesses. The choice between them depends on the specific analysis needs and relationship characteristics between variables. It is best to consider both methods and choose the one that best fits the data and research question.
3.3. Forecasting Process
This section describes the forecasting process of the proposed scheme in more detail. Algorithm 1 shows the process of calculating the final forecasting value.
A brief description of the algorithm is as follows.
(i) Lines 4-7 construct a source forecasting model for each site and calculate its predicted value for the target data.
(ii) Lines 8-10 first calculate the two correlation coefficients for each site and then compute their product.
(iii) Lines 11-14 calculate the weight of each site based on this product. This is done by summing the products of all sites and dividing each site's product by this sum, so that the weights of all sites sum to 1.
(iv) Line 15 calculates the final forecasting value based on the predicted values of all forecasting models for the target data and their weights.
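The weighting and aggregation steps described above can be sketched as follows. This is not the authors' implementation: the function name `zero_shot_forecast` is hypothetical, each site is assumed to be summarized by one scalar per correlation coefficient (how the coefficients are aggregated over the input variables is not specified here), the products are assumed to be non-negative, and all numbers are made up.

```python
import numpy as np

def zero_shot_forecast(site_predictions, corr_source_target, corr_source_io):
    """Combine per-site predictions with correlation-based weights.

    site_predictions   : array (n_sites, n_steps), each source model's prediction
                         for the target inputs
    corr_source_target : per-site correlation, source inputs vs. target inputs
    corr_source_io     : per-site correlation, source inputs vs. source output
    """
    preds = np.asarray(site_predictions, dtype=float)
    combined = (np.asarray(corr_source_target, dtype=float)
                * np.asarray(corr_source_io, dtype=float))   # product per site
    total = combined.sum()
    if total <= 0.0:
        raise ValueError("sum of combined correlations must be positive")
    weights = combined / total              # normalize so the weights sum to 1
    return weights, weights @ preds         # weighted sum over sites

# Three hypothetical source sites, two forecast steps (capacity-scaled values).
preds = [[0.2, 0.4],
         [0.4, 0.6],
         [0.6, 0.8]]
weights, forecast = zero_shot_forecast(preds,
                                       corr_source_target=[0.9, 0.8, 0.5],
                                       corr_source_io=[0.8, 0.5, 0.4])
print(weights, forecast)
```

Sites whose inputs resemble the target and whose inputs explain their own output well receive the largest weights, so a poorly matched site contributes little to the final value.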
4. Input Variable Configuration
For the experiments in this study, we used public PV power data from the Korean Open Data Portal (KODP). The KODP provides 1 h resolution PV power data collected from PV systems in various locations. To demonstrate the robustness of the proposed scheme, we considered PV generation systems in regions with diverse characteristics, such as inland, coastal, and island regions in Korea. Hence, we collected data from nine regions from January 1, 2017, to December 31, 2019. The statistics of the collected PV power data are summarized in Table 2.
From the table, we can see that the capacities of the PV power systems vary greatly. To use such data in the proposed scheme, the data must be scaled according to the capacity of each power system. Otherwise, if the capacity of the source system used for training differs significantly from that of the newly built system, the values predicted by the model trained on the source data do not suit the new system because of the difference in scale. For this purpose, we divided the PV power data by the capacity of the corresponding PV power system.
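A minimal sketch of this capacity scaling is shown below. The function names, units, and values are hypothetical, and converting a scaled forecast back to the target system's units by multiplying by its capacity is an assumption about how the scaled output would be used rather than a step stated in the text.

```python
import numpy as np

def scale_by_capacity(power, capacity):
    """Divide PV power by installed capacity so that sites of any size are comparable."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return np.asarray(power, dtype=float) / capacity

def rescale_to_target(scaled_forecast, target_capacity):
    """Assumed inverse step: express a scaled forecast in the target system's units."""
    return np.asarray(scaled_forecast, dtype=float) * target_capacity

# Two made-up sites with a tenfold difference in capacity but the same profile.
small_site = scale_by_capacity([0.0, 50.0, 100.0], capacity=200.0)
large_site = scale_by_capacity([0.0, 500.0, 1000.0], capacity=2000.0)
target_power = rescale_to_target(small_site, target_capacity=800.0)
print(small_site, large_site, target_power)
```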
The PV generation is directly affected by solar radiation that reaches the PV panel surface. The actual amount of solar radiation reaching the panel is influenced by various weather data, such as cloud cover, humidity, and atmospheric conditions. Therefore, we considered time and weather data for input variables in this study. The time data determine the maximum solar radiation on the PV panel.
In addition, we used month, day, and hour to represent time information. These time data were recorded as 1D data, making it difficult to reflect their periodic nature in ML and DL algorithms [31]. To address this issue, we transformed the 1D time data into 2D data using periodic functions and used both forms together. Additionally, we applied a proper time delay to the time data to improve the prediction performance. This can be an effective preprocessing technique for forecasting models because it helps capture the relationship between the input and output variables more accurately. Equations (5) and (6) represent periodic functions to transform 1D time data into 2D time data.
$$t_{\sin} = \sin\left(\frac{2\pi\,(t-d)}{P}\right) \tag{5}$$

$$t_{\cos} = \cos\left(\frac{2\pi\,(t-d)}{P}\right) \tag{6}$$

Here, $t$ is the 1D time value, and $P$ and $d$ represent the data period and time delay, respectively. For example, if $t$ represents the month, $P$ would be 12, and if $t$ represents the hour, it would be 24. We calculated the PCC between time and PV power data to determine the optimal time delay. Figure 3 shows a heatmap representing the PCC between time delay and PV power data. The upper heatmap in the figure displays the PCC between the monthly time delay and PV power data, whereas the lower heatmap displays the PCC between the hourly time delay and PV power data. The $y$-axis of the heatmap represents $t$, $t_{\sin}$, and $t_{\cos}$ from top to bottom, whereas the $x$-axis represents the amount of $d$.
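The periodic transformation of Equations (5) and (6) can be sketched as below. The function name `encode_cyclic` is hypothetical, and subtracting the delay before the transformation is an assumed sign convention. The check at the end illustrates why the 2D form reflects periodicity: hour 23 and hour 0 are far apart as 1D values but are neighbors on the unit circle.

```python
import numpy as np

def encode_cyclic(t, period, delay=0.0):
    """Map a 1D time value to a (sin, cos) pair on the unit circle."""
    # Assumed convention: the delay shifts the time value before encoding.
    angle = 2.0 * np.pi * (np.asarray(t, dtype=float) - delay) / period
    return np.sin(angle), np.cos(angle)

hours = np.arange(24)
hour_sin, hour_cos = encode_cyclic(hours, period=24)

def gap(i, j):
    """Euclidean distance between the 2D encodings of hours i and j."""
    return float(np.hypot(hour_sin[i] - hour_sin[j], hour_cos[i] - hour_cos[j]))

gap_23_0 = gap(23, 0)    # 23 apart as 1D values, adjacent on the circle
gap_0_1 = gap(0, 1)      # ordinary neighbors
print(gap_23_0, gap_0_1)

# Month data use a period of 12; the delay of 5 is only an example value.
month_sin, month_cos = encode_cyclic(np.arange(1, 13), period=12, delay=5)
```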

For 1D data, PCC showed the largest absolute value when a time delay of 2 months was applied to month data and when a time delay of 18 h was applied to hour data. In the case of 2D data, the PCC showed the largest absolute value when a 5-month delay was applied to month data and a 13-hour delay was applied to hour data. We also investigated the time delay at which the sum of the absolute values of the PCC of each 2D time data was maximum and found that a 4-month delay for month data and a 4-hour delay for hour data achieved the largest sum. Table 3 summarizes the regression statistics for various time data settings, where bold values indicate the best performance for each regression statistic.
The table shows that using 2D time data is more effective than using 1D time data, and using both 1D and 2D time data is most effective. In addition, using the time delay that achieved the highest absolute value of the PCC for 1D and 2D data achieves better performance than other settings.
As mentioned earlier, weather conditions are closely related to PV power generation. Hence, we collected weather data from the Korea Meteorological Administration (KMA), which provides weather data at various resolutions. To construct a forecasting model, we used nine types of 1 h resolution weather data related to PV power generation, including temperature, precipitation, relative humidity, sunshine hours, and solar radiation. Thus, we configured nine datasets consisting of 18 input variables and one output variable. Input variables consist of nine types of time data and nine types of weather data. Figure 4 represents a correlation heatmap of the datasets. The figure shows that for PV power data, hour, , sunshine hours, and solar radiation have a strong positive correlation, and relative humidity has a strong negative correlation.

5. Experimental Results
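As metrics to evaluate the performance of the forecasting model, we used the mean absolute error (MAE) and root mean square error (RMSE), defined by Equations (7) and (8), respectively. Here, $n$, $y_i$, and $\hat{y}_i$ represent the number of data points, actual data, and forecasted data, respectively.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \tag{7}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2} \tag{8}$$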
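Both metrics can be computed in a few lines, as in the minimal sketch below. The function names and the capacity-scaled values are made up for illustration. Because the RMSE squares the errors before averaging, it penalizes the single largest error more heavily than the MAE does.

```python
import numpy as np

def mae(actual, forecast):
    """Mean absolute error."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast)))

def rmse(actual, forecast):
    """Root mean square error."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

# Made-up capacity-scaled values; the errors are -0.1, 0.0, 0.1, and -0.2.
actual = [0.1, 0.4, 0.6, 0.2]
forecast = [0.2, 0.4, 0.5, 0.4]
mae_value = mae(actual, forecast)
rmse_value = rmse(actual, forecast)
print(mae_value, rmse_value)
```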
In the first experiment, we determined three important types: DL model type, correlation coefficient type, and data time type. For this purpose, we considered all possible combinations of these three types. Table 4 shows the hyperparameters used in each DL model.
In particular, the window size for the RNN-type DL models (RNN, LSTM, and GRU) in the table specifies how much historical data are used for each prediction. The SNN and DNN do not have this hyperparameter and therefore cannot consider historical data on their own. To ensure a fair comparison of prediction performance, the data used for SNN and DNN training were configured to include past data. As a result, all models take 18 input variables over the past 24 time points; that is, 18 × 24 = 432 input values are used to generate one output. As an example of the results, Table 5 presents the MAE comparison of all combinations of correlation coefficient and data time types under the LSTM model.
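The input configuration described above can be sketched as a sliding window, with the sequence form fed to the RNN-type models and a flattened form fed to the SNN and DNN. The function name `make_windows` and the placeholder data are hypothetical, and how each window is aligned with its target value is not specified here, so targets are left out.

```python
import numpy as np

def make_windows(features, window=24):
    """Build sliding windows over a (n_steps, n_features) array.

    Returns the sequence form (n_samples, window, n_features) for RNN-type
    models and the flattened form (n_samples, window * n_features) for
    feed-forward models such as the SNN and DNN.
    """
    features = np.asarray(features, dtype=float)
    n_steps, n_features = features.shape
    n_samples = n_steps - window + 1
    if n_samples < 1:
        raise ValueError("not enough time steps for one window")
    seq = np.stack([features[i:i + window] for i in range(n_samples)])
    flat = seq.reshape(n_samples, window * n_features)
    return seq, flat

# Placeholder data: 30 hourly steps of 18 input variables.
data = np.arange(30 * 18, dtype=float).reshape(30, 18)
seq_inputs, flat_inputs = make_windows(data, window=24)
print(seq_inputs.shape, flat_inputs.shape)
```

Both forms carry the same 432 values per sample; only the layout differs.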
In the table, the first column shows the type information of the proposed model. For instance, a label naming LSTM, PCC, and all time indicates that these are used as the type of DL model, correlation coefficient, and data time, respectively. In addition, values in bold indicate the best performance for each location. In the case of Busan, no bold values are listed because SZF performs better than the proposed scheme.
In the experiment, we analyzed all cases that showed the best results. Figure 5 shows a bubble chart of the number of best performing cases according to data time and correlation coefficient. From the figure, we can see that PCC outperforms DCC as the correlation coefficient and that solar-noon time is the best data time option. Hence, we used PCC and solar-noon time to implement the proposed scheme using various DL models. Tables 6 and 7 present the MAE and RMSE values of five different DL models for each location, respectively. Here, the bold values indicate the best performance for each location.

The tables show that the proposed scheme achieves the best prediction performance when using RNN. On the other hand, DNN and GRU models showed the next best performance. To prove the effectiveness of the proposed scheme, we compared its performance with RNN-based SZFs trained using data from other areas. Tables 8 and 9 present the MAE and RMSE values, respectively. Values in bold indicate the best performance, and values with an underline indicate the second-best performance for each location.
The proposed scheme performed best in most cases. We calculated how much the proposed scheme improved on the best-performing SZF: on average, it improved the MAE by 25.3% and the RMSE by 4.7%. In the next experiment, to demonstrate the robustness of the proposed scheme across forecasting models, we compared the best SZF and the proposed scheme for every DL model considered. Table 10 shows the MAE of the best SZF for each model and of the proposed scheme, together with the improvement rate achieved by the proposed scheme. Figure 6 presents radar graphs for the MAE of the best SZF and the proposed scheme. In the figure, the smaller the radar size, the better the predictive performance. Figure 7 presents the MAE and RMSE improvement over the best SZF for each DL model type. Both figures show that the proposed scheme outperforms the best SZF in almost all cases. Only in a few cases where the source and target data have very similar patterns did SZF give slightly better forecasting performance than the proposed scheme. However, because predictions must be made without knowing what pattern the target data will have, selecting such a well-matched source site in advance is impractical in a real-world environment. Additionally, in terms of model construction time, the proposed scheme takes the time to construct a single SZF multiplied by the number of source datasets. However, if real-time updates to the model are not required, prediction accuracy is much more important than model construction time. Hence, the proposed scheme is more desirable for newly built PV power systems than SZF.


6. Conclusion
In this paper, we proposed a novel zero-shot PV power forecasting scheme based on a DL model and a correlation coefficient. We constructed an individual DL-based forecasting model for each of multiple locations and then calculated the final prediction value through the correlation coefficient-based weighted sum of the predicted values of these models. By using this approach, the proposed scheme can always guarantee good prediction performance even in newly built PV generation systems.
To evaluate the effectiveness and efficiency of the proposed scheme, we performed extensive experiments using historical data collected from nine source sites in Korea. We first compared the performance of the proposed scheme using different combinations of DL model, correlation coefficient, and data time option to find the best one. We considered five DL models (SNN, DNN, RNN, LSTM, and GRU) for the forecasting model and two correlation coefficients (PCC and DCC) and three data time types (all time, day time, and solar-noon time) for the weights. The experimental results revealed that the forecasting performance is best for the combination of RNN, PCC, and solar-noon time data. The proposed scheme using this combination achieved an average MAE of 0.038 and an average RMSE of 0.072.
In addition, we compared the performance of the proposed scheme with SZF models trained using different datasets. We confirmed that the proposed scheme achieved the best predictive performance in almost all cases and the second-best performance with a very narrow margin only in a few cases. More specifically, it improved MAE performance by an average of 25.3% and RMSE performance by an average of 4.7% compared to the best performing SZF model. Although SZF showed the best performance in some cases, it is difficult to use in a real-world environment because its performance varies greatly depending on the data used for training.
In the future, based on the forecasted value derived through the zero-shot PV power forecasting model and the electrical load data of the building or cluster where the PV system will be installed, we plan to research the optimal capacity of an energy storage system to be integrated with this grid. In addition, we intend to develop an energy operation scheduling algorithm that can guarantee the most economic benefits to users by analyzing how energy is most economically operated in a PV–energy storage system-integrated grid with optimal capacity.
Data Availability
The data that support the findings of this study are available from “https://www.data.go.kr/index.do” and “https://data.kma.go.kr/cmmn/main.do.”
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this article.
Acknowledgments
This research was supported by the Energy Cloud R&D Program (grant number: 2019M3F2A1073184) through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT.