Abstract

Accurate PV power forecasting is becoming essential for integrating PV systems into the power grid, scheduling them, and ensuring grid safety. In this paper, a novel PV power prediction model, AP-LSTNet, is proposed. It combines affinity propagation (AP) clustering with the long- and short-term time series network (LSTNet). First, the affinity propagation algorithm is used to divide the regionally distributed photovoltaic stations into station clusters for each season. The Pearson correlation coefficient is then used to identify the meteorological factors that are strongly correlated with photovoltaic power, and bilinear interpolation is used to densify the meteorological data of the corresponding photovoltaic station cluster. Furthermore, LSTNet is used to mine the long-term and short-term temporal and spatial dependence of photovoltaic power, and the meteorological factor series and the linear autoregressive component are superimposed to predict multiple photovoltaic stations in the cluster simultaneously. Finally, PV power plants in five cities, Wuwei, Jinchang, Zhangye, Jiuquan, and Jiayuguan, in the Hexi region of Gansu Province, China, are selected to test the proposed model. The experimental comparison shows that the prediction model achieves high prediction accuracy and robustness.

1. Introduction

In recent years, PV power generation has developed rapidly owing to its advantages as a clean energy source. In China, PV power plants fall into two categories, centralized and distributed. Centralized power plants are large-scale PV power plant clusters constructed in desert areas; they make full use of abundant and relatively stable solar energy resources and are connected to high-voltage transmission systems to supply distant loads. Distributed power stations are mainly built on the surfaces of scattered buildings; they meet the power demand of nearby users, compensate for differences in power supply, and export power through grid connection. The main operating mode of distributed photovoltaic power generation is self-generation and self-consumption with surplus power fed into the grid: the power generated by a user's photovoltaic station first meets the user's own load, and the excess can be sold to the power company; if the generated power does not meet the load, the grid supplies the shortfall. The power company only collects daily power generation data; it does not monitor the operating conditions of the power generation equipment and is unable to carry out routine maintenance and repair. With the continuous increase in the penetration rate of distributed PV, its volatility and randomness have become uncontrollable factors in coordinated grid dispatch. Therefore, accurate short-term power forecasting of distributed PV is of great importance to optimize power system dispatching and ensure safe operation of the power system [1].

Short-term PV power forecasting methods can be divided into statistical methods and machine learning methods. Statistical methods mainly include grey theory [2], regression analysis [3], time series [4], and so on. These methods have simple models but relatively poor prediction accuracy and stability. Machine learning methods mainly include support vector machine [5], random forest [6], extreme learning machine, neural network [7], and so on. In recent years, deep neural networks such as convolutional neural networks (CNN) [8, 9] and long short-term memory (LSTM) networks [10, 11] have also been introduced to improve the fitting ability of models. Combining the advantages of different neural networks, hybrid network prediction models such as CNN-LSTM [12, 13] and recurrent neural network (RNN)-LSTM [14, 15] further improve the prediction accuracy of the model by capturing the time series and spatial correlation between PV power generation power sequence and related influencing factors. Since the attention mechanism can measure the importance of input features, it is introduced into various neural network units to improve the generalization ability of the model [16, 17].

The above power prediction methods are mainly applicable to PV power stations with meteorological acquisition systems and complete power information, whereas distributed PV power stations with low-voltage access are small and scattered, and the investment cost of equipping each of them with meteorological measurement equipment is too high [18]. In [19, 20], by analyzing the spatial correlation of PV power between the neighboring power station and the power station to be predicted, decision tree and neural network models are, respectively, established to construct the nonlinear mapping relationship between them. In [21, 22], a BP neural network model is established to improve the PV prediction accuracy of the slave station by studying the correlation between PV master stations and PV slave stations clustered in the same category. In [23, 24], spatial correlation cluster analysis is performed on historical sample groups of weather types, the deployment of weather stations is optimized, and a multi-PV user power prediction model based on “space-time correlation” is proposed. In [25, 26], missing power data are reconstructed based on space-time correlation, and an irradiance densification model is established by using a 3D convolutional neural network to achieve full grid coverage of power and meteorological data. In [27], an improved version of the PV2-state model is introduced for intra-hour PV power prediction. Reference [28] proposes a new model for predicting photovoltaic power generation using LSTM-TCN. In [29], LSTM-based predictive models are used to control solar PV systems and effectively prepare for future battery system consumption. Reference [30] proposes a new two-stage deep learning method for photovoltaic power generation prediction, which shows significant improvement and robustness in point prediction and probabilistic prediction tasks. In [31], a graph-based multisite daytime PV generation prediction model is presented, which makes it possible to interpret which PV stations and time steps influence the prediction. Reference [32] analyses the performance of 12 different models that predict day-ahead generation in line with market conditions.

Current short-term photovoltaic power prediction technology has achieved certain results, but the following problems remain: (1) All of the above solutions effectively improve the power prediction accuracy of distributed PV, but their high economic cost prevents large-scale engineering practice. (2) The power output of distributed PV has strong randomness and volatility as well as obvious daily periodicity [33]. The power prediction of a single distributed PV power plant not only has low accuracy but also has little impact on power system planning and scheduling. In addition, models adapt poorly across different power plants and cannot be shared.

A traditional single model cannot effectively solve the above problems. For the multivariable time series prediction scenario of photovoltaic power, this paper proposes a short-term power forecasting model for regional distributed PV power plants based on affinity propagation (AP) clustering and the long- and short-term time series network (LSTNet). Regional forecasting of multiple distributed PV power plants can effectively improve accuracy and applicability. According to the local climate characteristics of the PV power plants, the PV output data are divided by season, and the distributed PV power plants in the region are grouped for each season by AP clustering so that the weather within each power plant group is consistent. Because distributed PV power stations are geographically dispersed, it is impossible to set up a meteorological station for each station, and only the meteorological data of distant centralized PV power stations can be shared. Therefore, the Pearson correlation coefficient is first used to identify the meteorological factors that are strongly correlated with PV output power, and the corresponding meteorological data are then densified by bilinear interpolation so that each station group has its own meteorological data, thus realizing full-area coverage of key meteorological data. Finally, LSTNet is used to forecast multiple PV power plants in the cluster simultaneously to improve the short-term power forecast accuracy of the large-scale distributed PV power plant cluster. Results on measured data show that the power prediction model has excellent generalization, high accuracy, and strong robustness.

2. Distributed PV Power Plant Cluster Partition Based on AP Clustering

2.1. Correlation between PV Power and Meteorological Factors

PV performance depends on many factors, including external meteorological factors such as light intensity, temperature, humidity, cloud movement, and wind speed and direction [34], and internal technical parameters such as installation location and capacity, PV conversion efficiency, and PV panel characteristics. However, large-scale distributed PV users often lack technical parameters such as installation location, are not covered by public weather stations that provide irradiance and other meteorological data, and cannot be assigned to strict geographical zones based on meteorological association. Therefore, it is necessary to derive from the existing PV data the space-time distribution of an equivalent irradiance that describes the nature of PV power.

As shown in Figure 1, since there is an approximately linear relationship between the power plant output power and the solar irradiance, the PV power curve of the large-scale distributed PV power plant can be used as the equivalent irradiance to reflect the change process of meteorological information such as light intensity and temperature of the location. According to the spatial correlation of the historical data of PV power, the PV power plant cluster is divided so that the data characteristics of each power plant in the cluster are similar and have meteorological consistency. On the one hand, this provides a basis for the subsequent reconstruction of regional meteorological data as the meteorological data of each distributed power plant. On the other hand, regional forecasting of distributed PV power plants can simplify the tedious modelling of each individual station and reduce the difficulty of forecasting.

2.2. Classification of PV Power Station Group by AP Clustering

AP clustering is an unsupervised clustering algorithm based on message passing [35, 36]. Compared with traditional clustering algorithms such as K-means clustering, this algorithm is insensitive to outliers, and the clustering results are more stable [37, 38]. In this paper, AP clustering is used to cluster the PV power of the PV power plants in the region, and power plants with the same fluctuation trend are assigned to the same group. The algorithm first treats all distributed PV power stations as potential cluster centers, and the stations then compete iteratively for the role of cluster center through “mutual information transmission”. As shown in Figure 2, there are two information exchange mechanisms between the power plant data points: attraction information r (i, k) and assignment information a (i, k). The attraction information r (i, k), sent from plant i to candidate center k, represents how well suited plant k is to serve as the cluster center of plant i. The assignment information a (i, k), sent from candidate center k to plant i, represents how appropriate it is for plant i to select plant k as its cluster center.

As an unsupervised algorithm, AP clustering does not itself provide a measure of clustering quality. To obtain accurate and stable clustering results, this paper uses the silhouette coefficient [39, 40] to evaluate the clustering results. Its formula is as follows:

$$s(i) = \frac{n(i) - m(i)}{\max\{m(i),\, n(i)\}}$$

where s (i) is the silhouette coefficient of power plant i; m (i) is the average distance between the sample of power plant i and the other samples in the same cluster, which is called cohesion; n (i) is the average distance between the sample of power plant i and all samples in other clusters, which is called separation. The average silhouette coefficient is the mean of the silhouette coefficients of all samples, and its value range is [−1, 1]. The larger the value, the smaller the intracluster distance and the larger the intercluster distance, and hence the better the clustering effect.
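As an illustration of the silhouette coefficient described above, the following is a minimal pure-Python sketch, not the authors' implementation. The function name `silhouette`, the toy power profiles, and the labels are hypothetical. One assumption is made explicit: n (i) is taken as the mean distance to the nearest other cluster, which is the usual convention and coincides with the average over all other-cluster samples when there are only two clusters.

```python
import math


def silhouette(points, labels):
    """Return the silhouette coefficient s(i) of every sample (needs >= 2 clusters)."""
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Distances to the other members of the same cluster.
        same = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:                 # singleton cluster: s(i) is set to 0
            scores.append(0.0)
            continue
        m = sum(same) / len(same)    # cohesion m(i)
        # Separation n(i): assumed here to be the mean distance to the
        # nearest other cluster.
        n = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        scores.append((n - m) / max(m, n))
    return scores


# Hypothetical standardized power profiles of four plants.
profiles = [[0.0, 1.0], [0.0, 1.2], [5.0, 5.0], [5.0, 5.4]]
good = silhouette(profiles, [0, 0, 1, 1])   # plants grouped by similarity
bad = silhouette(profiles, [0, 1, 0, 1])    # deliberately poor grouping
mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
```

A grouping that follows the profile shapes yields a mean coefficient close to 1, while the deliberately mixed grouping yields a much lower value, which is the property used to select the best clustering.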

The steps for dividing PV power plant groups by AP clustering are as follows:

Step 1: Divide the PV power data into four seasons and standardize each season.

Step 2: Calculate the similarity s (i, k) between the power samples of the distributed PV power plants as shown in (2) to obtain the similarity matrix S. Its diagonal element s (k, k) is the criterion for whether the sample of power plant k can become the cluster center of a power plant cluster; it is called the reference value, and its size affects the number of clusters.

$$s(i,k) = -\lVert x_i - x_k \rVert^{2}, \quad i \neq k \tag{2}$$

where $x_i$ is the standardized power sample of power plant i and $\lVert \cdot \rVert$ denotes the Euclidean distance.

Step 3: Calculate the element r (i, k) of the attraction matrix r and the element a (i, k) of the assignment matrix a:

$$r(i,k) = s(i,k) - \max_{k' \neq k}\{a(i,k') + s(i,k')\}$$

$$a(i,k) = \begin{cases} \min\Big\{0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0,\, r(i',k)\}\Big\}, & i \neq k \\ \sum_{i' \neq k} \max\{0,\, r(i',k)\}, & i = k \end{cases}$$

Step 4: Update r (i, k) and a (i, k), introducing a damping coefficient λ to adjust the convergence speed and the iterative stability:

$$r_{d+1}(i,k) = \lambda\, r_{d}(i,k) + (1-\lambda)\, \tilde{r}_{d+1}(i,k)$$

$$a_{d+1}(i,k) = \lambda\, a_{d}(i,k) + (1-\lambda)\, \tilde{a}_{d+1}(i,k)$$

where d is the iteration number, and $\tilde{r}_{d+1}$ and $\tilde{a}_{d+1}$ are the undamped values computed in Step 3.

Step 5: If the number of iterations exceeds the preset number or the clustering result no longer changes, go to Step 6. Otherwise, repeat Steps 3 and 4 to continue the calculation.

Step 6: Calculate the silhouette coefficient of the clustering result under the current reference value, determine the cluster centers and the different PV power plant clusters, then change the reference value and return to Step 2.

Step 7: Analyze the silhouette coefficients under different numbers of clusters, select the best clustering result, and complete the division of the PV power station groups.
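Steps 2 to 5 can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' code: it uses the standard affinity propagation message updates with damping, the function names `similarity_matrix` and `ap_cluster` are hypothetical, the toy samples are one-dimensional stand-ins for standardized power profiles, and the reference value is passed in directly rather than swept as in Steps 6 and 7.

```python
def similarity_matrix(samples, preference):
    """s(i,k) = -||x_i - x_k||^2 off the diagonal, s(k,k) = reference value."""
    n = len(samples)
    return [[preference if i == k
             else -sum((u - v) ** 2 for u, v in zip(samples[i], samples[k]))
             for k in range(n)] for i in range(n)]


def ap_cluster(S, damping=0.5, max_iter=200, stable_iter=10):
    """Return, for each plant, the index of the plant chosen as its cluster center."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]   # attraction information r(i,k)
    A = [[0.0] * n for _ in range(n)]   # assignment information a(i,k)
    labels, stable = None, 0
    for _ in range(max_iter):
        # Attraction update: how much better k is for i than the best rival.
        for i in range(n):
            for k in range(n):
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Assignment update: accumulated positive support for candidate k.
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos)
            for i in range(n):
                if i == k:
                    raw = total - pos[k]
                else:
                    raw = min(0.0, R[k][k] + total - pos[i] - pos[k])
                A[i][k] = damping * A[i][k] + (1 - damping) * raw
        # Each plant picks the candidate maximizing a(i,k) + r(i,k).
        new_labels = [max(range(n), key=lambda k: A[i][k] + R[i][k])
                      for i in range(n)]
        stable = stable + 1 if new_labels == labels else 0
        labels = new_labels
        if stable >= stable_iter:       # assignments unchanged: stop early
            break
    return labels


# Hypothetical one-dimensional features: two well-separated groups of three plants.
samples = [[0.0], [3.0], [6.0], [100.0], [103.0], [106.0]]
labels = ap_cluster(similarity_matrix(samples, preference=-50.0))
```

In this toy case the middle plant of each group is selected as the cluster center. Repeating the call for several reference values and scoring each result with the silhouette coefficient would correspond to Steps 6 and 7.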

3. Selection and Densification of Meteorological Factors

Since meteorological consistency exists within the region of each PV power plant group after the division, the cluster center power plant, which is the most representative plant of each group, can be selected as the representative power plant, and its meteorological data can be used as the overall meteorological data of the region.

Most distributed PV power plants only have access to meteorological data from distant centralized PV power plants and to coarse-grained weather forecasts. When a representative power plant has no corresponding meteorological station, bilinear interpolation can therefore be used to densify the meteorological data for the representative power plants of the different station groups, so that each station group has its own meteorological data. In this way, key meteorological data can be made to cover the whole region, station group by station group. Accordingly, this paper first uses the Pearson correlation coefficient to identify the meteorological factors that are strongly correlated with PV output power, and then uses bilinear interpolation to densify the corresponding meteorological data for the subsequent neural network prediction models.

3.1. Selection of Meteorological Factors

Assuming that the internal technical parameters of the PV power plant remain basically unchanged, it is very important to isolate the main factors influencing the PV output power from the external meteorological factors (irradiance, temperature, humidity, etc.). The Pearson correlation coefficient R (X, Y) is used to measure the degree of correlation between the PV output power and each influencing factor, namely,

$$R(X,Y) = \frac{\sum_{c=1}^{C} (x_c - \bar{x})(y_c - \bar{y})}{\sqrt{\sum_{c=1}^{C} (x_c - \bar{x})^{2}}\,\sqrt{\sum_{c=1}^{C} (y_c - \bar{y})^{2}}}$$

where $x_c$ and $y_c$ are the output power and meteorological factor sample points of the standardized PV plant sample c, respectively. X and Y are matrices consisting of $x_c$ and $y_c$, respectively. $\bar{x}$ and $\bar{y}$ are the mean values of $x_c$ and $y_c$, respectively. C is the number of samples. The value range of the correlation coefficient is [−1, 1], and the larger the absolute value, the stronger the correlation.
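The factor selection can be illustrated with a short sketch that computes the Pearson coefficient for several candidate factors and ranks them by absolute value. The function name `pearson` and all sample values are hypothetical and serve only to show the calculation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient R(X, Y) of two equally long samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical PV power samples and candidate meteorological factors.
power = [0.0, 1.0, 2.0, 3.0]
factors = {
    "ghi": [10.0, 20.0, 30.0, 40.0],       # rises with power
    "humidity": [80.0, 60.0, 40.0, 20.0],  # falls as power rises
    "pressure": [1.0, 3.0, 2.0, 2.0],      # weakly related
}
coefficients = {name: pearson(power, values) for name, values in factors.items()}
# Rank by |R|: the sign only indicates the direction of the relationship.
ranking = sorted(coefficients, key=lambda name: abs(coefficients[name]), reverse=True)
```

Ranking by the absolute value keeps strongly negative factors such as humidity in this toy data, which matches the statement that the larger the absolute value, the stronger the correlation.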

3.2. Densification of Meteorological Factors Based on Bilinear Interpolation

Bilinear interpolation, as a classical interpolation-based densification method, is widely used in signal processing, digital image processing, and so on [41]. The core idea is to perform linear interpolation in two directions in turn. The basic principle is shown in Figure 3. In weather data densification, the two directions of the interpolation function are the east-west and north-south distances between weather stations, shown on the x and y axes. P is the meteorological station to be interpolated, and $Q_{11}$, $Q_{12}$, $Q_{21}$, and $Q_{22}$ are the four known meteorological stations closest to it. The meteorological data of the point $R_1$ are obtained from $Q_{11}$ and $Q_{21}$ by linear interpolation in the x-axis direction, as shown in (8), and the meteorological data of the point $R_2$ are obtained similarly from $Q_{12}$ and $Q_{22}$. By interpolation in the y-axis direction from points $R_1$ and $R_2$, the meteorological data f (P) of point P can be obtained, as shown in equation (9):

$$\begin{aligned} f(R_1) &= \frac{x_2 - x}{x_2 - x_1} f(Q_{11}) + \frac{x - x_1}{x_2 - x_1} f(Q_{21}) \\ f(R_2) &= \frac{x_2 - x}{x_2 - x_1} f(Q_{12}) + \frac{x - x_1}{x_2 - x_1} f(Q_{22}) \end{aligned} \tag{8}$$

$$f(P) = \frac{y_2 - y}{y_2 - y_1} f(R_1) + \frac{y - y_1}{y_2 - y_1} f(R_2) \tag{9}$$

where $x_1$ and $y_1$ are the abscissa and ordinate of weather station $Q_{11}$, respectively. $x_2$ and $y_2$ are the abscissa and ordinate of weather station $Q_{22}$, respectively. x and y are the abscissa and ordinate of point P, respectively.
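The two-stage interpolation can be written directly as a small function. This is a sketch under the assumption that the four known stations sit at the corners of an axis-aligned rectangle; the function name `bilinear`, the argument names, and the numerical values are hypothetical.

```python
def bilinear(x, y, x1, y1, x2, y2, f11, f21, f12, f22):
    """Interpolate a meteorological value at P = (x, y).

    f11, f21, f12, f22 are the values at the corner stations
    (x1, y1), (x2, y1), (x1, y2), (x2, y2), respectively.
    """
    # First interpolate along x at the two known y levels.
    f_r1 = ((x2 - x) * f11 + (x - x1) * f21) / (x2 - x1)
    f_r2 = ((x2 - x) * f12 + (x - x1) * f22) / (x2 - x1)
    # Then interpolate along y between the two intermediate points.
    return ((y2 - y) * f_r1 + (y - y1) * f_r2) / (y2 - y1)


# Hypothetical irradiance readings (W/m^2) at four stations on a 10 km x 20 km rectangle.
centre = bilinear(5.0, 10.0, 0.0, 0.0, 10.0, 20.0, 100.0, 200.0, 300.0, 400.0)
corner = bilinear(0.0, 0.0, 0.0, 0.0, 10.0, 20.0, 100.0, 200.0, 300.0, 400.0)
```

At a corner the function returns the station's own reading, and at the centre of the rectangle it returns the average of the four readings, which is a quick sanity check of the weights.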

4. Prediction Model Based on LSTNet

The original LSTNet model [42, 43] requires the input and output dimensions to be consistent, so meteorological data cannot be supplied as input-only feature variables for multistation PV power. In this paper, the original LSTNet model is therefore modified. The PV power and the meteorological factors are both taken as inputs of the model, and the output dimensions of the fully connected layer of the nonlinear branch and of the autoregressive layer of the linear branch are set equal to the number of PV power plants. The final prediction result is obtained by superimposing the prediction results of the two branches. In this way, the model can extract the meteorological characteristics without increasing the output dimension, so as to better capture the long-term trend and short-term fluctuation characteristics of the time series of multistation PV output and related meteorological factors, and the sum of the linear and nonlinear branches makes the prediction of future PV power output more robust. The structure of LSTNet is shown in Figure 4. It consists of a nonlinear branch composed of a convolutional layer, a recurrent (loop) layer, a recurrent-skip (loop-skip) layer, and a fully connected layer, and a linear branch composed of an autoregressive layer.

4.1. Convolution Layer

The first layer of LSTNet is a convolutional layer, which extracts short-term patterns and local dependencies between variables from time series of PV power and meteorological data, and mines correlations between multiple features.

The convolutional layer slides its convolutional kernels over the input layer to transform the input data. The output obtained by extracting the time series features with one-dimensional convolution is

$$h_k = f(W_k * X + b_k)$$

where X is the input time series, $*$ denotes the convolution operation, $W_k$ and $b_k$ are the weight matrix and bias vector of the kth filter, respectively, $h_k$ is the output feature, and f (·) denotes the ReLU activation function.
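A single filter of such a layer can be sketched as follows. This is an illustrative pure-Python version, assuming that each kernel spans all input variables and a window of consecutive time steps, and that the "convolution" is the cross-correlation commonly used in deep learning frameworks. The function name `conv1d_relu`, the kernel, and the data are hypothetical.

```python
def conv1d_relu(X, W, b):
    """Apply one filter to a multivariate series.

    X: list of T time steps, each a list of n variables.
    W: kernel of shape (w, n) covering w consecutive time steps.
    b: scalar bias. Returns T - w + 1 activations.
    """
    w, n = len(W), len(X[0])
    out = []
    for t in range(len(X) - w + 1):
        z = b + sum(W[j][v] * X[t + j][v] for j in range(w) for v in range(n))
        out.append(max(0.0, z))   # ReLU activation f(.)
    return out


# Hypothetical series with two variables (e.g. power and irradiance) over four steps.
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
W = [[-1.0, 0.0], [1.0, 2.0]]     # responds to rising power and to irradiance
features = conv1d_relu(X, W, b=0.0)
clipped = conv1d_relu(X, W, b=-2.0)
```

With a negative bias, weak responses fall below zero and are clipped by the ReLU, which shows how the activation keeps only the local patterns the filter responds to.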

4.2. Loop and Loop-Skip Layer

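The output of the convolutional layer is fed to both the recurrent layer and the loop-skip layer. The recurrent layer uses LSTM network units to selectively remember or forget sequence information, which can capture relatively long-term dependencies in historical information while reducing the risk of vanishing or exploding gradients during model training. The state of the recurrent unit at time t is calculated according to the following formula:

$$\begin{aligned} i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-p} + b_i) \\ f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-p} + b_f) \\ o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-p} + b_o) \\ g_t &= \tanh(W_{xg} x_t + W_{hg} h_{t-p} + b_g) \\ c_t &= f_t \odot c_{t-p} + i_t \odot g_t \\ h_t &= o_t \odot \tanh(c_t) \end{aligned} \tag{11}$$

where $i_t$, $f_t$, $o_t$, $g_t$, $c_t$, and $h_t$ are the input gate, forget gate, output gate, input node, memory unit, and hidden layer output, respectively. p is the number of hidden units that are skipped, where p = 1 for the recurrent layer. $x_t$ is the input at time t. $W_{xi}$, $W_{hi}$, $W_{xf}$, $W_{hf}$, $W_{xo}$, $W_{ho}$, $W_{xg}$, and $W_{hg}$ are weight matrices. $b_i$, $b_f$, $b_o$, and $b_g$ are the corresponding bias vectors. σ (·) is the Sigmoid function, and $\odot$ denotes element-wise multiplication.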

The PV power sequence has an obvious periodic pattern, but the LSTM network has difficulty capturing very long-term repetitive patterns in the sequence; this can be addressed by adding the loop-skip layer. In this paper, an RNN with skip connections is used to extend the time span of the information flow and simplify the optimization process of the model. Specifically, the current hidden unit is connected to the hidden unit at the same phase of the adjacent period, so that the periodic pattern is exploited and the short- and long-term repetitive patterns of the sequence data are combined. The cell state update procedure for the loop-skip layer is the same as (11), and the value of p (p = 24N, where N is a positive integer) can be easily determined for a PV power dataset with a clear periodic pattern.
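The effect of the skip length p can be seen in a scalar sketch of the cell update, in which step t reads the hidden and memory states from step t − p instead of t − 1. This is a simplified illustration with one hidden unit and hypothetical weights, not the network used in the paper; the names `lstm_skip` and `weights` are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_skip(xs, p, w):
    """Run a one-unit LSTM over xs; step t uses the state of step t - p."""
    h = [0.0] * len(xs)
    c = [0.0] * len(xs)
    for t, x in enumerate(xs):
        h_prev = h[t - p] if t >= p else 0.0   # zero state before the series starts
        c_prev = c[t - p] if t >= p else 0.0
        i = sigmoid(w["xi"] * x + w["hi"] * h_prev + w["bi"])     # input gate
        f = sigmoid(w["xf"] * x + w["hf"] * h_prev + w["bf"])     # forget gate
        o = sigmoid(w["xo"] * x + w["ho"] * h_prev + w["bo"])     # output gate
        g = math.tanh(w["xg"] * x + w["hg"] * h_prev + w["bg"])   # input node
        c[t] = f * c_prev + i * g                                 # memory unit
        h[t] = o * math.tanh(c[t])                                # hidden output
    return h


# Hypothetical weights, chosen only for illustration.
weights = {"xi": 0.5, "hi": 0.1, "bi": 0.0, "xf": 0.4, "hf": 0.2, "bf": 1.0,
           "xo": 0.3, "ho": 0.1, "bo": 0.0, "xg": 0.8, "hg": 0.5, "bg": 0.0}

h_a = lstm_skip([0.2, 0.9, 0.4, 0.1], p=2, w=weights)
h_b = lstm_skip([0.2, -0.5, 0.4, 0.7], p=2, w=weights)  # only odd steps changed
```

With p = 2, the outputs at even time steps are identical in both runs because they depend only on inputs at even steps. In the same way, a skip length equal to one day links each time step to the same time of day on previous days.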

4.3. Fully Connected Layer

The state of the recurrent layer at time t is connected to the fully connected layer, whereas the recurrent-skip layer has multiple connections to the fully connected layer; the number of connections is related to the number of data points with skip length p as the period in the time window. The fully connected layer integrates the outputs of the recurrent layer and the recurrent-skip layer to obtain the prediction result of the nonlinear branch:

$$h_t^{D} = W^{R} h_t^{R} + \sum_{i=0}^{p-1} W_i^{S} h_{t-i}^{S} + b$$

where $h_t^{D}$ is the prediction result of the nonlinear branch at time t. $W^{R}$ and $W_i^{S}$ are weight matrices. b is the bias vector. $h_t^{R}$ is the output of the recurrent layer at time t. $h_{t-p+1}^{S}, \ldots, h_t^{S}$ are the outputs of the recurrent-skip layer at time instants t − p + 1 to t.

4.4. Autoregressive Layer

LSTNet adds a linear branch prediction module consisting of an autoregressive model that uses the values of previous time points of the same variable to predict the value of the current time point. The linear branch is predicted as follows:

$$h_{t,l}^{L} = \sum_{k=0}^{q^{ar}-1} W_k^{ar}\, y_{t-k,l} + b^{ar}$$

where $h_{t,l}^{L}$ is the prediction result of the autoregressive component l at time t. $y_{t-k,l}$ is the value of component l at each previous time point. $q^{ar}$ is the size of the input window. $W_k^{ar}$ and $b^{ar}$ are the autoregressive model coefficients.

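Finally, the prediction results of the linear branch and the nonlinear branch are superimposed to obtain the final prediction result $\hat{Y}_t$:

$$\hat{Y}_t = h_t^{D} + h_t^{L}$$

where $h_t^{D}$ is the prediction result of the nonlinear branch at time t and $h_t^{L}$ is the prediction result of the autoregressive model at time t.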
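The linear branch and the final superposition can be sketched as below. The sketch assumes, as in the usual LSTNet formulation, that the autoregressive weights are shared across plants. The function names `ar_branch` and `combine`, the weights, and the stand-in output of the nonlinear branch are hypothetical.

```python
def ar_branch(history, weights, bias):
    """Autoregressive prediction for every plant.

    history: past observations, oldest first; each row holds one value per plant.
    weights[k] multiplies the observation k steps before the newest one.
    """
    n_plants = len(history[0])
    return [bias + sum(weights[k] * history[-1 - k][l] for k in range(len(weights)))
            for l in range(n_plants)]


def combine(nonlinear, linear):
    """Final prediction: element-wise sum of the two branch outputs."""
    return [a + b for a, b in zip(nonlinear, linear)]


# Hypothetical power history (MW) of two plants over three time steps.
history = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
linear = ar_branch(history, weights=[0.5, 0.3, 0.2], bias=0.0)
nonlinear = [0.1, -1.0]            # stand-in for the neural branch output
prediction = combine(nonlinear, linear)
```

Because the linear branch scales directly with the recent observations, it follows changes in the magnitude of the input that the nonlinear branch alone may track poorly, which is the motivation for superimposing the two.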

For the PV power sequence, the value at a given time is highly dependent on the power value of previous time steps, and there is a nonperiodic change in a short time range with autocorrelation, as shown in Figure 5. The use of an autoregressive model can well capture the linear characteristics in the PV sequence and improve the prediction accuracy.

To sum up, the overall framework of the AP-LSTNet model proposed in this paper is shown in Figure 6.

5. Example Analysis

In this paper, measured power data and meteorological data of 48 distributed PV power plants in five cities, namely, Wuwei, Jinchang, Zhangye, Jiuquan, and Jiayuguan, in the Hexi region of Gansu Province, China, from January to December 2021 are selected for simulation experiments. The installed capacity of each PV system ranges from 30 MW to 200 MW.

5.1. Division of PV Power Plant Group

Because of the Earth's revolution around the Sun and the tilt of its axis, the illumination time and the angle of the sun change with the seasons, and the power generation of a PV power station changes seasonally. Therefore, it is necessary to divide the station groups separately for each season before establishing the prediction model. The annual PV power data are first divided into four seasonal prediction units. The distributed PV plants in each season are then divided into clusters following the AP clustering steps in Section 2.2, where the damping coefficient λ = 0.5, the maximum number of iterations is 200, and the initial reference value is the median of the nondiagonal elements of the matrix S. Figure 7 shows the station group classification results for the four seasons.

Obviously, the station group divided according to PV power characteristics basically belongs to the same geographical area, that is, the power station in this area can be equivalent to a small-scale PV power station group with meteorological consistency. There are some differences in the division results of the same station group in different seasons, so it is necessary to divide the station group into four seasons before modelling. The division results are shown in Table 1.

To verify the superiority of AP clustering over traditional clustering algorithms, taking spring as an example, different clustering algorithms are used to partition the PV power of 56 power plants in this season. The corresponding silhouette coefficients under different numbers of classes K are shown in Table 2. When the number of classes is the same, AP clustering yields a larger silhouette coefficient and therefore a better clustering effect.

5.2. Selection of Meteorological Factors and Densification Results

Considering the distribution of meteorological stations and the results of station clustering in different seasons, the PV power station cluster located at 38.9∼39.5°N latitude and 96.8∼97.4°E longitude is selected as the typical station cluster in each season. The nine power plants in this group are assigned to the same power plant group in all four seasons, which indicates a strong correlation among them. The installed capacity of the stations ranges from 43 MW to 104 MW. A suitable weather station has to be selected as the representative weather station for the cluster center site. It can be seen from Figure 8 that no meteorological station coincides exactly with the cluster center station. First, the nearest weather station is selected, and the Pearson correlation coefficient is used to identify the meteorological factors that are strongly correlated with PV power. Then, the meteorological data of the four nearby weather stations are densified by bilinear interpolation, and the interpolated meteorological data of the representative power plant are used as the meteorological data of the whole area for the subsequent model prediction.

The Pearson correlation coefficient between the output power of a representative power plant of a typical distributed PV power plant cluster and meteorological factors in different seasons is shown in Table 3. It can be seen that although the Pearson correlation coefficients between meteorological factors and PV power in different seasons are different, ultraviolet index (UVI), global horizontal irradiance (GHI), direct normal irradiance (DNI), and PV power have strong correlations. Secondly, relative humidity, solar zenith angle, diffuse horizontal irradiance (DHI), and ambient temperature have some correlation, while dew point and pressure have little correlation. In this paper, four meteorological characteristics with high correlation, namely, UVI, GHI, relative humidity, and DNI, are selected as the main meteorological input parameters affecting PV power generation.

The validation of the meteorological factors densified by bilinear interpolation against the meteorological conditions at the site is shown in Figure 9.

5.3. Short-Term Power Prediction Results
5.3.1. Evaluation Index

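The robustness of the model was verified using three indicators: mean absolute error (MAE), mean absolute percentage error (MAPE), and degree of fit ($R^2$). The specific expressions of the evaluation indicators are as follows:

$$\mathrm{MAE} = \frac{1}{M} \sum_{m=1}^{M} \left| y_m - \hat{y}_m \right|$$

$$\mathrm{MAPE} = \frac{1}{M} \sum_{m=1}^{M} \left| \frac{y_m - \hat{y}_m}{y_m} \right| \times 100\%$$

$$R^2 = 1 - \frac{\sum_{m=1}^{M} (y_m - \hat{y}_m)^2}{\sum_{m=1}^{M} (y_m - \bar{y})^2}$$

where $y_m$ is the actual value of sample m (only the part greater than 0 is calculated in MAPE). $\hat{y}_m$ is the predicted value of sample m. $\bar{y}$ is the average of the actual values. M is the number of samples.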
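The three indicators can be computed with a few lines of Python. This is a sketch for illustration: the function names and sample values are hypothetical, and for MAPE the average is taken over the samples whose actual value is greater than 0, which is one reading of the restriction stated above.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error over samples with actual value > 0 (assumption)."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a > 0]
    return 100.0 * sum(abs(a - p) / a for a, p in pairs) / len(pairs)


def r_squared(actual, predicted):
    """Degree of fit: 1 - residual sum of squares / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted power (MW); the first sample is a night-time zero.
actual = [0.0, 2.0, 4.0, 6.0]
predicted = [0.5, 2.0, 3.0, 6.0]
scores = (mae(actual, predicted), mape(actual, predicted), r_squared(actual, predicted))
```

Excluding zero-power samples from MAPE avoids division by zero during the night, when a PV plant produces no output.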

5.3.2. Experimental Setup

For the original LSTNet model, the search ranges of the numbers of neurons and the parameters of each layer are set as follows. The candidate set for the number of hidden neurons of the convolutional layer and the recurrent layer is {48, 64, 96, 128}, that for the number of hidden neurons of the recurrent-skip layer is {10, 20, 30, 40}, and that for the regularization coefficient is {0.1, 1, 10}. The candidate set for the sliding window size is {48, 48 × 2, 48 × 3, 48 × 4, 48 × 5, 48 × 6, 48 × 7}, and that for the dropout coefficient is {0.1, 0.2}. A grid search traverses these ranges: the prediction loss of the model is calculated for each parameter combination, and the combination with the minimum loss is selected. As a result, the numbers of hidden neurons in the convolutional layer, the recurrent layer, and the recurrent-skip layer are set to 64, 64, and 20, respectively; the regularization coefficient of the autoregressive layer is 1, the sliding window size is 48 × 7, the skip length p is 48, and the dropout coefficient is 0.2. The batch size is 64, the number of training epochs is 100, and the Adam algorithm [44, 45] is adopted as the optimization algorithm.
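The grid search over these candidate sets can be sketched as follows. The sketch assumes that the convolutional and recurrent layer sizes are searched independently. The dictionary keys, the function `grid_search`, and in particular `placeholder_loss` are hypothetical: in a real experiment the loss function would train the model with the given configuration and return its validation loss, whereas here it is only a stand-in so that the example runs.

```python
from itertools import product

# Candidate sets as listed in the text.
grid = {
    "conv_hidden": [48, 64, 96, 128],
    "rnn_hidden": [48, 64, 96, 128],
    "skip_hidden": [10, 20, 30, 40],
    "ar_reg": [0.1, 1, 10],
    "window": [48 * k for k in range(1, 8)],
    "dropout": [0.1, 0.2],
}


def grid_search(grid, loss_fn):
    """Evaluate every combination and return the one with the smallest loss."""
    names = list(grid)
    configs = (dict(zip(names, combo)) for combo in product(*grid.values()))
    return min(configs, key=loss_fn)


# Stand-in loss (NOT a real training run): distance to the configuration
# reported in the text, used only to make the sketch executable.
reported = {"conv_hidden": 64, "rnn_hidden": 64, "skip_hidden": 20,
            "ar_reg": 1, "window": 48 * 7, "dropout": 0.2}


def placeholder_loss(cfg):
    return sum(abs(cfg[k] - reported[k]) for k in cfg)


n_configs = len(list(product(*grid.values())))
best = grid_search(grid, placeholder_loss)
```

Under the independence assumption the grid contains 2688 combinations, each of which would require a full training run, so the cost of the search grows quickly with every additional candidate value.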

5.3.3. Performance Evaluation and Error Analysis of the Prediction Model
(1) Comparative analysis of single-station prediction and multistation simultaneous prediction

To verify the effectiveness of using AP clustering to predict the output power of multiple PV power plants in a typical cluster, one AP-LSTNet model and nine LSTNet models are built. The AP-LSTNet model simultaneously predicts the nine PV power plants in the cluster, and its inputs are the PV power of the nine power plants and the meteorological data of the typical meteorological station. Each of the nine LSTNet models predicts one of the nine PV stations in the cluster, and its inputs are the PV power of that station and the meteorological data of the typical meteorological station. The prediction results of the two types of models are compared and analyzed in Table 4 and Figure 10.

It can be seen that multistation simultaneous prediction has lower MAE and MAPE error indexes than single-station prediction in spring, summer, autumn, and winter, and the average training time is greatly reduced. Thus, the prediction accuracy and speed can be effectively improved by using AP clustering to divide the PV power plant groups and predicting the power plants in a group simultaneously.

(2) Comparative analysis of different prediction models

To verify the validity and reliability of the LSTNet model, the prediction results of the AP-LSTNet model were compared with those of the AP-LSTM model, the AP-CNN-LSTM model, and the AP-A-LSTM and AP-A-CNN-LSTM models with attention mechanism. To ensure the objectivity and fairness of the control experiment, all models use AP clustering to divide the station groups and predict the nine stations of the typical station group simultaneously. The parameters of the comparison models are selected in the same way as those of the LSTNet model, using the grid search method.

The experimental results are shown in Table 5 and Figure 11. The PV power prediction curves of power station 1 from 5:00 to 20:00 on randomly selected days in spring, summer, autumn, and winter are displayed. Table 5 shows the average MAE, MAPE, and degree of fit of the nine power stations in each season.

As shown in Table 5, AP-LSTNet has a lower prediction error than each comparison model in every season, and the degree of fit of the model is above 96%. The MAE and MAPE of the AP-A-LSTM and AP-A-CNN-LSTM models with attention mechanism are improved compared with the corresponding models without attention, indicating that the attention mechanism can improve the ability to extract key feature information. Compared with the second-best AP-A-LSTM model, the MAE and MAPE of the AP-LSTNet model in the four seasons are further reduced by more than 4%, and its degree of fit is the best in all four seasons, indicating that the loop-skip layer in this model can extract the time series characteristics of ultra-long time series more effectively.

To further verify the prediction effectiveness of the AP-LSTNet model under different weather types, the prediction results of the six prediction models and the measured power generation data of Power Plant 1 on typical days under different weather types are shown in Table 6 and Figure 12. The comparison shows that the prediction error of the AP-LSTNet model is significantly lower than that of the other models, including AP-A-LSTM and AP-CNN-LSTM, under the four weather types. The prediction error of the model is the lowest in sunny weather, with an MAE of 0.62 MWh, and the highest in sunny-to-cloudy conditions, with an MAE of 1.59 MWh. Compared with the comparison models, the model shows a significant improvement in accuracy in cloudy and showery weather.

6. Conclusions

In this paper, a short-term PV power plant group forecasting model, the AP-LSTNet model, is proposed. According to the seasonal variation, four forecasting units are constructed to simultaneously predict the output power of multiple power plants in the group. Simulation results show that the prediction results of the AP-LSTNet model are closest to the actual output. The main conclusions are as follows:

(1) AP clustering is used to divide multiple distributed PV power plants into small power plant clusters so that the meteorological data in each power plant cluster are consistent. The meteorological data are densified by the bilinear interpolation method, and coverage of key meteorological data over the whole region is realized.

(2) Meteorological features were added to the LSTNet model, and the short-term dependence and periodic repetition patterns between the PV sequence and meteorological factors were considered. The nonlinear branch composed of different neural networks and the linear branch composed of an autoregressive model were integrated to perform feature mining, and more accurate prediction results were obtained.

(3) In the same PV power station group, the PV power of multiple stations is strongly coupled. Compared with single-station prediction, the LSTNet model can better learn the dependence among the PV output power of multiple stations, improve the accuracy of regional PV prediction, and reduce the training cost.

In future work, it is necessary to further improve the clustering algorithm so that distributed PV plants are partitioned accurately by integrating geographical location information, meteorological characteristics, and other factors; to improve the accuracy of regional PV power plant group prediction; and to explore the use of high-precision data enhancement technology to address, at low cost, the lack of distributed PV power data and of key weather elements.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

X.L. and G.Y. conceptualized the study, contributed to visualization, performed funding acquisition, performed supervision, contributed to validation, performed formal analysis, investigated the study, contributed to data curation, and performed project administration. X.L. and J.G. proposed the methodology. X.L. was in charge of software and writing of the original draft. X.L., G.Y., and J.G. wrote, reviewed, and edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

This research was funded by the Science and Technology Development Guiding Plan Project Lanzhou under Grant No. 2022-5-36; Lanzhou City Science and Technology Programme Project No. 2022-2-22; 2023 Lanzhou Resources & Environment Voc-Tech University Research Capacity Enhancement Project No. X2023A-13; Lanzhou Resources & Environment Voc-Tech University, Yellow River Basin Ecotope Integration of Industry and Education R&D Fund under Grant No. XHYF2023-02; and 2020 University-Level Scientific Research Project under Grant No. Y2020B-02.