Abstract
Load forecasting is an integral part of the energy study unit to schedule the generating unit by the load demand. Many studies were conducted on load forecasting based on real power demand; however, very few papers were published on reactive power demand. In this research work, an attempt has been made to predict the requirement of reactive power as a function of demand for real power. Household loads are considered for evaluating the demand for reactive power as a critical load. The attempt has been made based on the data collected from the laboratory experimental setup for one year. The load forecasting requires time series analysis of the data set along with error minimization between predicted values and actual value; therefore, the global flower pollination algorithm along with Holt-Winters’ exponential model has been used to predict the reactive power. Autoregressive integrated moving average (ARIMA) and seasonal autoregressive integrated moving average (SARIMA) models have been used as benchmarking models to evaluate the effectiveness of the model under various conditions. Python-developed model has been used to predict the demand for reactive power, and a MATLAB model has been developed to optimize the cost function. A detailed comparative analysis of the proposed model along with some well-established optimized models such as GA, PSO, and FPA has been presented related to evening peak demand for a microgrid architecture in Conclusion. The analysis includes median values of different quantities such as nMBE, nMAE, nRMSE, and RMSE. Normalized MBE, indicating underestimation and overestimation, is negative for ARIMA but 0.42 for HW-GFPA during validation and 0.43 for the testing data set. Normalized RMSE, measuring the variance between actual and forecasted values, is lowest at 0.803 for proposed HW-GFPA during validation and 0.799 for testing.
1. Introduction
The term “load” in load forecasting applications usually refers to active power (kW) or energy (kWh). There are few papers or articles on reactive power prediction. Many important processes are based on the correct understanding of reactive power such as voltage/VAR optimization, power quality improvement, frequency control, and steady-state power flow analysis [1]. Therefore, accurate reactive power forecasting is necessary for power engineers to understand future reactive power profiles and improve the quality of grid operations at the transmission and distribution level [2, 3].
In recent years, the commercialization of smart grids and the liberalization of energy markets have created additional demands for accurate reactive power prediction [4, 5]. In literature, the day-ahead and hour-ahead planning of distribution energy resources requires reactive power forecasts to ensure compliance with grid limits [6]. These forecasts are necessary at the aggregate level and point load level to achieve good planning. The impact of reactive power on optimal power flow solutions in smart grids and microgrids was discussed [7]. Reactive power forecasting is also relevant to the power market. In many countries, power markets only manage active power trades at regular time intervals. Nonetheless, reactive power projections test the technical feasibility of energy pathways from power plants to consumers by transmission system operators reconfiguring market systems when technical boundary conditions are violated [8, 9]. Moreover, reactive power forecasting is expected to become a fundamental input for the reactive power market shortly [10, 11].
As already mentioned above, the current practice of reactive power prediction consists of multiplying the active power prediction by a factor related to the average power factor or using a simple method that relies on the experience of grid operators [12, 13]. Several scientific papers deal with the prediction of reactive power. Artificial neural networks have been proposed to predict reactive power in houses and substations. Neural networks have also been presented in a probabilistic framework to predict reactive power in bulk utility buses [14, 15]. Another approach based on linear and piecewise linear relationships between active and reactive power was used to build a regression model of reactive power [16]. The Takagi-Sugano fuzzy method is applied to the ultrashort-term prediction of active and reactive power [17].
Reactive power forecasting based on fuzzy logic requires framework verification for each stage of approval. Again, rationalizing each step is also not feasible for analyzing the boundary condition [18]. This also affects the accuracy setting and enrolling capacity of the system. It also lacks hypothesis testing, which is required to validate the model under partial variation conditions. Furthermore, ANFIS involves a very complex structure and gradient learning strategies for its realization [19]. The ANFIS model becomes more complicated and, at the same time, also requires more computational time. The design of the membership function dynamically also requires a very complex analysis [20]. It is required to handle seven parameters at the input to predict the demand for reactive power in a microgrid. The seven parameters include voltage, frequency, time stamp, number of nonlinear load connected, temperature, real power drawn, and power factor. To address such a large number of data sets, it is required to go for regression analysis, which will combine all the input to predict the output at any instant of time for 5 minutes or 300 sec. Thus, short-term forecasting can be best modeled by the autoregressive integrated moving average (ARIMA) model. Here, in this research work, a seasonal variation to ARIMA has been added to make it simpler and practically feasible. These ARIMA and SARIMA models will act as base models or benchmarking models. The research paper has the following contributions and outcomes: (i)The dynamic optimization model for reactive power prediction based on real power demand and six parameters has been modeled using a global flower pollination algorithm(ii)The developed framework (algorithm) is accurately forecasting the short-term demand of reactive power
The rest of the research article has been organized as follows: Section 2 represents the problem formulation based on the proposed scheme as mentioned above. Two benchmarking models such as ARIMA and SARIMA have been discussed under the benchmarking model of Section 4. A detailed result analysis has been described in Section 5 followed by the conclusion discussed in Section 6.
2. Problem Formulation
In order to analyze the reactive power forecasting based on the real power and consequent price prediction, a time series model can be designed. The flow of reactive power () is a function of voltage and real power drawn over a period of time. Therefore, where represents the reactive power drawn over time and represents the functional parameter of voltage. Taking convolution of eq. (1) must satisfy
The convolution method applied over eq. (2) results in the establishment of the white noise signal . Now, in order to analyze standard independent time series data, the system can be modeled in terms of the linear-independent model. Let us compare and model a linear time series signal , representing real-time power transferred between two identified buses as a function of reactive power flow data. Therefore,
Equation (3) shows that the flow of both real and reactive power including losses can be made the same with the rest of the white noise data. Both the time and its time inverse are on the same plane of operation; therefore, the series can be modeled in terms of the ARIMA model of the two different (, ) processes. Figure 1 shows the contour for reactive power energy for the scaled area .

So, defining for a set of periods of data by equation , the ARIMA function can be written as
Equation (4) represents the series analysis of reactive power modeled on influenced as a function of real power and constant over a period of time. Again, from a regression analysis point of view, this model is termed an ARIMA(p) process. Similarly, the ARIMA(q) process becomes
Again, from eq. (4), for , eq. (4) can be written as
Applying energy to eq. (6), it becomes
Now, solving eq. (7) for , it becomes or
Here, eq. (11) represents the unique existence solution of eq. (10). Now, deriving the covariance function, it becomes
Again, further deriving eq. (12), the energy content of the system becomes
Again, on simplification, eq. (13) becomes
Therefore, the inline optimization becomes
The energy density equation presented in eq. (16) shows the Gaussian distribution based on probability theory. The empirical confidence can be taken into consideration for evaluating the interpolation. Keeping in view the spikes present in the waveform, in this work, Gaussian flower pollination algorithm (GFPA) is used instead of the flower pollination algorithm (FPA). The merits of GFPA over FPA are listed below. (i)Easy in segregating the collateral spike(ii)Step size can be varied in accordance with spike length(iii)Analysis of periodic exponential term with respect to seasonal variation in reactive power demand
Now, the modified optimization equation becomes
Being a seasonal regression on time series data, the exponential decaying part presented in eq. (1) converges itself to a solution. A detailed process of flowchart is presented in Figure 2.

3. Data Collection and Analysis
Reactive power analysis over a time period requires a rigorous evaluation of load performance for both linear type and nonlinear type of loads. Therefore, two simulation test models have been created inside the lab one with 20 kVAR load and 30 kVAR load. Both the loads are connected to the microgrid through two different sources such as solar PV and battery. The 20 kVAR load consists of high-speed blowers, heaters, and 7 induction motors (each 2 hp). In order to capture the line power data such as time stamp, reactive power, real power, current, and energy consumption, a phasor measurement unit has been installed at each nodal point. Here, the nodal point represents the load termination point. All the loads were connected to the main grid through the line scheduler (TPS2301 and SLVS277) with a preloaded time setting. This enables the operator to create an artificial transition between loads, thereby enabling the PMU to capture the required power quality. Thus, the data collected from PMU was saved into a CSV file for further analysis, testing, and evaluation.
The data were collected every 15 seconds for three months. After the collection of the data depending upon the type of load, the data were segregated into 5 different clusters based on their nominal, ordinal, interval, and ratio. All the data have undergone a redundancy check for incomplete data sets like inserting the unavailable data wherever necessary or deleting a particular set of data. Table 1 shows the basic statistical analysis of PMU data.
The data has been separated from each other based on the null hypothesis. As observed in Table 1, the mean value is lowest for 45 days of data, i.e., -0.223. This signifies that the data points as collected from PMU are not far away from each other. This also signifies that, at no condition, the null hypothesis is rejected. The average standard deviation by consolidating all the data is limited to 1.01%, which is much smaller than the allowed statistical deviation of 5%. Again, as observed, the kurtosis is lightly tailed, which means that the data are not highly deviating from its normal distribution.
Table 2 represents the null hypothesis analysis of the cluster. As seen, the -square error is gradually decreasing from cluster C-15 (days) to C-90. This means that as we approach a large data set, the data set becomes more viable for a time series analysis because the goodness of fit has also increased. The average standard error by considering all the clusters becomes 0.698 (average). This signifies that 69.8% of data can take part in the regression analysis, as they represent how distinct the data are inside the data set. The variation in the and lies in the range of 0.99 and 1.
This section of the data analysis part represents that all the clusters of data are arranged in proper order as desired by the time series algorithm. All the statistical analysis as shown in Tables 1 and 2 has been utilized by the benchmarking model for further analysis in the next section.
4. Benchmarking Model
4.1. ARIMA
The autoregressive integrated moving average time series model is an amalgamation of differenced autoregressive and moving average models, i.e., a combination of past output and some random noise. The AR in ARIMA represents the autoregression aspect of time series analysis on its data, MA signifies the moving average of errors, and I stands for integration, indicating that the data has been differenced with its previous values. Therefore, modifying eq. (15) in terms of ARIMA,
Equation (18) represents the ARIMA model reactive power forecasting. Here, represents the autoregressive, and represents the moving average error. represents the random noise. Now, based on eq. (18), two ADF models have been modeled with two different values. The critical parameters for the order of (1, 1, 2) have been evaluated. Table 3 shows the ADF statistics for different values.
Table 4 shows the ARIMA model error analysis for the order (1, 1, 2). Here, in Table 4, the minimum standard deviation has been observed for Ar.L1.D value of 0.213 and that of -0.455 for 0.975 accuracy level. Similarly, Table 5 represents the ARIMA modulus status for the order (1, 1, 2).
Tables 6 and 7 represent the ARIMA model error analysis for the order (2, 1, 2) and modulus status. As shown in Table 6, the minimum standard error of 0.116 was recorded for Ar.L2.D value. Similarly, the corresponding probability value is 0.772, and that of the upper boundary is 0.262. The corresponding moving average value is 0.307 (std. error). This signifies that the system becomes valid under the upper limit of -0.211 to 0.262.
Figure 3 shows the autocorrelation analysis with 1st-order difference and 2nd-order difference of the original reactive power demand series. The higher-order difference analysis has been conducted to find any possible good solution. It is observed that the autocorrelation error for the second difference is very small as compared to the first-order difference. Again, for 91 number of observation, the AIC is 1306.99 and BIC is of 1322.058. Similarly, the HQIC becomes 1313.07.

Figure 4 represents the forecasted reactive power as a function of real power demand for 20 days. Here, the analysis has been carried out for 95% of the confidence interval. Similarly, Figure 5 shows the rolling mean and standard deviation analysis for the original forecasted data. The rolling standard deviation shows the deviation from the average of the reactive curve. Here, it can be found that the maximum deviation is .


4.2. SARIMA
Seasonal ARIMA is one of the most widely used forecasted models. The direct modeling of the SARIMA model can be modeled by using SARIMA . Here, represents the nonseasonal component, and represents the seasonal parameter.
Figure 6 represents the autocorrelation function plot for the SARIMA model. As observed, the initial and seasonal . It clearly depicts that autoregression is having order 1 and seasonal AR is of order 1.

Similarly, Figure 7 represents the partial autocorrelation for SARIMA. As observed from the figure, the initial lag spike is at 1, and that of the seasonal spike lies at 10. It means that it has a moving average of 1, and that of the seasonal dependent average order is also 1.

Based on the order and class as obtained from Figure 7, Table 8 represents the SARIMA fit model with respect to reactive power time series data.
The autocorrelation has a direct impact on the partial correlation with an order of 50.37% as reported by Ljung-Box. Similarly, a probability of 87% dependency has been observed from the result analysis. The heteroskedasticity factor has been found to be 0.58, which determines the monotonicity involved in the prediction process.
The standard residual plot along with the histogram is plotted in Figure 8. As observed here, the mean residual plot is zero and all the residuals are uncorrelated. This states that the model is well fitted with the data and values.

(a)

(b)

(c)

(d)
4.3. Extreme Gradient Boosting (XGBoost)
The XGBoost is the application-oriented model of random forest (RF). XGBoost adjusts the weight of weak learners, thereby improving the forecasting rate of each tree. XGBoost improves the generation of complex data sets by avoiding overfitting of the curve against the data set. The nonlinearity relationship among the complex data set can be handled by providing additive training on each data point, where each previous state of iteration solution () is considered as an actual forecast. This enables high-dimensional feature space analysis. Mathematically,
In eq. (19), represents the iteration level, and represents the time series data of each level. represents state of data, and that of represents the previous state of data. Equation (19) offers a wide range of hyperparameter selection, which enables the operator to optimize the model performance.
XGBoost parameter analysis for regression study has been presented in Table 9. Figure 9(a) represents the predicted vs. actual graph of reactive power demand with a learning rate (L-Rate) of 0.032. As observed, the sample has an associate rule of 0.11. The associate rule here represents the probability of data points inside the data set or how closely they are related to the data of different data sets. The result shows that (Figures 9(b) and 9(c)) the maximum probability of 0.14 can be achieved with the data set of L-Rate and that of 0.34 for tree analysis. Figures 9(d)–9(f) represent feature importance vs. feature index for three different learning rates. As observed in Figure 9(d), the feature importance is positive for sample-3 only. This means that the accuracy of the model in determining the feature prediction is 0.02 which is very low. In Figure 9(e), the feature importance becomes positive for two samples with the highest value of 0.2 and with a learning rate of 0.067. Therefore, it can be assumed that there exists a scope for further improvement. In the 3rd trial in Figure 9(f), all 3 samples have shown positive feature importance with a maximum accuracy level of 0.43.

(a)

(b)

(c)

(d)

(e)

(f)
5. Result Analysis
The experimental setup for the proposed work has been started with data collection for reactive power from the laboratory-operated critical load. Three months of continuous data have been collected, i.e., May 2022 to July 2022, through the phasor measurement unit (PMU) from various sensitive points. The data has been tested for redundancy check using Python programming, and wherever the data was missing is filled with mean data from that series.
In order to create a proper mathematical optimization model for its validation with respect to the problem-formulated modeling, a curve fitting analysis has been carried out over the data set. Four different methods of curve fitting have been applied to the data set such as sine order-8, interpolant, polynomial, and smoothening order-4. The detailed statistical analysis of the method is shown in Table 10.
Figure 10(a) represents the sum of the sine curve fitting for different samples. An 8-order fitting model has been applied. As observed, numbers of the sample have been used to predict the sample. Root mean square error (RMSE) has been observed for the smoothing spine curve (Figure 10(b)). As observed, only 112 numbers of the samples have been outside the boundary of the fitted curve. Similarly, polynomial and interpolate analysis has been presented in Figures 10(c) and 10(d). Overfitting and underfitting of the curve have been found in Figures 10(a) and 10(d). Therefore, in this research, article curve fitting model of the smoothing spine has been taken into consideration for further analysis.
|

(a)

(b)

(c)

(d)
In order to optimize the local sample on the smoothing spine, curve global flower pollination-based optimization has been applied in accordance with the optimization equation as presented under eq. (27) and eq. (28). Algorithm 1, presents the pseudocode used in the pollination algorithm for the global optimization of the curve, here referred to as the reactive power fitted curve.
Figures 11 and 12 represent the cost function optimization for two different cost functions of and . It is observed that the error in tracking the trajectory path is limited to 0.026 for and 0.022 for . At about 10.00 sample time (Figure 12), the actual direction of optimization is the same as that of trajectory. Based on this, Tables 11 and 12 represent a detailed comparative analysis of different optimization algorithms such as PSO, GA, and FPA. Here, FPA shows better performance for 750 samples of observation with min. deviation of 0.47.


Figure 13(a) represents the fitness function analysis for FPA with an initial pollen level of 2.33, enabling reactive power optimization. It determines the closeness of the optimized solution to the set point. Here, 5-fold cross-validation has been applied to find the mean fitness value at 4603.58. Figure 13(b) shows the current best individual. Here, a sample pollination size of 10% has been used for the evaluation of the distance between two conjugate pollens. In order to find the best local optimized solution, the average distance between two pollen has been maintained at as shown in Figure 13(c). Fitness scaling for each individual is presented in Figure 13(d). Here, expectation vs. the raw score is presented. As noticed for a score level of 4603, a maximum expectation of 4 has been found. Similarly, each individual’s fitness is presented in Figure 13(e), where a fitness value of 4000 is found for every individual. The selection function for each individual is presented in Figure 13(f). Here, individuals 6 and 9 have been taken into consideration for further analysis because these individuals have the highest number of children. Similarly, Figure 14 represents the flower pollination optimization with an initial pollen level of 3.07.

(a)

(b)

(c)

(d)

(e)

(f)

(a)

(b)

(c)

(d)

(e)

(f)
In order to analyze the data using Holt-Winters’ exponential smoothing, the following steps have been followed: (i)Prepare the data set to find about trend and seasonal patterns(ii)Linearity inside the elements of the data set is established(iii)Identification of seasonal multiplication or addition for prediction over the data set
In this paper, the data set has been tested for both seasonal multiplication and addition models evaluating the effectiveness of the system. Table 13 shows the seasonal multiplication method under different weight conditions, and Table 14 shows the additive method under different weight conditions. Algorithm 2 shows the pseudocode for reactive power prediction using Holt-Winters’ exponential model.
|
Figure 15 represents the reactive power flow/demand over 3 different months. Three different patches of reactive power demand have been evaluated. Figure 16 shows the distributed seasonal trend of the reactive power samples. The trend pattern of the data reveals that the prediction is a linear combination of exponential data and sine waveform. Here, a residual of 1 unit has been observed. The seasonal pattern has an accuracy level of 0.98 units, which is 7.09% efficient as compared to ARIMA and 11.38% efficient as compared to the SARIMA model. As desired, the additive and multiplicative trends of the proposed model merge at the end of 3 months of data. However, there exists an RMSE level of 3.28 between the two sets of predictions.


Figure 17 represents Holt-Winters’ additive and multiplicative trend analyses for reactive power prediction. The prediction has been done for 5 min intervals of time for the next 5 days. The error between additive and multiplicative forecasting is merged together and found to be zero. This indicates that the prediction is accurate for all seasonal variations.

(a)

(b)
Table 15 represents the prediction for an 8-day evening peak using HW model. As observed during 8-9 pm, maximum evening peak has been observed of a maximum magnitude of 917.415 MW on day 3, with an average demand of 843.39 MW on day 1. All the peak values were also validated against their actual values based on the history data set. Table 16 represents a comparative analysis of error matrices between the benchmarking model and the proposed model. The median analysis of different quantities such as nMBE, nMAE, nRMSE, and RMSE has been conducted. Normalized MBE, which evaluates the underestimation and overestimation of data inside a data set, is found to be negative for ARIMA, whereas the least positive value of 0.42 has been found for HW-GFPA under validation. Similarly, for the testing data set, it is 0.43. The normalized RMSE, which determines the variance present in between actual measured values and forecasted value, is 0.803 (least) for proposed HW-GFPA for validation and of 0.799 for testing.
6. Conclusion
In this research paper, an attempt has been made to find the reactive power demand based on the usage of real power. Out of 1st-year load data, only three months of critical load reactive power has been taken into consideration from the lab experimental setup.
The problem formulation, as discussed in Section 2, reveals that reactive power forecasting can be modeled based on the load demand. After detailed modeling, it is understood that the global flower pollination algorithm can be best suited for the point-by-point optimization of the system. Again, in this research paper, Holt-Winters’ model has been used for forecasting reactive power.
In order to validate the robustness of the proposed model, ARIMA and SARIMA models have been taken into consideration as the benchmarking models. The GFPA has been validated against PSO and GA regarding its effectiveness in predicting the time series data for two different cost functions. Again, under the result section, a detailed sample forecasting for three different situations such as normal load, critical peak demand during the day time, and critical peak demand during the night time has been presented as forecasted reactive power for 8 days in the month of August 2022, which is also validated in terms of MAPE and average error of actual data during that time interval for the same month.
The result analysis shows that the proposed Holt-Winters’ model enabled with GFPA can predict effectively reactive power based on the active power against ARIMA and SARIMA models. In this research article, unit price commitment based on reactive power has not been taken into consideration; however, future work can be made based on this research matter.
Data Availability
The data used to support the findings of this study are included in the article.
Conflicts of Interest
There is no conflict of interest on behalf of all authors.