Abstract
Accurate device parameters play a critical role in the calculation and analysis of a power distribution network (PDN). However, device parameters are affected by the operating status and by manual entry; in addition, a PDN covers a wide area with many nodes, which makes parameter identification more challenging. Most recently proposed algorithms assume that the PDN parameters are distributed in a nonlinear probability space and optimize them through the power flow model with a loss function. Although these algorithms achieve satisfactory results in PDN analysis, the relationship between the power flow model and the loss function remains unclear. In this paper, the outputs of the power flow model are first analyzed with experimental data, which include the head and end voltages as well as the active and reactive power on the low-voltage side. The analysis reveals that the loss functions used by current algorithms are not well suited to the power flow model in PDN calculation, which is one of the main findings of this work. Subsequently, four novel loss functions are proposed and combined with the genetic algorithm (GA) and Markov Chain Monte Carlo (MCMC) to identify PDN parameters. Compared with published algorithms, the experimental results show that the loss functions defined in this paper achieve better and more stable performance, with about two times lower values of the MAE, RMSE, and MAPE evaluation functions.
1. Introduction
Reliable and accurate parameters play important roles in many aspects of the power distribution network (PDN), such as security analysis, system control, state estimation, line loss calculation, power flow calculation, protection setting, and fault analysis [1]. However, since real-time measuring equipment is often lacking in practice, it is difficult to obtain real-time information about the security and stability of a PDN directly. Some important parameters, such as line resistance, line reactance, transformer resistance, transformer reactance, transformer conductance, and transformer susceptance, are usually assumed to be static in practice, which leads to poor PDN calculations because the actual values differ from the assumed ones [2]. New approaches focusing on calculation efficiency and error reduction have been developed in many fields of PDN, such as supervisory control and data acquisition (SCADA), phasor measurement units (PMUs), and advanced metering infrastructure. These methods include the full-scale approach [3], PSOSR [4], the normalized Lagrange multiplier (NLM) test [5], the finite-time algorithm (FTA) [6], the residual method, the sensitivity analysis method, the Lagrange multiplier method [7], and the Heffron–Phillips method [8]. Furthermore, new approaches such as particle swarm optimization (PSO) [9], evolutionary strategies [10], estimation using synchro-phasor data [11], PSCAD simulation [12], and deep learning [13] have been proposed. These methods are effective with simulation data, but they place strict requirements on the precision of the voltage and the voltage phase angle. Besides, several measuring devices need to be placed in the grid, since these methods require the voltage, current, active power, and reactive power on both the high-voltage and low-voltage sides.
Identification equations based on the power flow model have been established for the equipment parameters in PDN to address the lack of required data and measuring devices. They build relationships between the parameters to be identified and data that can be obtained easily. Generally, the active power, reactive power, and voltage on the low-voltage and high-voltage sides of a PDN can be obtained easily. Other parameters are lacking in practice, such as transformer reactance, transformer resistance, transformer conductance, transformer susceptance, line resistance, and line reactance. The ranges of these parameters are optimized by algorithms combined with known quantities such as the active power, reactive power, and voltage on the low-voltage side. The calculated voltage on the high-voltage side can be computed by the power flow model with the parameters mentioned above, and the residuals between the calculated and true values are commonly used to build the loss function for optimization algorithms. Therefore, the parameter identification problem can be solved by model-free methods, such as least squares (LS) [14] and Markov Chain Monte Carlo (MCMC) [15]. LS is suitable for a convex parameter space and is sensitive to the initial value, so it is difficult to obtain satisfactory results with LS in PDN calculation. MCMC performs better than LS on regular data, but it struggles to predict the parameters when the data contain outliers or high deviations. For these reasons, LS and MCMC are hard to apply in real situations.
With the rapid development of computer science and artificial intelligence technology, new methods have been published in the field of PDN parameter identification. Methods based on global optimization [16], including the random search approach (RS) [17], the tree-structured Parzen estimator approach (TPE) [18], and simulated annealing (SA) [19], have been published recently [20] and have shown better parameter identification performance than LS and MCMC. Genetic algorithms (GA) combined with transient measurements have been proposed to identify electrical parameters in a power network [21]. Besides, some methods based on machine learning, such as support vector regression [22], ensemble Kalman filtering [23], and the interior point method, have been combined with discrete particle swarm optimization [24]. Sun et al. [25] took advantage of the time-invariant characteristics of PDN line parameters over a short time and proposed a convolutional neural network (CNN) regression method for line parameter identification. These methods have proven effective on simulation data; however, only part of the parameters can be obtained, and precise values of the voltage and the voltage phase angle are necessary.
The definition of the loss function is critical for optimization algorithms since it determines the optimization objective in PDN calculation. Nevertheless, most published algorithms define the loss function directly by the residuals [20], the integral of the absolute error [21], or the weighted least absolute value [24] between the calculated and true values of a parameter (e.g., the voltage on the high-voltage side). This definition is straightforward, but it may neglect some statistical properties of the parameters. If such statistical properties exist, the calculated values can be further transformed to reduce the value of the loss function. In this paper, the voltage on the high-voltage side is regarded as the output of the power flow model. First, we investigate the statistical relationship between the calculated and real values of this parameter, and an approximately linear relationship is found in the experimental data. Subsequently, a linear transformation is applied to the calculated values, and it is shown that this transformation can significantly reduce the residual between the calculated and real values of the voltage on the high-voltage side. Finally, four linear statistical metrics, viz. Pearson’s correlation coefficient, Spearman’s correlation coefficient [26], the Akaike information criterion [27], and the Bayesian information criterion [28], are used to build new loss functions for MCMC and GA to identify PDN parameters.
Based on the abovementioned points, this paper mainly has the following contributions:
(1) The statistical properties between the calculated and true values of PDN’s parameters are investigated, which are rarely studied and always neglected by other researchers.
(2) The linear statistical property is found in our experiment, and this property can improve parameter identification in PDN’s calculation.
(3) Four new loss functions are proposed by linear statistical metrics and combined with GA as well as MCMC to identify PDN’s parameters.
The main work of this paper is displayed in Figure 1, and the structure of the paper is organized as follows: Section 2 introduces the identification equations of power flow model in PDN and proposes the new definition of loss functions. The experimental data and calculation details are given in Section 3. The results and relevant discussions can be found in Section 4. Finally, Section 5 gives a brief conclusion.

2. Methodology
2.1. Power Flow Model Calculation
The basic theory of analysis in PDN can be found in [15, 20]. To simplify the computation, the three phases are assumed to be balanced as the premise for calculating the power flow in this paper. The schematic diagram of power flow calculation circuit model is shown in Figure 2.

In Figure 2, the active power, reactive power, and voltage on the high-voltage side of the transformer at each bus can be obtained directly from real-time measurements. The other parameters, namely the transformer reactance, transformer resistance, transformer conductance, transformer susceptance, line resistance, and line reactance, are hard to measure in a PDN and satisfy equations (1)∼(3), where the longitudinal and transverse components of the transformer impedance voltage drop at each bus, which appear in (3), are expressed by (4) and (5), respectively:
The equations for each bus can be expressed as (6)∼(8), whereas the final equation is given by (9):
The parameters in the line and transformer can be calculated based on (1)∼(9) with the measured power and voltage data.
2.2. The Optimization Algorithm for PDN
There are three critical difficulties pointed out by [22] in the parameter identification task of PDN:
(1) The parameter space of PDN may be nonconvex.
(2) The calculation efficiency is low since there are too many parameters needed to be identified owing to the large scale of lines and nodes.
(3) The numerical levels of parameters are exponentially different, which can affect the identification results due to the measurement errors.
These three critical points can lead some commonly used algorithms to suffer in parameter identification of PDN, such as least squares (LS). Therefore, [15] proposed Markov Chain Monte Carlo (MCMC) to solve these problems, and the methods based on sequential model-based global optimization (SMBO) [20] have also been proposed and shown their excellent performance to overcome these problems. In this study, MCMC is considered as the optimization algorithm for PDN. Besides, genetic algorithm (GA) is also introduced in this paper as a classical intelligent optimization algorithm.
2.2.1. Markov Chain Monte Carlo (MCMC)
MCMC was first introduced into PDN calculation by [15], where the detailed theory of MCMC can also be found. The main processes of MCMC in parameter identification are as follows:
(1) Determine the initial distribution of the parameters. The distribution can be set as a normal distribution or a continuous distribution, since the selection of the initial values does not affect the smooth convergence.
(2) Generate random samples through Monte Carlo sampling.
(3) Obtain the value of the loss function (fitness function) by the power flow model.
(4) Obtain the probability of the parameters and calculate the Markov chain.
(5) Obtain the optimal approximate solution of the parameters.
(6) Update the parameter distribution by means of the Bayesian posterior and update the joint distribution of the parameters.
(7) Repeat steps (2)∼(6) until the maximum number of iterations is reached or the convergence conditions are met.
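The MCMC steps above can be sketched with a random-walk Metropolis sampler. This is a minimal illustration under stated assumptions, not the scheme of [15]: the names `metropolis_search` and `toy_loss`, the step size, and the temperature are hypothetical, the toy quadratic stands in for the power flow model, and the Bayesian posterior update of step (6) is reduced to the Metropolis acceptance rule.

```python
import math
import random


def metropolis_search(loss, bounds, n_iter=5000, step=0.1, temperature=1.0, seed=0):
    """Random-walk Metropolis over box-bounded parameters.

    exp(-loss / temperature) is treated as an unnormalized target density,
    so parameter vectors with a low loss are visited more often.
    Clipping proposals to the bounds is a simplification of this sketch.
    """
    rng = random.Random(seed)
    # Step (1): draw the initial state uniformly inside the bounds.
    current = [rng.uniform(lo, hi) for lo, hi in bounds]
    current_loss = loss(current)
    best, best_loss = list(current), current_loss
    for _ in range(n_iter):
        # Step (2): propose a Gaussian perturbation, clipped to the bounds.
        proposal = [
            min(hi, max(lo, x + rng.gauss(0.0, step * (hi - lo))))
            for x, (lo, hi) in zip(current, bounds)
        ]
        # Step (3): evaluate the loss of the proposal.
        proposal_loss = loss(proposal)
        # Step (4): accept with the Metropolis probability.
        delta = proposal_loss - current_loss
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_loss = proposal, proposal_loss
        # Step (5): remember the best state visited so far.
        if current_loss < best_loss:
            best, best_loss = list(current), current_loss
    return best, best_loss


def toy_loss(params):
    """Hypothetical stand-in for a power-flow-based loss (minimum at 1.5, -0.5)."""
    return (params[0] - 1.5) ** 2 + (params[1] + 0.5) ** 2


best, best_loss = metropolis_search(toy_loss, [(0.0, 3.0), (-2.0, 2.0)], temperature=0.1)
print(best, best_loss)
```

In a real identification task, `toy_loss` would be replaced by a function that runs the power flow model and returns one of the loss values discussed in Section 2.3.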
2.2.2. Genetic Algorithm (GA)
Stochastic optimization techniques such as GA are gaining increasing popularity in various fields. Compared with conventional methods, GA is especially beneficial for finding the global optimal solution in a complex search space with many local optimal solutions. The main processes of GA can be listed as follows:
(1) Code the parameters to be identified and randomly generate plenty of chromosomes with different initial codes.
(2) Evaluate each chromosome by the loss function.
(3) Select a number of chromosomes randomly according to probabilities calculated from the loss function values in step (2). These chromosomes are randomly paired, and each pair creates two new chromosomes by randomly assigning each gene from one of the parents. This step is also called reproduction.
(4) Randomly select some chromosomes and flip their codes; this step is also called mutation.
Repeat steps (2)∼(4) until a stopping criterion is reached.
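The GA steps above can be sketched as a small binary-coded GA. This is an illustrative sketch only: the names `genetic_search`, `decode`, and `toy_loss`, the population size, the mutation rate, and the rule that turns loss values into selection weights are all assumptions, and the toy quadratic stands in for the power flow model.

```python
import random


def decode(bits, lo, hi):
    """Map a list of 0/1 genes to a real value in [lo, hi]."""
    as_int = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) * as_int / (2 ** len(bits) - 1)


def genetic_search(loss, bounds, n_bits=16, pop_size=40, n_gen=100,
                   mutation_rate=0.01, seed=0):
    """Binary-coded GA with roulette selection, uniform crossover, bit-flip mutation."""
    rng = random.Random(seed)
    n_genes = n_bits * len(bounds)

    def to_params(chrom):
        return [decode(chrom[i * n_bits:(i + 1) * n_bits], lo, hi)
                for i, (lo, hi) in enumerate(bounds)]

    # Step (1): random initial chromosomes.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best, best_loss = None, float("inf")
    for _ in range(n_gen):
        # Step (2): evaluate every chromosome by the loss function.
        losses = [loss(to_params(c)) for c in population]
        for chrom, value in zip(population, losses):
            if value < best_loss:
                best, best_loss = list(chrom), value
        # Step (3): lower loss gives a higher selection weight (assumed rule).
        worst = max(losses)
        weights = [worst - v + 1e-12 for v in losses]
        children = []
        while len(children) < pop_size:
            mum, dad = rng.choices(population, weights=weights, k=2)
            # Uniform crossover: each gene comes from one of the two parents.
            mask = [rng.random() < 0.5 for _ in range(n_genes)]
            children.append([m if t else d for m, d, t in zip(mum, dad, mask)])
            children.append([d if t else m for m, d, t in zip(mum, dad, mask)])
        # Step (4): mutation flips individual genes with a small probability.
        population = [[1 - g if rng.random() < mutation_rate else g for g in c]
                      for c in children[:pop_size]]
    return to_params(best), best_loss


def toy_loss(params):
    """Hypothetical stand-in for a power-flow-based loss (minimum at 1.5, -0.5)."""
    return (params[0] - 1.5) ** 2 + (params[1] + 0.5) ** 2


params, value = genetic_search(toy_loss, [(0.0, 3.0), (-2.0, 2.0)])
print(params, value)
```

The stopping criterion here is simply a fixed number of generations; a tolerance on the improvement of the best loss would be an equally reasonable choice.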
2.3. The New Definitions of Loss Function
2.3.1. The Statistical Properties in PDN Parameters
Before any parameters are identified by optimization algorithms, the static parameters of the standard 10 kV feeder network are used to calculate the per-unit voltage for the voltage data. The difference between this calculated voltage and the true voltage can be seen as a baseline result. The static parameters of the standard 10 kV network are shown in Table 1.
With the help of the parameters in Table 1, the calculated and true voltage values are displayed in Figure 3.

The calculated and true voltage values show a large discrepancy in Figure 3, but the two series follow a similar trend over the time periods. The scatter plot of the calculated and true voltage values is displayed in Figure 4.

The relationship between the calculated and true voltage values is clearly linear. If the calculated values are regarded as the independent variable and the true values as the dependent variable, linear regression can be applied to these two variables, and the statistical results are summarized in Table 2.
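A least-squares line fit of this kind can be sketched as follows. The function name `fit_line` and the per-unit voltage values are hypothetical and chosen only to illustrate the calculation; they are not taken from the dataset of this paper.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + bias.

    Returns (slope, bias, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    bias = mean_y - slope * mean_x
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, bias, r_squared


# Hypothetical values: calculated voltage (independent), true voltage (dependent).
u_calc = [0.95, 0.96, 0.97, 0.98, 0.99]
u_true = [1.010, 1.021, 1.029, 1.041, 1.050]
slope, bias, r_squared = fit_line(u_calc, u_true)
print(f"slope={slope:.3f}, bias={bias:.3f}, R^2={r_squared:.4f}")
```

An R² close to 1 with a large constant offset is the situation described above: the residuals are large, yet the two series are almost perfectly linearly related.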
2.3.2. The Definitions of Loss Function Based on Linear Property
According to the results in Table 2, there is a significant linear relationship between the calculated and true voltage values, which supports the power flow model in Section 2.1. The large discrepancy between the two might be caused by the assumption of three-phase balance, and real-time measurement errors can also affect the calculated voltage. Therefore, evaluating the performance of an optimization algorithm and defining its loss function directly by residual-based metrics are unsuitable. In this section, four metrics based on linear regression are proposed as loss functions for the optimization algorithms: Pearson’s correlation coefficient, Spearman’s correlation coefficient, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). The correlation coefficients are computed over all samples, and the correlation coefficient ranges from −1 to 1. An absolute value of 1 implies that a linear equation describes the relationship between x and y perfectly, with all data points lying on a line. The correlation sign is determined by the regression slope: a value of +1 implies that all data points lie on a line for which y increases as x increases, and vice versa for −1. AIC is built from the natural logarithm of the likelihood function and the number of variables, and BIC additionally considers the influence of the sample number.
In this study, the voltage values calculated with the parameters identified by MCMC and GA are treated as the independent variable, like the values calculated with the static parameters. Instead of using the residuals between the calculated and true values directly as the loss function, the new loss functions are based on the degree of the linear relationship. Therefore, Pearson’s correlation coefficient, Spearman’s correlation coefficient, AIC, and BIC are used to build loss functions for MCMC and GA. It should be noted that larger absolute values of the two correlation coefficients mean a closer linear relationship, and smaller values of AIC and BIC indicate a better fit of the linear regression.
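The four metrics can be sketched in their standard textbook forms. Everything named here is an assumption made for illustration: the function names, the Gaussian-noise likelihood used for AIC and BIC, the choice of k = 2 variables (slope and bias), and the sign convention that turns a correlation into a loss to be minimized.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's coefficient: Pearson's coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


def gaussian_log_likelihood(x, y):
    """Maximized log-likelihood of y = a*x + b + Gaussian noise (assumed model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    bias = my - slope * mx
    rss = sum((b - (slope * a + bias)) ** 2 for a, b in zip(x, y))
    sigma2 = rss / n  # must be > 0, i.e. the fit is not exact
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def aic(x, y, k=2):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * gaussian_log_likelihood(x, y)


def bic(x, y, k=2):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(len(x)) - 2 * gaussian_log_likelihood(x, y)


def correlation_loss(x, y, coefficient=pearson):
    """Loss to minimize: a closer linear relationship gives a lower value."""
    return -abs(coefficient(x, y))


u_calc = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical calculated values
u_true = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical true values
print(pearson(u_calc, u_true), spearman(u_calc, u_true))
print(aic(u_calc, u_true), bic(u_calc, u_true))
```

AIC and BIC can be minimized directly, while the correlation coefficients are negated so that all four quantities can be passed to the same minimizer.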
3. Experiment
3.1. The Descriptions of Experimental Data
In this paper, the raw dataset contains 1499 samples collected by SCADA with a sampling period of 15 minutes. The three-phase first section voltages on the high-voltage side are shown in Figure 5, and the corresponding three-phase voltages on the low-voltage side are displayed in Figure 6.


It can be seen in Figures 5 and 6 that the high-voltage side in the dataset is close to three-phase balance, so this dataset satisfies the requirements of the equations in Section 2.1. In addition, the three-phase active power and reactive power on the low-voltage side are shown in Figures 7 and 8, respectively.


The trend of changes of active power and reactive power on low-voltage side is consistent in Figures 7 and 8, and it indicates that the samples in this dataset are stable and can be used to perform parameter identification.
3.2. Evaluation and Calculation
In this paper, three metrics, mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE), are used to evaluate the performance of parameter identification and listed as follows:
The true and calculated per-unit voltage values at each bus are used to calculate the metrics in (14)∼(16). Instead of calculating these metrics directly, linear regression is applied first, with the true values as the dependent variable and the calculated values as the independent variable. The final evaluation of parameter identification is then made between the true values and the output values of the linear regression, which are obtained from the calculated values through the slope and bias of the regression. The slope and bias can also be optimized like the other parameters by MCMC and GA. In the following discussion, the variants in which the linear regression parameters are optimized by MCMC and GA are denoted as MCMC-LR and GA-LR, respectively. The upper and lower bounds of the parameters for MCMC and GA should be determined before parameter identification, and they are listed in Table 3.
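The evaluation procedure can be sketched as follows, using the standard definitions of the three metrics. The function names, the sample values, and the choice to express MAPE as a percentage are assumptions made for illustration.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; true values must be nonzero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def linear_calibrate(y_calc, y_true):
    """Fit y_true = slope * y_calc + bias and return the regression outputs."""
    n = len(y_calc)
    mean_c, mean_t = sum(y_calc) / n, sum(y_true) / n
    slope = (sum((c - mean_c) * (t - mean_t) for c, t in zip(y_calc, y_true))
             / sum((c - mean_c) ** 2 for c in y_calc))
    bias = mean_t - slope * mean_c
    return [slope * c + bias for c in y_calc]


# Hypothetical values with an exact linear relationship: true = 2 * calc - 8.
u_true = [10.0, 10.2, 10.4, 10.6]
u_calc = [9.0, 9.1, 9.2, 9.3]
u_reg = linear_calibrate(u_calc, u_true)

print("before:", mae(u_true, u_calc), rmse(u_true, u_calc), mape(u_true, u_calc))
print("after: ", mae(u_true, u_reg), rmse(u_true, u_reg), mape(u_true, u_reg))
```

In this toy case the direct residuals are large although the calculated values carry all the information needed to recover the true ones, which is the motivation for evaluating after the regression step.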
To avoid the impact of randomness on GA and MCMC, each method is repeated 25 times to guarantee the correctness and stability of results.
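The repetition protocol can be sketched as below. The helper `repeat_runs` and the stand-in `noisy_run` are hypothetical; in practice `run_once` would launch one GA or MCMC identification with the given seed and return an evaluation metric such as MAE. The sample standard deviation is an assumed choice for the spread.

```python
import random
import statistics


def repeat_runs(run_once, n_repeats=25):
    """Call run_once(seed) for seeds 0..n_repeats-1 and summarize the scores.

    Returns (mean, sample standard deviation).
    """
    scores = [run_once(seed) for seed in range(n_repeats)]
    return statistics.mean(scores), statistics.stdev(scores)


def noisy_run(seed):
    """Hypothetical stand-in for one stochastic identification run."""
    rng = random.Random(seed)
    return 1.0 + rng.gauss(0.0, 0.05)


mean_score, std_score = repeat_runs(noisy_run)
print(f"{mean_score:.3f} +/- {std_score:.3f}")
```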
4. Result and Discussion
MCMC and GA with a residual-based loss function are applied for parameter identification first. The residual-based loss function follows [22] and is built from the true and calculated voltage values of each sample. The comparisons between the baseline algorithm and the algorithms proposed in this work on the three evaluation functions are listed in Table 4.
The results in Table 4 show that the optimization algorithms can significantly improve the performance of parameter identification in PDN, and the values of the three metrics are considerably lower than those of the baseline parameters. The comparisons between the true and calculated voltage values obtained by MCMC and GA are exhibited in Figures 9 and 10, respectively.

The results in Figures 9 and 10 show that the true and calculated voltage values have a relatively close linear relationship, and the corresponding linear regression results are given in Table 5. Compared with the baseline results obtained with the static parameters in Section 2.3, the results in Table 5 are worse, since the optimization objective is to decrease the residuals directly.
To evaluate the performance with the method in Section 3.2, linear regression is first applied to the true and calculated voltage values; the results obtained by MCMC and GA with this linear regression can be found in Table 6.
The results in Table 6 indicate that, from the linear regression perspective, parameter identification with the static parameters performs better than MCMC and GA, and these results show that a loss function based on linear regression is more suitable and reasonable for PDN calculation. The parameter identification results based on the loss functions in Section 2.3 are listed in Table 7.
Two main points can be found in Table 7. The first is that the parameter identification performances based on the proposed loss functions are all better than the residual-based ones, which shows the effectiveness and rationality of the loss function definitions in this paper. Taking the results of MCMC-R1 as an example, the line plot and scatter plot of the true and calculated voltage values are shown in Figure 11. The line profiles of the two almost overlap, and their degree of linear relationship is greatly improved. The second is that the performances of the two correlation coefficients, AIC, and BIC do not differ significantly, which suggests that the choice among the linear regression metrics is not the top priority; what matters is that the loss function is defined by such metrics.

GA-LR and MCMC-LR include the parameters of the linear regression and optimize them simultaneously with the other PDN parameters. However, their performances are worse than those of the other methods shown in Table 7. The main reason is that the PDN parameters affect the linear regression during the optimization process; therefore, applying linear regression after the PDN calculation is a better choice in practice. The bar charts with error bars of each method are shown in Figure 12. The upper and lower bounds of each error bar are the mean value of the metric (such as MAE) plus and minus the standard deviation obtained from the 25 repeated experiments. It can be seen that the methods proposed in this paper are robust, with low standard deviations, even though MCMC and GA are stochastic. The results of MCMC-LR are omitted from Figure 12 since their values are too large to display appropriately in one figure with the others.

5. Conclusion
Parameter identification plays a key role in PDN calculation and analysis; therefore, some methods have been proposed to improve the accuracy of parameters in PDN, such as LS, MCMC, and sequential model-based global optimization. However, these methods focus on the residuals between the true and calculated values of voltage. In this paper, the linear relationship between these two values has been revealed first, which indicates that calculating metrics based on residuals is not suitable in PDN. Four new loss functions based on linear regression have been proposed to combine with optimization algorithms, namely, Pearson’s correlation coefficient, Spearman’s correlation coefficient, AIC, and BIC. The averaged values in MAE and RMSE evaluation functions in parameter identification by MCMC and GA with the residual-based loss are about 65, which is much lower than 90 by traditional algorithms, and the value in MAPE evaluation function is about 1.05, which is also much lower than 1.47 by baseline algorithms. The results show that the accuracy and stability of parameter identification improve significantly when the loss functions proposed in this paper are combined with GA and MCMC, and these loss functions can also be used in other optimization algorithms to identify PDN parameters. In addition, it should be noted that the heuristic algorithms used in this work depend strongly on the initialization, which may lead to local optima. The optimization speed of parameter identification should also be improved in future work.
Data Availability
The data used to support the findings of this study are currently under embargo while the research findings are commercialized. The data used to support the findings of the study can be obtained from the corresponding author, 6/12 months after publication of this article, upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Nanjing Institute of Technology Scientific Research Start-up Fund for High-level Introduced Talents, Grant no. YKJ202046, and the State Grid Jiangsu Electric Power Co., Ltd., Grant no. J2020097.