Abstract
Aiming at the fault line selection problem for single-phase grounding faults in distribution networks, a new hybrid fault line selection method is proposed that combines variational mode decomposition (VMD) and permutation entropy (PE) feature extraction with K-means clustering. First, a simulation model is built and its zero-sequence current is collected. VMD decomposes the collected zero-sequence current into several intrinsic mode functions (IMFs), which reduces the influence of harmonic components and noise on the characteristic signal and simplifies the subsequent calculation. The PE of each extracted IMF is then calculated, and the entropy values are assembled into a matrix that highlights the fault characteristics of each line. After preprocessing, the matrix is analyzed by K-means clustering, which correctly distinguishes the faulty line, and regression verification is performed. Finally, the method is verified with waveform data recorded in a real test field and compared with other algorithms. The results show that when a single-phase ground fault occurs, the method identifies the faulty line under different transition resistances, grounding resistances, and fault distances, with an accuracy of 100% in the cases studied, which indicates practical application value.
1. Introduction
Distribution networks in China mainly use ungrounded neutral systems or systems grounded through arc suppression coils, which are also called small-current grounding systems [1, 2]. As noted in [3], with the growth of power grid capacity and cable line scale, single-phase ground faults in grids composed of cable lines are mostly permanent faults. To prevent a single-phase grounding fault from developing into a phase-to-phase fault, the protection device must isolate the faulty line accurately. In a small-current grounding system, however, the fault current produced by a single-phase grounding fault is small, which weakens line selection and makes it difficult to remove the faulty line accurately and quickly [4]. In addition, a system whose neutral point is grounded through an arc suppression coil experiences high overvoltage levels during operation, which complicates both line selection and the capacity adjustment of the arc suppression coil. These problems endanger personal safety and cause economic losses. By contrast, systems whose neutral point is grounded through a small resistance or solidly grounded produce larger fault currents during a single-phase grounding fault, so they are also called high-current grounding systems [5, 6]. Because the grounding current is large, line selection is easier and the faulty line can be removed accurately. Consequently, the neutral points of urban 10 kV networks are increasingly grounded through a small resistance, which suppresses arc overvoltage and prevents equipment damage [7]. Statistics show that most faults in distribution network operation are short-circuit faults, and 70% of the faults in low-resistance grounding systems are single-phase grounding faults [8, 9].
Therefore, it is necessary to study fault line selection methods for single-phase grounding faults in small-resistance grounding systems.
Signal processing is an important part of fault line selection, and effective signal processing methods can improve the success rate of line selection. Various signal analysis tools have been used to extract the main time-domain and frequency-domain features, such as empirical mode decomposition (EMD) [10], the Fourier transform (FT) [11], the wavelet transform (WT) [12, 13], and the Hilbert-Huang transform (HHT) [14]. However, these algorithms have shortcomings when dealing with nonlinear and nonstationary signals. References [15, 16] explained that EMD is prone to mode mixing and end effects, so the extracted fault features are not distinct enough; ensemble EMD (EEMD), an improved version of EMD, suppresses mode aliasing only to a certain extent, and the decomposed signal may still contain residual error. Reference [11] noted that the Fourier transform is suitable only for global rather than local analysis of signals, and only for stationary signals. Reference [13] pointed out that the wavelet transform serves a purpose similar to the Fourier transform but performs time-frequency analysis and can be applied to both linear and nonlinear signals. However, as analyzed in [17], wavelet decomposition requires a suitable wavelet basis; otherwise, the decomposition is degraded, and noise in the signal also degrades it. As introduced in [18], HHT consists of two parts: first, EMD decomposes the signal into several intrinsic mode functions (IMFs); then, the Hilbert transform of each IMF yields the Hilbert spectrum.
HHT is suitable for decomposing abrupt high-frequency signals, but spurious frequency components may appear in the low-frequency region. Variational mode decomposition (VMD) [19] is a signal decomposition algorithm that can effectively handle nonlinearity and nonstationarity. Since it was proposed in 2014, it has mainly been developed in mechanical bearing fault diagnosis and online transformer monitoring [20]. In [21, 22], VMD is used to extract fault characteristic signals from vibration signals. Vibration signals often contain noise and harmonic components, and VMD usually decomposes them well and extracts the fault characteristic signals effectively. Therefore, compared with algorithms such as EMD and WT, VMD offers better signal decomposition and feature extraction capabilities.
Permutation entropy (PE) [23] measures the complexity of a time series and detects dynamic changes in it, so it can quantitatively evaluate the randomness of a signal sequence. Compared with methods that require more precise computation, such as the Lyapunov exponent [24] and the fractal dimension [25], PE is simple to calculate, robust against noise, and gives stable values [26]. Reference [27] reported that when PE is applied directly to fault feature extraction, noise makes the fault features indistinct. Therefore, PE is often applied after an initial feature extraction step [28]. In this paper, PE is used to extract fault features from the IMFs obtained by VMD. Because each IMF carries the fault features of the signal at a different time scale, calculating the PE of every IMF and assembling the values into a feature matrix highlights the fault features at multiple scales.
After fault features have been extracted under different fault conditions, a high-performance multiclassifier is needed to distinguish and classify the fault modes. Cluster analysis is a statistical method for studying the grouping of objects [29]. It classifies observed objects according to certain characteristics, or divides a set of samples of unknown categories into several categories by some algorithm [30]. Traditional clustering algorithms fall into two categories, hierarchical and partitional [31]; this paper focuses on partitional clustering. The K-means algorithm is one of the most popular partitional clustering methods and is used in many fields [32]. Reference [21] pointed out that K-means is a classic clustering algorithm that is simple, easy to implement, and of low time complexity. It transforms data clustering into a nonlinear optimization problem: each sample is assigned to the class whose center is nearest to it, with the minimum distance to the cluster centers as the assignment principle. In [33], to address protection maloperation on collector lines in wind farms, the advantages of K-means for classifying test samples with fault characteristics were analyzed, and K-means was then used to compare the spatial distances to each cluster center to achieve protection discrimination. In [34], K-means was used to classify and analyze scattered samples in order to address the numerical dispersion of line loss in station areas.
These studies show that K-means is computationally simple while giving good clustering and clear classification. Reference [35] proposed a hybrid algorithm combining cuckoo search (CS), particle swarm optimization (PSO), and K-means, in which PSO and K-means are used to generate new nests in the standard CS; combining the strengths of each algorithm produced a more effective data clustering method with better results.
Based on the above research, this paper proposes a line selection method that combines VMD-PE with K-means clustering for single-phase grounding faults in systems whose neutral point is grounded through a small resistance. The main contributions are as follows: (1) A new hybrid algorithm is proposed. It avoids mode mixing caused by harmonics and interference caused by noise, improves the accuracy of the extracted signal features, and better identifies faulty lines under different transition resistances, grounding resistances, and fault distances. (2) The method classifies the extracted fault characteristic signals, identifies the faulty line accurately, and is checked by regression verification, achieving an accuracy of 100% in the cases studied. (3) The accuracy of the method is verified with data collected in a real test field, and a comparison with other signal processing methods shows that the proposed method performs better.
The rest of this paper is organized as follows. Section 2 describes the principles of the algorithms used, clarifying the relationships between the variables, and presents the solution procedure of the proposed method. Section 3 analyzes the single-phase grounding fault model and then decomposes, classifies, and verifies the fault characteristic signals. Section 4 verifies the accuracy of the proposed method with data from a real test field, and Section 5 compares it with other methods to demonstrate its advantages.
2. Algorithm Principle
2.1. VMD Principle
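The VMD algorithm can effectively decompose both linear and nonlinear signals. When VMD is used to extract the fault signal, $K$ center frequencies $\omega_k$ and modal functions $u_k(t)$ are defined first. Each intrinsic mode function (IMF) is then modeled as an AM-FM signal:
$$u_k(t) = A_k(t)\cos\left(\phi_k(t)\right) \qquad (1)$$
In formula (1), $A_k(t) \ge 0$ is the instantaneous amplitude and $\phi_k'(t) \ge 0$ is the instantaneous frequency of the $k$-th mode.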
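Next, the bandwidth of each of the $K$ modal components is estimated. The Hilbert transform is applied to each IMF to obtain its analytic signal, whose spectrum is one-sided; the analytic signal is then multiplied by $e^{-j\omega_k t}$, which shifts the spectrum of each mode to baseband around its center frequency. Finally, the squared $L^2$ norm of the gradient of the demodulated signal gives the bandwidth of each mode, which leads to the constrained variational problem
$$\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \qquad (2)$$
In formula (2), $\{u_k\} = \{u_1, \ldots, u_K\}$ are the modes; $\{\omega_k\} = \{\omega_1, \ldots, \omega_K\}$ are their center frequencies; $f(t)$ is the signal being decomposed; $\delta(t)$ is the Dirac (impulse) function; and $*$ denotes convolution.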
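A quadratic penalty factor and a Lagrange multiplier are introduced to convert formula (2) into an unconstrained problem, whose augmented Lagrangian is
$$L\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \qquad (3)$$
In formula (3), $\alpha$ is the penalty parameter and $\lambda$ is the Lagrange multiplier.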
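The saddle point of formula (3) is found through repeated iterations of the alternating direction method of multipliers (ADMM). Using the Parseval/Plancherel Fourier isometry, the mode-update subproblem is solved in the frequency domain, giving
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \ne k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\left(\omega - \omega_k^{n}\right)^2} \qquad (4)$$
where the sum uses the most recent estimates of the other modes. The center frequency is updated as the center of gravity of the mode's power spectrum:
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}{\int_0^{\infty} \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega} \qquad (5)$$
After the center frequencies are updated, the Lagrange multiplier is updated in the frequency domain:
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau\left(\hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega)\right) \qquad (6)$$
where $\tau$ is the update step of the multiplier. In summary, initialize $\{\hat{u}_k^1\}$, $\{\omega_k^1\}$, $\hat{\lambda}^1$, and $n$; update $\hat{u}_k$ and $\omega_k$ with formulas (4) and (5) and $\hat{\lambda}$ with formula (6); and repeat until the convergence condition $\sum_k \|\hat{u}_k^{n+1} - \hat{u}_k^{n}\|_2^2 / \|\hat{u}_k^{n}\|_2^2 < \varepsilon$ is met, at which point the $K$ modal components are obtained and the calculation stops.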
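The update loop of formulas (4) to (6) can be sketched in NumPy as below. This is an illustrative implementation under stated assumptions, not the authors' code: the penalty α = 2000, τ = 0, the tolerance, the uniform initialization of the center frequencies, and the omission of the boundary mirroring used in the reference VMD implementation are all choices made for this sketch. Frequencies are normalized to cycles per sample, and the filter of formula (4) is applied symmetrically in |ω| so that every mode stays real-valued.

```python
import numpy as np


def vmd(signal, K=3, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Minimal VMD sketch; returns (modes, center_freqs) sorted by center frequency.

    center_freqs are normalized (cycles per sample, 0 to 0.5). Boundary mirroring,
    used by the reference implementation, is omitted for brevity (an assumption).
    """
    f = np.asarray(signal, dtype=float)
    n = f.size
    freqs = np.fft.fftfreq(n)                 # normalized frequency of every FFT bin
    f_hat = np.fft.fft(f)
    pos = freqs >= 0                          # bins used in the center-frequency update
    u_hat = np.zeros((K, n), dtype=complex)   # spectra of the K modes
    omega = 0.5 * np.arange(K) / K            # uniformly spaced initial center frequencies
    lam = np.zeros(n, dtype=complex)          # spectrum of the Lagrange multiplier
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Residual left for mode k, using the latest estimates of the other modes
            residual = f_hat - (u_hat.sum(axis=0) - u_hat[k])
            # Formula (4): Wiener-filter update; |freqs| mirrors the filter onto the
            # negative frequencies so each mode spectrum stays Hermitian (real mode)
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (np.abs(freqs) - omega[k]) ** 2)
            # Formula (5): center of gravity of the one-sided power spectrum
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / (np.sum(power) + 1e-300)
        # Formula (6): dual ascent on the multiplier (tau = 0 switches it off)
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        denom = np.sum(np.abs(u_prev) ** 2)
        if denom > 0 and np.sum(np.abs(u_hat - u_prev) ** 2) / denom < tol:
            break
    modes = np.real(np.fft.ifft(u_hat, axis=1))
    order = np.argsort(omega)                 # low-frequency mode first
    return modes[order], omega[order]
```

For the zero-sequence currents in this paper, a call would look like `modes, w = vmd(i0, K=3)`, where `i0` is a hypothetical array holding one line's sampled zero-sequence current; multiplying `w` by the sampling frequency converts the center frequencies to hertz.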
When the VMD algorithm is used to decompose a signal, the number of modes $K$ must be determined first, because different values of $K$ give different decomposition results. $K$ is generally between 3 and 6, and an unsuitable $K$ can cause large errors in the final judgment. As reported in [36, 37], when $K$ is too small, some components of the original signal are filtered out and lost, and mode aliasing easily occurs; when $K$ is too large, the center frequencies of adjacent modes differ only slightly, so the modes are hard to distinguish and frequency aliasing is likely. To select a suitable $K$, and following [38], this paper constructs a test fault signal containing low-, intermediate-, and high-frequency components, a decaying component, and noise:
$$f(t) = f_1(t) + f_2(t) + f_3(t) + f_4(t) + n(t) \qquad (7)$$
In formula (7), $f_1(t)$ is a strong low-frequency component, $f_2(t)$ is a medium-strength intermediate-frequency component, $f_3(t)$ is a weak high-frequency component, $f_4(t)$ is a decaying (attenuation) component, and $n(t)$ is the noise signal.
VMD is applied to the constructed signal $f(t)$ with $K$ = 2, 3, and 4; the decomposition results are shown in Figure 1.

Figure 1: VMD decomposition of the constructed signal: (a) K = 2; (b) K = 3; (c) K = 4.
Figure 1 shows that when K = 2, the signal is not fully decomposed: the noise component is not separated and mode aliasing occurs. When K = 3, IMF1 contains the power-frequency and decaying components, IMF2 the high-frequency component, and IMF3 the noise, so the decomposition is reasonable. When K = 4, over-decomposition occurs and IMF3 and IMF4 exhibit frequency aliasing. Therefore, K = 3 separates the high-frequency, low-frequency, and noise components of the fault signal most effectively.
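The over-decomposition symptom described above, adjacent center frequencies crowding together, can also be checked numerically. The helper below is a hypothetical sketch, not the procedure used in this paper (which selects K by inspecting the decompositions in Figure 1): given the converged center frequencies for each candidate K, for instance from a VMD run, it returns the largest K whose adjacent center frequencies are all separated by more than an assumed relative gap of 10%. The center frequencies in the example are invented for illustration.

```python
def select_mode_count(center_freqs_by_k, min_rel_gap=0.1):
    """Return the largest K whose adjacent center frequencies are all well separated.

    center_freqs_by_k: dict {K: sequence of the K converged center frequencies}.
    min_rel_gap: smallest accepted (w_next - w_cur) / w_next; an assumed threshold.
    Scanning stops at the first K where two adjacent modes crowd together,
    i.e. the over-decomposition symptom; returns None if even the smallest K fails.
    """
    best = None
    for k in sorted(center_freqs_by_k):
        w = sorted(center_freqs_by_k[k])
        separated = all(
            (w_next - w_cur) / w_next > min_rel_gap
            for w_cur, w_next in zip(w, w[1:])
            if w_next > 0
        )
        if not separated:
            break
        best = k
    return best


# Hypothetical center frequencies in Hz, not taken from the paper
example = {
    2: [50.0, 900.0],
    3: [50.0, 1200.0, 4000.0],
    4: [50.0, 1200.0, 3900.0, 4100.0],
}
print(select_mode_count(example))  # 3: at K = 4 the last two modes are only ~5% apart
```

The check only detects over-decomposition; under-decomposition, such as the unseparated noise at K = 2, still has to be judged from the modes themselves.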
2.2. Principle of Permutation Entropy Calculation
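For a time series $\{x(i),\ i = 1, 2, \ldots, N\}$, phase-space reconstruction gives the reconstructed matrix
$$\mathbf{Y} = \begin{bmatrix} x(1) & x(1+\tau) & \cdots & x(1+(m-1)\tau) \\ x(2) & x(2+\tau) & \cdots & x(2+(m-1)\tau) \\ \vdots & \vdots & & \vdots \\ x(J) & x(J+\tau) & \cdots & x(J+(m-1)\tau) \end{bmatrix} \qquad (8)$$
In formula (8), $m$ is the embedding dimension, $\tau$ is the time delay, and $J = N - (m-1)\tau$ is the number of reconstruction vectors; the $j$-th row is denoted $X(j)$.
The components of each reconstruction vector $X(j)$ are rearranged in ascending order:
$$x\left(j+(i_1-1)\tau\right) \le x\left(j+(i_2-1)\tau\right) \le \cdots \le x\left(j+(i_m-1)\tau\right) \qquad (9)$$
Every reconstruction vector therefore yields a symbol sequence $S(l) = (i_1, i_2, \ldots, i_m)$ that reflects the order of its elements, where $l = 1, 2, \ldots, n$ and $n \le m!$. Let $P_l$ be the probability of occurrence of the $l$-th symbol sequence. The permutation entropy of the time series is defined from these probabilities as
$$H_p(m) = -\sum_{l=1}^{n} P_l \ln P_l \qquad (10)$$
When $P_l = 1/m!$ for all $m!$ possible sequences, $H_p(m)$ reaches its maximum value $\ln(m!)$. $H_p(m)$ is therefore usually normalized by $\ln(m!)$:
$$\mathrm{PE} = \frac{H_p(m)}{\ln(m!)} \qquad (11)$$
Here, 0 ≤ PE ≤ 1. The closer PE is to 0, the more regular the time series; the closer PE is to 1, the more random it is.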
The permutation entropy value depends on the embedding dimension m and the delay time τ. Reference [39] shows that when m < 3, the reconstructed vectors contain too few states, the sensitivity to impulse signals drops sharply, and the algorithm fails; when m > 7, phase-space reconstruction tends to homogenize the time series. Therefore, m is generally chosen between 3 and 7. Reference [40] reported that the delay time has little effect on the permutation entropy value, but when τ > 5, small changes in the fault signal are difficult to capture, so τ = 1 is usually used. An example AM-FM signal (formula (12)) is then used for permutation entropy calculation to study the influence of the data length N. Following [23], N is set to 128, 256, 512, 1024, and 2048; the results are shown in Figure 2.

Since a smaller PE value indicates a more regular time series, Figure 2 shows that for N = 2048 all PE values are below 0.5, revealing the ordinal structure of the signal better than the other data lengths. For m = 6 and m = 7, the minimum permutation entropy values are nearly equal. An excessively large m not only lengthens the computation but also makes subtle changes in the sequence hard to detect, so m = 6 is the best choice. This paper therefore sets m = 6, τ = 1, and N = 2048.
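Formulas (8) to (11) translate almost line by line into code. The function below is a minimal sketch (the name `permutation_entropy` and the handling of tied values by a stable sort are choices made here, not taken from the paper); it uses the natural logarithm, so the normalization of formula (11) divides by ln(m!).

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(x, m=6, tau=1):
    """Normalized permutation entropy following formulas (8) to (11); value in [0, 1]."""
    x = np.asarray(x, dtype=float)
    n_vectors = x.size - (m - 1) * tau          # J = N - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and tau")
    # Formula (8): row j is (x[j], x[j + tau], ..., x[j + (m - 1) * tau])
    idx = np.arange(n_vectors)[:, None] + tau * np.arange(m)[None, :]
    vectors = x[idx]
    # Formula (9): the ascending-order index permutation of each row is its symbol sequence
    patterns = np.argsort(vectors, axis=1, kind="stable")
    counts = Counter(map(tuple, patterns))
    probs = np.array(list(counts.values()), dtype=float) / n_vectors
    h = -np.sum(probs * np.log(probs))          # formula (10), natural logarithm
    return float(h / math.log(math.factorial(m)))  # formula (11)


# Small hand-checkable example: order-3 patterns of (4, 7, 9, 10, 6, 11, 3)
print(round(permutation_entropy([4, 7, 9, 10, 6, 11, 3], m=3), 4))  # 0.5888
```

The printed value matches a hand calculation: the five order-3 windows of (4, 7, 9, 10, 6, 11, 3) fall into three patterns with probabilities 2/5, 2/5, and 1/5, so H_p(3) ≈ 1.0549 and PE = 1.0549/ln 6 ≈ 0.5888. With this paper's setting m = 6, τ = 1, and N = 2048, each IMF yields 2043 reconstruction vectors spread over at most 6! = 720 possible patterns.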
2.3. K-Means Clustering Analysis Principle
Cluster analysis is an unsupervised learning method that can process large amounts of data; here it groups the fault samples according to their distances to the cluster centers. However, if raw data are fed directly into K-means, large differences in magnitude and units between features distort the distance calculation, so the data must be preprocessed before clustering.
The original data are preprocessed by Z-score normalization, given by the following three formulas:
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad (13)$$
$$S = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2} \qquad (14)$$
$$z_i = \frac{x_i - \bar{x}}{S} \qquad (15)$$
Here, $\bar{x}$ is the mean value of the signal data; $x_i$ is the original signal data; $S$ is the standard deviation of the signal data; $z_i$ is the normalized (preprocessed) signal data; and $N$ is the number of signal data samples.
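The K-means algorithm classifies samples by distance similarity. Its main steps are as follows:
(1) Construct the initial signal matrix and set the number of classes $K$.
(2) Take the sum of the squared distances from each sample to its cluster center as the objective function:
$$J = \sum_{i=1}^{K} \sum_{\mathbf{x} \in S_i} \left\| \mathbf{x} - \mathbf{c}_i \right\|^2 \qquad (16)$$
In formula (16), $J$ is the objective function; $K$ is the number of classes; $\mathbf{x}$ is a sample (a row of the signal matrix); $\mathbf{c}_i$ is the cluster center of the $i$-th class; and $S_i$ is the set of samples assigned to class $i$ after clustering.
(3) Assign each sample to its nearest cluster center, recompute each center as the mean of its class, and repeat until formula (16) converges, which yields the final cluster centers.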
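The preprocessing and clustering steps above can be sketched with plain NumPy. This is an illustrative Lloyd-iteration K-means, not the authors' implementation: the sample (N − 1) standard deviation in the Z-score, the random choice of initial centers with a fixed seed, the iteration cap, and the entropy matrix in the example are all assumptions.

```python
import numpy as np


def zscore(X):
    """Column-wise Z-score, formulas (13) to (15); the sample (N - 1) std is assumed."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def kmeans(X, k=2, max_iter=100, seed=0):
    """Plain Lloyd iteration that decreases the objective J of formula (16)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # k distinct samples as initial centers
    for _ in range(max_iter):
        # Assignment step: each sample joins the class of its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its class (kept if the class is empty)
        new_centers = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Made-up entropy matrix: rows are line samples, columns are PE values of IMF1 to IMF3
X = np.array([
    [0.42, 0.45, 0.60],   # faulty line
    [0.78, 0.80, 0.55],
    [0.79, 0.81, 0.56],
    [0.40, 0.44, 0.61],   # faulty line
    [0.77, 0.79, 0.54],
    [0.80, 0.82, 0.55],
])
labels, centers = kmeans(zscore(X), k=2)
print(labels)
```

K-means only reaches a local minimum of J, so practical code often restarts from several initial centers and keeps the solution with the smallest J; with two well-separated clusters, as in this example, a single run is enough.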
2.4. Model Building with Solving Algorithms and Flowcharts
Step 1. Build a small-resistance grounding system and use feeder terminal units (FTUs) to record the current and voltage data of the faulty and nonfaulty lines.
Step 2. Decompose the extracted zero-sequence current of each line by VMD to obtain its intrinsic mode functions IMF1, IMF2, ..., IMFi.
Step 3. Calculate the permutation entropy of each IMFi of each line to obtain the permutation entropy values of the faulty phase and of the nonfaulty phases, and assemble these values into a signal matrix.
Step 4. Preprocess the matrix and input it into the K-means clustering algorithm to separate faulty and nonfaulty lines. For each line, calculate the distance d1 from its fault feature vector to the center of the faulty-line cluster and the distance d2 to the center of the nonfaulty-line cluster. If d1 is less than d2, the fault is on this line.
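The decision rule of Step 4 is small enough to write out directly. The sketch below is illustrative: `is_faulty_line` is a hypothetical helper that takes a line's feature vector and the two cluster centers (here C1 and C2 as reported for the simulation case in Section 3.3.1), and the feature vector is assumed to have been Z-scored with the same mean and standard deviation as the clustered data. The example feature vectors are made up.

```python
import numpy as np


def is_faulty_line(feature, c_fault, c_normal):
    """Step 4 rule: a line is faulty when d1 (to the faulty-cluster center) < d2."""
    v = np.asarray(feature, dtype=float)
    d1 = np.linalg.norm(v - np.asarray(c_fault, dtype=float))   # distance to faulty center
    d2 = np.linalg.norm(v - np.asarray(c_normal, dtype=float))  # distance to nonfaulty center
    return bool(d1 < d2)


# Cluster centers reported for the simulation case in Section 3.3.1
C1 = (-1.378, -1.377, 0.306)   # faulty-line cluster
C2 = (0.689, 0.689, -0.153)    # nonfaulty-line cluster

# Hypothetical Z-scored feature vectors, for illustration only
print(is_faulty_line((-1.3, -1.4, 0.2), C1, C2))   # True
print(is_faulty_line((0.7, 0.6, -0.1), C1, C2))    # False
```

Comparing d1 with d2 amounts to checking on which side of the perpendicular bisector plane of C1 and C2 the feature vector lies, so classifying a new recording needs only the two centers, not the original samples.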
The algorithm’s flowchart is shown in Figure 3.

3. Simulation Analysis of the Single-Phase Ground Fault Signal
3.1. Model Analysis of a Grounding Fault with the Neutral Point Grounded through a Small Resistance
Figure 4 shows a simplified low-resistance grounding system. The capacitances to ground of lines 1, 2, and 3 are $C_1$, $C_2$, and $C_3$, respectively. The neutral point of the 10 kV bus is grounded through a grounding transformer (whose resistance is neglected) [41] and a grounding resistance $R_N$. The neutral-point voltage $U_0$ is also the bus zero-sequence voltage, and current flowing from the bus into a line is taken as positive. Assuming that a single-phase-to-ground fault occurs on one phase of line 3 through a transition resistance $R_f$, the zero-sequence equivalent circuit is as shown in Figure 5.


From Figure 5, the zero-sequence impedance is obtained by connecting the capacitances to ground $C_1$, $C_2$, and $C_3$ in parallel with the neutral grounding resistance.
In formula (17), $C_\Sigma = C_1 + C_2 + C_3$ is the total zero-sequence capacitance to ground and $\omega$ is the power-frequency angular frequency. The bus zero-sequence voltage can then be obtained from the following formula:
In Figure 5, $I_f$ is the zero-sequence current at the fault point, $I_N$ is the zero-sequence current through the neutral point, and $I_{0i}$ is the zero-sequence current of line $i$. In addition, $E_A$, $E_B$, and $E_C$ are the normal (prefault) phase voltages, and the prefault voltage of the faulted phase acts as the virtual source at the fault point. Because the zero-sequence voltage at the bus neutral point shifts the bus voltages, the three-phase bus voltages after the grounding fault are given by the following formula:
(1) The nonfaulty lines are line 1 and line 2; the zero-sequence current of each equals the zero-sequence capacitive current of that line to ground:
(2) Line 3 is the faulty line; its zero-sequence current equals the sum of the zero-sequence currents of all nonfaulty lines and the current through the neutral grounding resistance:
According to formulas (19) and (20), the zero-sequence currents of the nonfaulty lines are basically the same, but they differ greatly from the zero-sequence current of the faulty line.
3.2. Fault Feature Decomposition and Entropy Calculation
In this study, a 10 kV low-resistance grounding system model with three lines is built with Simulink in MATLAB. The positive- and negative-sequence parameters of the lines are R = 0.27 Ω, L = 2.55 × 10⁻⁴ H, and C = 3.39 × 10⁻⁷ F, and the zero-sequence parameters are R0 = 2.7 Ω, L0 = 1.109 × 10⁻⁴ H, and C0 = 2.8 × 10⁻⁷ F. The fault occurs on phase A of line 3, the neutral grounding resistance is 15 Ω, the line lengths are 9 km, 8 km, 8 km, and 5 km, respectively, and the fault starts at t = 0.1 s.
3.2.1. Decomposing Fault Signals Using VMD
Under a metallic ground fault, with neutral grounding resistance RN = 15 Ω and ground fault distance L = 8 km, the IMFs obtained by applying VMD to the zero-sequence current are shown in Figure 6.

Figure 6: IMFs of the zero-sequence current obtained by VMD, panels (a), (b), and (c).
3.2.2. Calculating the Entropy Value of the Fault Signal
A phase A grounding fault is set on line 3. In a low-resistance grounding system, the neutral grounding resistance is generally within 20 Ω and the single-phase grounding fault current is limited to 400 A to 1000 A [42, 43], so the neutral grounding resistance is set to 5 Ω and 15 Ω. Grounding faults of different severity are mainly caused by component damage such as aging of equipment insulation, by meteorological conditions, by human damage, and by other factors [44]. Therefore, the fault grounding resistance is set to metallic grounding, 50 Ω, 100 Ω, 500 Ω, 1000 Ω, and 2000 Ω, and a high-resistance arc grounding case is also considered. The fault distances are 8 km and 9 km. The entropy results are listed in Table 1.
Table 1 shows that, under the different fault conditions, computing the permutation entropy of the IMFs of the faulty phase and of the nonfaulty phases gives different entropy values: the permutation entropy of the faulty line is lower than that of the nonfaulty lines, so the fault characteristics are distinct.
3.3. Calculating the Cluster Centers of Faulty and Nonfaulty Clusters
3.3.1. Analysis of Clustering Simulation Results
The simulation data in Table 1 are first preprocessed, the zero-sequence current grounding fault matrix for line selection is then constructed, and the matrix is analyzed by K-means clustering. The clustering results are shown in Figure 7.

The entropy matrix built from the decomposed zero-sequence currents of the simulation model is clustered into a nonfaulty-line cluster and a faulty-line cluster in a three-dimensional Cartesian coordinate system. The cluster center of the faulty-line cluster is C1 = (−1.378, −1.377, 0.306), and that of the nonfaulty-line cluster is C2 = (0.689, 0.689, −0.153).
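The reported centers can be cross-checked against the Z-score preprocessing. Assuming that each feature column is normalized over all $n_1 + n_2$ samples (so every column averages to zero) and that each converged K-means center is the mean of its class, where $n_1$ and $n_2$ are the sizes of the faulty and nonfaulty clusters, the two centers must satisfy

$$\frac{n_1 \mathbf{C}_1 + n_2 \mathbf{C}_2}{n_1 + n_2} = \mathbf{0} \quad\Longrightarrow\quad \mathbf{C}_1 = -\frac{n_2}{n_1}\,\mathbf{C}_2 .$$

Componentwise, −1.378/0.689 = −2.000, −1.377/0.689 ≈ −1.999, and 0.306/(−0.153) = −2.000, so $n_2/n_1 \approx 2$: the nonfaulty cluster holds about twice as many samples as the faulty cluster, which is consistent with one faulty line out of three. The field-test centers in Section 4.1 satisfy the same relation (0.232/(−0.116) = −2.000, −1.376/0.688 = −2.000, −1.379/0.689 ≈ −2.001).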
3.3.2. Experiment Verification of Fault Protection Technology for the Small-Resistance Grounding System
The clustering results above were checked by regression verification, that is, by classifying each line's feature vector again with the obtained cluster centers; the results are shown in Table 2.
Table 2 shows that when a phase A grounding fault occurs on line 3, every line is correctly identified by comparing the distances from its fault feature vector to the centers of the faulty and nonfaulty clusters.
4. Analysis of the Verification Results of the Cluster Analysis Test in the Real Test Field
4.1. Analysis of Fault Signal Line Selection in the Real Test Field
Following the same procedure as for the simulation data, VMD-PE is applied to the data collected in the real test field, and the resulting permutation entropy values are listed in Table 3.
Table 3 shows that in the real test field the permutation entropy of the faulty phase is smaller than that of the nonfaulty phases, so the fault characteristics are distinct.
Figure 8 shows that the entropy matrix built from the decomposed zero-sequence currents recorded in the real test field is clustered into nonfaulty-line and faulty-line clusters in a three-dimensional Cartesian coordinate system. The faulty-line cluster center is C1 = (0.232, −1.376, −1.379), and the nonfaulty-line cluster center is C2 = (−0.116, 0.688, 0.689).

4.2. Analysis of the Verification Results of the Cluster Analysis Test in the Real Test Field
Table 4 shows that when a single-phase ground fault occurs on phase A of line 1, the faulty and nonfaulty lines are correctly identified by calculating the distance from each line's fault feature vector to the centers of the faulty and nonfaulty clusters.
5. Comparison of Method Superiority
To verify the advantages of the proposed line selection method, a single-phase grounding fault on line L1 of the real test field is taken as an example. With the fault resistance set to metallic grounding, 500 Ω, and 1000 Ω, a fault distance of 5 km, and a neutral point resistance of 10 Ω, the proposed algorithm is compared for line selection with EMD-PE and K-means, EEMD-PE and K-means, and HHT-PE and K-means. The comparison results are shown in Table 5.
Table 5 shows that with metallic grounding, all four methods select the line correctly. However, as the fault resistance increases, methods 2, 3, and 4 suffer from aliasing and cannot reliably distinguish faulty from nonfaulty lines. VMD decomposes the fault current effectively and is less affected by the harmonic components and noise in the signal; permutation entropy then highlights the fault characteristics, and K-means finally distinguishes the faulty line correctly. Therefore, the proposed algorithm outperforms the other algorithms in these tests.
6. Conclusions
Aiming at single-phase grounding faults in systems whose neutral point is grounded through a low resistance, a new fault line selection method combining VMD, PE, and K-means clustering is proposed. In this method, the fault signals generated under different transition resistances, grounding resistances, and fault distances are decomposed by VMD into intrinsic mode components. The permutation entropy of each component is then calculated and input into K-means clustering as a feature vector to distinguish faulty lines from nonfaulty lines. The comparative analysis with real test field data leads to the following conclusions:
(1) Applying VMD to the single-phase ground fault signals of the low-resistance grounding system avoids mode aliasing in the decomposed signals. VMD adaptively decomposes the signal into modal components with specific bandwidths and center frequencies, providing a rich data source for identifying faulty lines.
(2) Permutation entropy is simple to calculate and robust against noise. Combined with VMD, it enhances the characteristics of fault signals and improves fault identification. The analysis of fault and nonfault signals shows that the method highlights the characteristics of single-phase grounding faults in systems grounded through a small resistance and selects the faulty line correctly.
(3) Under different transition resistances, grounding resistances, and fault distances, inputting the matrix of permutation entropy values computed by VMD and PE into K-means clustering shows that the combined VMD-PE and K-means method correctly identifies faulty and nonfaulty lines, which indicates practical application value.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Key Scientific Research Projects Plan of Henan Higher Education Institutions under Grant 19A470006 and the Cultivation Plan of Young Backbone Teachers in Colleges and Universities of Henan Province under Grant 2019GGJS104.