Abstract

High-resistance ground faults are difficult to detect with existing ultrahigh-voltage direct current (UHVDC) transmission fault detection systems because of their low sensitivity. To address this challenge, a straightforward mathematical method for fault detection in UHVDC systems based on the downsampling factor (DF) and approximate derivatives (AD) is proposed. The DF is used to analyse the signals at multiple sampling frequencies, and the AD approach generates detail and approximation coefficients at various levels. Initially, the signals were processed with different DF values, and the first-, second-, and third-order derivatives of the resulting signals were calculated by the AD method. Next, the entropy features of these signals were computed, and the random forest-recursive feature elimination with cross-validation (RF-RFECV) algorithm was used to select a high-quality feature subset. Finally, an ensemble classifier consisting of Light Gradient Boosting Machine (LightGBM), K-Nearest Neighbor (KNN), and Naive Bayes (NB) classifiers was used to identify UHVDC faults. A ±800 kV UHVDC transmission line model was developed in MATLAB/Simulink, and simulation experiments were performed with various fault locations and types. The experiments establish that the suggested approach detects several fault types on UHVDC transmission lines with high precision. The method accurately identifies both low- and high-resistance faults, irrespective of where they occur, and is remarkably robust to transition resistance. Furthermore, it performs well with small sample sizes and is highly reliable.

1. Introduction

Transmission technology based on UHVDC has been extensively utilized for long-distance power transmission due to its high transmission capacity, flexible control, and low losses. With recent advancements in power electronics, UHVDC transmission has become more stable and easier to control, offering broad development prospects. However, the complex geographical environment and harsh operating conditions of UHVDC transmission lines lead to high fault probabilities, which seriously affect the efficiency of power transmission. Hence, quick and accurate fault identification is crucial to eliminate the impact of faults and ensure the continued proper functioning of the entire transmission system [1, 2]. To address these issues, Chinese and foreign scholars have conducted extensive research. Currently, fault identification primarily relies on the variability of electrical quantities at the time of failure, with artificial intelligence methods applied less often.

Faults in a two-terminal DC system can be detected using the travelling-wave differential current [3]. The integral of the instantaneous power of the travelling wave has been recognized by some scholars as a fault detection criterion [4], but its application in long-distance UHVDC systems is impractical due to strict data requirements. The authors of [5] theoretically analysed how the reactive energy of transmission lines differs between internal and external faults; they then exploited the fact that the reactive power at the two ends of the DC line has opposite polarity during internal faults and the same polarity during external faults to achieve fault discrimination. The rate of change of the bus voltage and current was used to suggest a fault identification technique [6]; however, tuning the identification parameters is difficult. In reference [7], the authors combined the amplitude-frequency properties of DC filters with the harmonic equivalent circuit of the HVDC transmission system to discriminate faults by exploiting the polarity contrast of the harmonic currents between internal and external faults.

Numerous studies have attempted to employ machine learning techniques for HVDC system fault detection as statistics and machine learning technology continue to advance [8, 9]. In reference [10], the characteristics of the DC voltage and DC current are extracted using principal component analysis (PCA), and these features are then trained and tested with a support vector machine (SVM) for fault identification and classification. The authors of [11] proposed a K-means-based fault identification technique that uses voltage and current information from the inverter of a two-terminal HVDC system; however, this technique may not be suitable for other transmission systems. While the aforementioned machine learning methods may perform well under specific circumstances, their efficacy may be limited when the system structure changes.

In recent decades, deep learning has developed rapidly, with significant advances in image, voice, and natural language processing [12], and several researchers have applied it to fault detection. In reference [13], the fault signals were processed by empirical mode decomposition (EMD), and the decomposition results were used to train a convolutional neural network (CNN) model; the trained model was then validated on test data to locate faults. The authors of [14] applied the Hilbert–Huang transform to obtain a time-frequency energy matrix of the signal, which was converted into a 2-D image and fed to a CNN model to identify faults. In reference [15], the authors proposed a modified complete ensemble empirical mode decomposition with adaptive noise (MCEEMDAN) algorithm that decomposes electrical signals into intrinsic mode functions (IMFs). Using pseudocolour coding of the IMFs, a grayscale image was generated from the original signal, and the conditional generative adversarial network (CGAN) algorithm was employed to create additional samples to enhance the dataset. Finally, a CNN model was used to identify the different fault types. Deep learning algorithms are particularly adept at extracting deep features from vast amounts of data without the need for a specific model. Each signal processing technique has its own benefits and drawbacks, but all have a high computational complexity that reduces the efficiency of fault diagnosis. Moreover, the performance of a neural network is highly dependent on its parameters, and tuning and training neural networks for power systems with different structures can be time-consuming.

To address the issues above, a technique for identifying faults in UHVDC systems that combines the downsampling factor (DF) and approximate derivatives (AD) with ensemble learning is proposed. First, the DF (a positive integer) is used to analyse the signals at different sampling frequencies. The AD is a simple signal processing method that yields detail and approximation coefficients of the signal at various levels. Then, the entropy features of these signals are calculated, and a subset of high-quality features is filtered with the RF-RFECV algorithm. Finally, three base classifiers (LightGBM, KNN, and NB) are integrated by a voting rule into an ensemble classification model for fault identification. The ensemble classification model offers fast recognition and overcomes the performance limits of any single weak classifier [16]. The article's primary contributions include the following:
(1) A new UHVDC fault signal analysis method based on DF-AD is proposed. Compared with traditional signal analysis methods, the DF-AD method provides more classification information to the classifier and has higher efficiency.
(2) A UHVDC fault identification model based on a voting rule and ensemble learning is proposed. Three weak classifiers (LightGBM, KNN, and NB) are integrated into an ensemble classifier by a voting rule; the ensemble model's identification results are better than those of any single classifier.
(3) In the fault identification process, the RF-RFECV algorithm is employed to perform feature engineering automatically, without manual screening.

This paper is organized as follows: Section 2 describes the downsampling algorithm based on the downsampling factor and the approximate derivative method. Section 3 introduces the feature extraction, feature selection, and ensemble classifier. Section 4 explains the simulation model and fault simulation method. Section 5 then discusses the fault identification results. Finally, the conclusion is presented in Section 6.

2. Signal Processing Method

2.1. Down-Sampling Factor

Down-sampling based on the DF is a basic signal processing method. By altering the downsampling factor, signals with different sampling frequencies can be produced. The signal's sampling frequency can be expressed as

f_s = N / T, (1)

where f_s represents the signal's sampling frequency, N represents the number of the signal's data points, and T represents the signal's duration. DF-based downsampling is used to lower the signal's sampling frequency: the sampling rate is reduced by keeping the first sample and every nth sample thereafter.

We assume that a signal of length 16 is given, X = [x_1, x_2, ..., x_16]. If the downsampling factor n = 1, the signal is unchanged and its length is 16. If n = 2, the downsampled signal is [x_1, x_3, x_5, ..., x_15] and its length is 8. If n = 4, the downsampled signal is [x_1, x_5, x_9, x_13] and its length is 4. Although the number of data points varies in the three cases, the signal maintains a constant duration; for instance, the signals above can be assumed to last for 0.01 seconds.

When n = 1, the length of the signal is 16, so N = 16, T = 0.01 s, and f_s = 16/0.01 s = 1.6 kHz.

When n = 2, the length of the signal is 8, so N = 8, T = 0.01 s, and f_s = 0.8 kHz.

When n = 4, the length of the signal is 4, so N = 4, T = 0.01 s, and f_s = 0.4 kHz.

As these calculations show, signals with different sampling frequencies can be obtained by varying the DF value. Different values such as 1, 2, 4, 8, ..., m can be used for the DF. Table 1 provides comprehensive details on the sampling frequencies obtained with different DF values for the system studied here: the signal lasts 0.1 s, consists of 1280 data points, and has a sampling frequency of 12.8 kHz. The signal duration is always 0.1 s, while the number of data points and the sampling frequency change with the DF value; for each case, the sampling frequency is computed by equation (1). In this analysis, we use DF values of 1, 2, 4, 8, 16, 32, 64, and 128. When DF equals 1, the resulting signal is identical to the original signal. The resulting sampling frequencies therefore range from 0.1 kHz to 12.8 kHz.
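As a minimal illustration (a sketch assuming NumPy; the variable names are ours, not the paper's), the following Python snippet downsamples a signal by keeping every nth sample and recomputes the sampling frequency via equation (1):

```python
import numpy as np

def downsample(signal: np.ndarray, n: int) -> np.ndarray:
    """Keep the first sample and every nth sample thereafter."""
    return signal[::n]

T = 0.1                        # signal duration in seconds, as in Table 1
x = np.random.randn(1280)      # stand-in for the 12.8 kHz fault voltage signal

for n in (1, 2, 4, 8, 16, 32, 64, 128):
    y = downsample(x, n)
    fs = len(y) / T            # equation (1): f_s = N / T
    print(f"DF = {n:3d}: N = {len(y):4d}, f_s = {fs / 1000:.1f} kHz")
```

Running this reproduces the range reported above, from 12.8 kHz at DF = 1 down to 0.1 kHz at DF = 128.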

As an example, Figure 1(a) shows the voltage signal collected at the positive pole on the rectifier side when a positive-pole grounding fault occurs in the UHVDC transmission system. Figure 1 also presents the analysis of the voltage over multiple frequency ranges. However, the approximation coefficients of the signal alone may not provide enough classification information for the classifier. Therefore, detail coefficients of the fault signal at different levels must also be determined in order to identify UHVDC faults. The various detail and approximation coefficients of the signal were obtained by the AD method.

2.2. Approximate Derivative

More detailed information can be obtained by calculating the approximate derivative of the signal [17, 18]. Assume a vector F = [f_1, f_2, ..., f_k] with k elements. The AD method computes the differences between neighbouring elements. If D_1 = AD(F), where D_1 is the approximate derivative of F, then D_1 can be expressed as

D_1 = [f_2 − f_1, f_3 − f_2, ..., f_k − f_{k−1}].

It is possible to take several derivatives of any one-dimensional matrix. Taking the AD of matrix F twice gives D_2 = AD(D_1).

The dimension of F is k, the dimension of D_1 is k − 1, and the dimension of D_2 is k − 2. The number of elements of the vector thus diminishes by one as the order of the derivative increases.
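NumPy's diff function implements exactly this neighbouring-element difference, so the dimension reduction can be verified in a few lines:

```python
import numpy as np

F = np.array([1.0, 4.0, 9.0, 16.0, 25.0])   # k = 5 elements
D1 = np.diff(F)         # first-order AD:  [3. 5. 7. 9.]  (k - 1 elements)
D2 = np.diff(F, n=2)    # second-order AD: [2. 2. 2.]     (k - 2 elements)
D3 = np.diff(F, n=3)    # third-order AD:  [0. 0.]        (k - 3 elements)
```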

In Figure 2, the AD method was used to process the 12.8 kHz voltage signal of Figure 1(a), and the approximation and detail coefficients of the signal were determined. Similar results can be obtained for the other UHVDC fault types. Meanwhile, Figure 2 shows that derivatives beyond the third order yield curves similar to the third-order result. Thus, this paper selects the rectifier-side voltage signal during UHVDC faults and its first three orders of approximate derivatives as the research objects from which the classification features are determined.

Combining the downsampling factor with the AD method, a signal can be decomposed into multiple signals for analysis, thereby providing more classification features. As shown in Figure 3, the voltage signal was decomposed into 8 signals according to the DF values (1, 2, 4, 8, 16, 32, 64, and 128). The first-, second-, and third-order approximate derivatives of the signal under each DF value were then obtained, yielding a total of 32 signals (8 × 4).
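A minimal sketch of the full DF-AD decomposition (our illustrative code, not the authors' implementation) that produces these 32 subsignals might look as follows:

```python
import numpy as np

def df_ad_decompose(signal: np.ndarray,
                    dfs=(1, 2, 4, 8, 16, 32, 64, 128),
                    max_order: int = 3) -> list:
    """Decompose a signal into len(dfs) * (max_order + 1) subsignals:
    for each DF value, the downsampled signal plus its first three ADs."""
    subsignals = []
    for n in dfs:
        y = signal[::n]                      # downsampling by factor n
        subsignals.append(y)                 # approximation coefficients
        for order in range(1, max_order + 1):
            subsignals.append(np.diff(y, n=order))  # detail coefficients
    return subsignals

x = np.random.randn(1280)                    # stand-in fault voltage signal
subs = df_ad_decompose(x)
print(len(subs))                             # 32 subsignals (8 DFs x 4 signals)
```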

3. Classification Method

When the input samples have many features, the efficiency of the classifier decreases. Therefore, we choose the RFECV algorithm based on a random forest to select the high-quality features in the data set.

The goal of supervised machine learning is to obtain a model that is robust and effective in every respect. In practice, however, the circumstances are often less than ideal, and one may only obtain a number of imperfect models (weak models that perform well only in some respects). Ensemble learning combines many weak models to produce a better, more complete, and more accurate strong model; the errors of a single weak classifier can be corrected by the ensemble. This is the potential of ensemble learning. This section explains the feature extraction, the RF-RFECV feature selection algorithm, and the individual classifiers (LightGBM, KNN, and NB), and proposes the ensemble classification model employed in this study.

3.1. Feature Extraction

Providing useful information to a classifier is the primary goal of feature extraction. In this research, nine entropy features of the signals are computed to acquire the classification information. The fault signal is decomposed into 8 levels according to the 8 DF values; for each level, one downsampled signal and three derivative signals generated by the AD method are obtained. Thus, 32 (8 × 4) signals are obtained from the original signal, and the nine entropy values yield a total of 288 (9 × 32) classification features. Table 2 displays the definitions of the nine entropy features, where s_i stands for the i-th data point of signal S, N stands for the total number of the signal's data points, μ stands for the signal's mean value, and σ represents the signal's standard deviation. The feature extraction approach thus yields 288 features for diagnosing the UHVDC fault classes.
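Table 2 defines the nine entropies used; since the table is not reproduced here, the sketch below is limited to two representative entropy features (Shannon entropy over the normalized energy distribution and log-energy entropy, both common definitions) applied to every DF-AD subsignal:

```python
import numpy as np

def shannon_entropy(s):
    """Shannon entropy of the signal's normalized energy distribution."""
    e = np.asarray(s, dtype=float) ** 2
    p = e / (e.sum() + 1e-12)      # small constant guards against division by zero
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def log_energy_entropy(s):
    """Log-energy entropy: sum of log(s_i^2)."""
    return float(np.log(np.asarray(s, dtype=float) ** 2 + 1e-12).sum())

x = np.random.randn(1280)          # stand-in fault voltage signal
subs = [np.diff(x[::n], k) for n in (1, 2, 4, 8, 16, 32, 64, 128)
        for k in range(4)]         # the 32 DF-AD subsignals
features = [f(s) for s in subs for f in (shannon_entropy, log_energy_entropy)]
print(len(features))               # 64 with these two entropies; 288 with all nine
```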

3.2. Recursive Feature Elimination with Cross-Validation

Some features may not be needed for fault identification, and retaining them lengthens training. To obtain outstanding classification features, the RF-RFECV feature selection technique was employed.

RFECV is a feature selection algorithm based on recursive feature elimination (RFE) and cross-validation (CV), primarily used to screen high-quality features in high-dimensional data [19]. The RFE method requires a machine learning algorithm to assess how important the various features are; this paper selects the random forest (RF) model as the RFE feature importance estimator. The RFECV algorithm involves two primary stages. The first stage is RFE based on the random forest, which chooses the best collection of features in a high-dimensional dataset: the mean decrease in impurity (MDI) is applied to assess the significance of every feature [20], and the least important feature is eliminated repeatedly until all features have been ranked [21]. The second stage is CV, in which, based on the importance ranking from the RFE stage, cross-validation is performed on different feature subsets to choose the number of features with the best average score.

Suppose the sample dataset consists of data x and labels y, and the initial feature set R contains all the features of the data. This paper uses five-fold cross-validation to obtain the optimal feature combination. Table 3 displays the pseudo-code of the RF-RFECV algorithm [22].
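Under the assumption that the pipeline is implemented with scikit-learn (the paper reports Python 3.6 but not the library), the RF-RFECV stage could be sketched as follows; the random-forest hyperparameters and the synthetic data are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

# Stand-in for the 1050 x 288 entropy-feature data set with 7 fault classes.
X, y = make_classification(n_samples=1050, n_features=288, n_informative=12,
                           n_classes=7, random_state=0)

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    step=1,                  # eliminate one feature per RFE iteration
    cv=5,                    # five-fold cross-validation, as in the paper
    scoring="accuracy",
)
selector.fit(X, y)
print(selector.n_features_)          # optimal number of retained features
X_selected = selector.transform(X)   # reduced feature matrix
```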

3.3. Ensemble Classification Models
3.3.1. LightGBM

In 2017, Ke et al. first proposed the LightGBM algorithm [23], which is based on the gradient boosting tree technique. The basic idea of LightGBM is to obtain an optimal model by integrating several weak classifiers. LightGBM mainly relies on a leaf-wise growth strategy and a histogram algorithm [24]. The leaf-wise strategy searches all current leaves at each iteration to determine which one offers the greatest gain from being split, and then splits that leaf [25]. Leaf-wise growth is also easy to incorporate into parallel training, which can improve accuracy. The histogram algorithm can merge mutually exclusive sample features and bin the data during the traversal process, thereby reducing the memory consumption of the algorithm. Compared with the original GBDT, the calculation speed is improved roughly tenfold. More information about LightGBM can be found in references [26, 27].

3.3.2. K-Nearest Neighbor

KNN is a straightforward machine learning algorithm that identifies an unknown sample's class from the nearest k training samples [28]. It operates on the distances between sample locations and the set of nearest-neighbour points, and several distance metric functions can be employed to identify the closest neighbours [29]. When k > 1, the majority class among the k nearest neighbours is assigned to the new instance; the classifier's output is typically computed for a given parameter k by a majority voting rule over the neighbours' classes [30]. The KNN decision for a test sample x can be expressed as

C(x) = argmax_c P(c | x),

where P(c | x) denotes the likelihood of x belonging to class c.

During training, KNN can be deployed with a variety of search strategies to accelerate locating the nearest neighbours. The KNN classifier used in this study sets the parameter k to its default value of 1 and combines a linear search strategy with the Euclidean distance metric.
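A minimal scikit-learn configuration matching this description (brute-force search corresponds to a linear scan) might be:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# k = 1 with brute-force (linear) search and Euclidean distance, as above.
knn = KNeighborsClassifier(n_neighbors=1, algorithm="brute", metric="euclidean")
knn.fit(np.array([[0.0], [1.0], [2.0]]), np.array([0, 1, 1]))
print(knn.predict([[0.2]]))   # [0] -- the single nearest training sample decides
```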

3.3.3. Naive Bayes

The Naive Bayes (NB) algorithm is a traditional machine learning method that categorizes information by applying Bayes' theorem to probability functions [31]. During the training phase, the algorithm assumes that the features to be classified are independent of one another. In the prediction phase, the likelihood of each class is deduced for the test sample. The simplicity, computational efficiency, and ease of implementation of NB classifiers have led to their widespread use in classification tasks [32, 33]. The Bayes theorem utilized in NB is defined by the following equation [34]:

P(c | x) = P(x | c) P(c) / P(x),

where P(c | x) indicates the posterior probability, P(c) the class prior probability, P(x | c) the likelihood, and P(x) the predictor prior probability.

In this study, the features extracted from different types of fault data were used to train and test the Bayesian classifier, which predicts the fault type from the highest output probability among the categories.

3.3.4. Proposed Ensemble Classifier Model

To enhance the generalization capacity of individual weak classifiers, many practitioners rely on a voting ensemble approach, i.e., an ensemble learning method with two-stage classification [35]. The following paragraphs detail the development and concepts behind the proposed ensemble classifier.

The ensemble classifier's voting technique integrates the outcomes of several different classifiers and is considered superior to any single classifier. The proposed ensemble classifier model based on the voting approach has a two-stage structure. In the initial stage, the various base classifiers (LightGBM, KNN, and NB) are trained and produce predictions on the test data. In the subsequent meta-stage, the voting technique integrates the predictions of the base classifiers into a hybridized prediction, and the ultimate class labels, from L1 to L7, are decided by the proposed ensemble model. The efficiency of the ensemble model can be improved by selecting an appropriate voting technique; the "average of probability" voting technique was used for the meta-level classification in the proposed ensemble classifier model. The suggested voting ensemble classification method proceeds as follows:

Step 1: The 1050 samples were split into training and testing data at a ratio of 7 : 3. The base classifiers C_1, ..., C_N were then trained in the first stage; for the suggested ensemble model, LightGBM, KNN, and NB serve as base classifiers.

Step 2: The predictions generated by the trained base classifiers (C_1, C_2, and C_3) form the input to the next classification stage. For a given dataset D, the prediction of each base classifier C_i can be expressed as a probability distribution vector over the 7 class results L1 to L7 [35]:

P_i(D) = [p_i(L1), p_i(L2), ..., p_i(L7)].

Step 3: The probability distribution vectors generated by the base classifiers are aggregated and averaged by the voting rule (the meta-level classifier). This is the "average of probability" combination rule [36], which can be represented as

P_ens(D) = (1/N) Σ_{i=1}^{N} P_i(D),

where P_i(D) represents the probability distribution of each base classifier on dataset D, N is the number of base classifiers, and P_ens(D) is the probability distribution produced by the ensemble classifier under the voting rule.
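In scikit-learn terms, soft voting corresponds to the "average of probability" rule described above; a sketch of the ensemble construction (hyperparameters other than k = 1 are our assumptions) is:

```python
from lightgbm import LGBMClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Base classifiers for the first stage.
base_classifiers = [
    ("lgbm", LGBMClassifier()),
    ("knn", KNeighborsClassifier(n_neighbors=1)),
    ("nb", GaussianNB()),
]

# voting="soft" implements the "average of probability" rule: the class
# probability vectors of the N base classifiers are averaged, and the class
# with the highest mean probability is the ensemble prediction.
ensemble = VotingClassifier(estimators=base_classifiers, voting="soft")
```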

3.4. Performance Evaluation Indices

The following performance indices (PIs) were utilized to assess the properties of the ensemble learning classification described in this paper.

Kappa Statistic (KS): The kappa coefficient is a statistical measure of consistency. For classification problems, the consistency refers to the degree to which the predicted outcomes of the model match the actual classification results. The kappa coefficient is computed from the confusion matrix and falls within the range of −1 to 1, though it is generally greater than zero. The KS value indicates the performance of the classifier: KS = 1 indicates outstanding performance, a KS value from 0.4 to 0.75 indicates good performance, and a KS below 0.4 indicates poor performance. The KS index can be calculated by the following equation [37]:

KS = (p_o − p_e) / (1 − p_e),

where p_o represents the observed agreement with the fault types and p_e represents the expected (chance) agreement.

Mean Absolute Error (MAE): The MAE is calculated from the predicted and observed results of the classifier and can be stated as follows [38]:

MAE = (1/n) Σ_{i=1}^{n} |ŷ_i − y_i|,

where ŷ_i represents the predicted result and y_i the observed result of the classifier for a sample set of size n.

Precision (P): Precision measures how well the model separates positive and negative samples; greater precision indicates better performance on the classification task [39]:

P = TP / (TP + FP),

where TP represents true positive samples and FP represents false positive samples.

Recall (R): The recall is the ratio of accurately predicted positive observations to all observations in the class:

R = TP / (TP + FN),

where FN indicates false negative samples.

F1 score (F1): Precision and recall are combined to create the F1 score; the higher the F1 score, the more reliable the classification model:

F1 = 2 × P × R / (P + R),

where P indicates the precision and R the recall.
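All five indices can be computed with scikit-learn, assuming integer class labels. Note that applying MAE to class labels, as the paper does, treats the labels as ordinal, and macro averaging is one assumed way to aggregate the per-class precision, recall, and F1:

```python
from sklearn.metrics import (cohen_kappa_score, mean_absolute_error,
                             precision_recall_fscore_support)

y_true = [0, 1, 2, 2, 1, 0, 3]    # observed fault classes (toy values)
y_pred = [0, 1, 2, 1, 1, 0, 3]    # classifier predictions

ks = cohen_kappa_score(y_true, y_pred)       # Kappa statistic
mae = mean_absolute_error(y_true, y_pred)    # MAE over the class labels
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(ks, mae, p, r, f1)
```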

3.5. Identification Process

The UHVDC fault identification framework based on DF-AD and ensemble learning is shown in Figure 4. The model structure has four primary parts; all experiments were carried out in MATLAB 2019a and Python 3.6.
(1) Several subsignals are obtained from the different types of UHVDC fault data by the DF-AD method. For example, a fault voltage signal of the UHVDC system yields 32 subsignals after downsampling with 8 DF values and taking three orders of approximate derivatives.
(2) For each subsignal, the nine entropy features in Table 2 are calculated, giving a total of 288 features from the 32 subsignals.
(3) The RF-RFECV method (Algorithm 1) is used to select the important features, which makes classification by the ensemble classifier easier. In this paper, RF-RFECV selects 12 high-quality features from the set of 288.
(4) Step (3) produces a data set of 12 high-quality features with size 1050 × 12, which is divided into a training set and a test set at a ratio of 7 : 3. The training data are used to train the ensemble classifier, and the test data validate the model's performance. Finally, fault identification is realized; a compact end-to-end sketch follows this list.
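As referenced in step (4), the following self-contained sketch illustrates the final split-train-validate stage (synthetic data stands in for the real 1050 × 12 selected-feature matrix; steps (1)-(3) are abstracted away):

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for the selected 1050 x 12 feature matrix with 7 fault labels.
X, y = make_classification(n_samples=1050, n_features=12, n_informative=8,
                           n_classes=7, random_state=0)

# Step (4): 7:3 split, train the ensemble, validate on the held-out set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
ensemble = VotingClassifier(                 # as constructed in Section 3.3.4
    estimators=[("lgbm", LGBMClassifier()),
                ("knn", KNeighborsClassifier(n_neighbors=1)),
                ("nb", GaussianNB())],
    voting="soft")
ensemble.fit(X_tr, y_tr)
print(f"test accuracy: {ensemble.score(X_te, y_te):.4f}")
```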

4. Simulation

4.1. Simulation System Parameters

To validate the efficacy of the fault diagnosis approach suggested in this paper, a ±800 kV line-commutated converter (LCC) UHVDC system was built in MATLAB/Simulink, as illustrated in Figure 5. The system has a rated power of 8000 MVA and a rated voltage of 800 kV, operates at a frequency of 50 Hz, has a total transmission line length of 1095.7 km, and uses a 12-pulse bridge circuit for the converter.

4.2. Different Types of Fault Simulation

Seven different types of faults were simulated on this model. Figure 5 illustrates the fault types that might manifest in a UHVDC transmission system under actual conditions: f1 denotes a positive-pole ground fault; f2 a negative-pole ground fault; f3 a pole-to-pole short-circuit fault; f4 an A-phase ground fault in the AC system on the rectifier side; f5 an A-phase ground fault in the AC system on the inverter side; f6 a positive-pole ground fault on the rectifier side; and f7 a positive-pole ground fault on the inverter side.

The DC voltage for the different fault types is collected as the analysis signal; the sampling frequency is set to 12.8 kHz, the duration is 0.1 s, and each signal has 1280 sampling points. Table 4 describes the sampling data for the various fault parameters of the UHVDC model. For each internal fault, this paper considers 10 fault resistances (1, 50, 100, 200, 350, 500, 650, 800, 950, and 1100 Ω) and 15 fault locations from 70 to 1050 km from the rectifier side (step size of 70 km); therefore, each internal fault has 150 (10 × 15) samples. For every external fault and AC system fault, the study considers 100 transition resistances of 6∼600 Ω (step size of 6 Ω) and 50 transition resistances of 610∼1100 Ω (step size of 10 Ω), giving 150 (100 + 50) samples per fault type. For faults on the DC lines, different fault samples are obtained by varying the transition resistance and the fault location; again, each fault type contains 150 samples. In this work, a total of 1050 (150 × 7) fault samples were collected. These samples were processed by the DF-AD method, yielding a data set of size 1050 × 288, which is then randomly separated into a training set and a test set in the proportion 7 : 3. Table 4 demonstrates how the fault sample data sets are generated.
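The fault simulations themselves run in MATLAB/Simulink; the following sketch merely enumerates the parameter grid described above to confirm the sample counts:

```python
from itertools import product

# Internal DC-line faults: 10 resistances x 15 locations = 150 samples each.
resistances = [1, 50, 100, 200, 350, 500, 650, 800, 950, 1100]   # ohms
locations = list(range(70, 1051, 70))                            # km, 15 points
internal_cases = list(product(resistances, locations))
print(len(internal_cases))            # 150

# External / AC-system faults: 100 + 50 = 150 transition resistances each.
external_r = list(range(6, 601, 6)) + list(range(610, 1101, 10))
print(len(external_r))                # 150
```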

5. Results and Discussion

5.1. Results of RFECV Feature Selection

Through the DF-AD algorithm and the different entropy features introduced above, a total of 288 classification features can be acquired. Although there are numerous features, not all of them are appropriate for classification purposes. The aim of RF-RFECV (Algorithm 1) is to identify the features that are most efficient for classification. The RF-RFECV feature selection algorithm is applied to the resulting data set of 288 features, and the 12 features shown in Table 5 are selected to identify faults.

5.2. Classification Results: Ensemble Classifier

To increase the accuracy of fault identification and guarantee the system's safe and stable functioning, this study proposes an ensemble classification model composed of three base classifiers (LightGBM, KNN, and NB) aimed at detecting and classifying the various UHVDC faults. In this model, the base classifiers are first trained using the entropy features extracted from the different types of UHVDC fault signals. Then, in the final decision stage, the meta-classifier combines the predicted values of the base classifiers by the voting method to obtain the final predicted class values. To evaluate the classifiers fairly, the samples for the training and testing stages were selected randomly, with 735 samples in the training set. To assess the effectiveness of the suggested ensemble classification model in terms of identification accuracy and performance indices, its outcomes are contrasted with those of every individual base classifier. The identification accuracy is the proportion of UHVDC faults properly identified out of the total number of UHVDC faults:

Accuracy = (number of correctly identified faults / total number of faults) × 100%.

Tables 6–8 display the confusion matrices for the LightGBM, KNN, and NB classifiers, and Table 9 displays the confusion matrix for the ensemble model. The main-diagonal elements of a confusion matrix represent correctly classified test samples, whereas off-diagonal elements signify misclassifications. Table 10 displays the performance indices of each classifier.

The UHVDC faults are classified by the base classifiers (LightGBM, KNN, and NB) and by the ensemble classifier model. Tables 6–9 show the confusion matrices of the classification results of the four classifiers. The results show that, compared with the single base classifiers, the ensemble classifier based on the voting rule achieves higher accuracy (99.05%) in UHVDC fault identification. Table 10 lists the performance indices of the three base classifiers and the ensemble classifier. The KS values of the three base classifiers are 0.9407, 0.9629, and 0.8592, respectively, while the KS value of the ensemble classifier increases to 0.9889, indicating that the stability of the ensemble classifier model is improved. The MAE of the ensemble classifier is lower than that of the three base classifiers, indicating a lower classification error. Similarly, as evidenced by Table 10, the ensemble classifier exhibits superior precision, recall, and F1 score.

To assess the robustness of the ensemble classifier, we examined data with varied noise content; the results are shown in Table 11.

According to the classifier performance indices and Table 11, the ensemble classifier offers better performance and stronger anti-interference capability than the base classifiers in fault identification.

5.3. Comparison between DF-AD and Discrete Wavelet Transform

In this section, to demonstrate the effectiveness of the DF-AD method, it is compared with the commonly used discrete wavelet transform (DWT). The computational complexity of DF-AD and DWT is measured by the time taken to process the same data, with both methods run on the same computer.
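A sketch of such a timing comparison (using PyWavelets for the DWT; the wavelet family is not stated in this excerpt, so "db1" is an assumption, and the absolute timings are machine-dependent) is:

```python
import time
import numpy as np
import pywt

x = np.random.randn(1280)                   # stand-in 12.8 kHz fault signal

t0 = time.perf_counter()
coeffs = pywt.wavedec(x, "db1", level=8)    # 8-level DWT: 1 approximation + 8 details
t_dwt = time.perf_counter() - t0

t0 = time.perf_counter()
subs = [np.diff(x[::n], k) for n in (1, 2, 4, 8, 16, 32, 64, 128)
        for k in range(4)]                  # DF-AD: 32 subsignals
t_dfad = time.perf_counter() - t0

print(f"DWT:   {len(coeffs)} signals in {t_dwt:.6f} s")
print(f"DF-AD: {len(subs)} signals in {t_dfad:.6f} s")
```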

Applying the DWT to the voltage signal in Figure 6(a) yields eight detail signals and one approximation signal. Figure 6 depicts the energy entropy of the eight detail coefficients; however, the 8-level DWT of the fault voltage signal yields only one distinctive curve. In contrast, as shown in Figure 7, the suggested DF-AD approach constructs four attribute curves for the fault voltage signal, so the DF-AD method can analyse more characteristic curves of the signal. Different UHVDC fault types have different entropy values: as can be observed in Figure 6(b), the measured voltage signals have different entropy values for faults such as positive-pole grounding, negative-pole grounding, and two-pole short circuit, and the UHVDC fault types are classified by these entropy differences. As shown in Figure 8, comparable results are attainable with various DF-AD parameters.

The times required to analyse the fault data with DWT and DF-AD for the different fault types are shown in Tables 12 and 13. DWT analysis of a fault voltage signal yields one approximation signal and eight detail signals, nine signals in total, while the DF-AD method yields a total of 32 signals; thus more information is obtained from the original voltage signal.

Although the DF-AD signal processing method produces more than three times as many signals as the DWT method (32 versus 9), its calculation time is only about one-sixth of the DWT method's, as shown in Tables 12 and 13. This is mainly because the DF-AD method requires only basic arithmetic operations.

5.4. Comparison with Other Methods

Table 14 presents a comparison, under a 30 dB signal-to-noise ratio, between the method presented in this paper and fault identification techniques from the literature.

Table 14 indicates that, despite the influence of noise interference, the proposed method continues to exhibit superior accuracy compared to other methods.

6. Conclusions

In this study, we propose a fault identification model that utilizes the DF-AD approach and an ensemble classifier to accurately classify seven common fault types in UHVDC systems. To simulate the faults, we built a ±800 kV model in MATLAB/Simulink. The voltage signal from the rectifier side is processed using the proposed DF-AD approach, which extracts more detailed information and is faster than DWT. From the several subsignals generated, nine types of entropy are calculated to identify UHVDC faults. High-quality features were selected using the RF-RFECV method (Algorithm 1), and the ensemble classifier was trained using a voting rule. The experimental results show that the proposed fault identification model achieves higher accuracy and greater robustness to transition resistance than the other methods compared. In future work, we plan to add more fault types and investigate more efficient recognition models.

Nomenclature

DF: Down-sampling factor
AD: Approximate derivative
RF-RFECV: Random forest-recursive feature elimination with cross-validation
LightGBM: Light gradient boosting machine
KNN: K-nearest neighbor
NB: Naive Bayes
PCA: Principal component analysis
SVM: Support vector machine
EMD: Empirical mode decomposition
CNN: Convolutional neural network
MCEEMDAN: Modified complete ensemble empirical mode decomposition with adaptive noise
IMFs: Intrinsic mode functions
CGAN: Conditional generative adversarial network
DWT: Discrete wavelet transform
FFT: Fast Fourier transform
GAF: Gramian angular field
DBN: Deep belief network
ANN: Artificial neural network

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant no. 51777132), the National Natural Science Foundation for Young Scholars (grant no. 51907138), and the Science and Technology Project of State Grid Shanxi Electric Power Co., Ltd., China (grant no. 520510220002).