Abstract

To address the long processing time and low accuracy of existing state evaluation methods for distribution equipment, a health-index-based state evaluation method for distribution equipment in a big data environment is proposed. First, to reduce the time spent on big data analysis over large-scale distributed clusters, a condition monitoring data platform for distribution equipment in a big data environment is designed, and a Hive-based relational online analysis method (ROLAP) is proposed. Second, the health index (HI) is introduced as the indicator for evaluating the health status of distribution equipment. Because different fault factors influence the equipment state to different degrees, a comprehensive multifactor failure rate correction model is obtained, and the model is solved with the success-flow-based method to improve the accuracy of state evaluation. Finally, experiments show that when the data volume of distribution equipment is 60 GB, the proposed method takes only 30.0 s, far less than the 73.6 s and 82.5 s of the comparison methods. The evaluation accuracy of the proposed method is 95.1%, whereas that of the comparison methods is only 82.4% and 73.1%, respectively. Therefore, the proposed method can effectively improve the efficiency of distribution equipment condition evaluation.

1. Introduction

As an important part of smart grid construction, the distribution network directly connects end users, distributed generation, electric vehicles, and other intermittent loads. However, because of its complex operating conditions, uneven equipment quality, and complex network structure, it cannot form a structure as reliable as that of the generation and transmission systems [1, 2]. Research shows that more than 85% of power failures occur on the distribution network side [3–5]. Therefore, condition evaluation of distribution equipment not only helps improve the efficiency of daily power system management but also supports smart grid construction and the application of modern distribution technology [6–9].

British power grid researchers compared the operating condition of power grid equipment with human health and, drawing on the theory of the human health index (HI) and by analogy with the deterioration law of electrical equipment, proposed the concept of an electrical equipment health index [10–12]. Because extended service may increase the failure rate of the power grid, damage the reliability of power supply, and adversely affect users, a more reliable maintenance plan for power grid equipment needs to be formulated on this basis. Subsequently, the concept was introduced into research on power supply reliability, asset management, and many other fields, where it helps control power grid risk and avoid blind maintenance and excessive maintenance of equipment. To a certain extent, it reduces operation and maintenance costs, optimizes resource allocation, and improves the security of power system operation and the reliability of power supply [13–15].

Traditional power grid data processing technologies, such as grid computing, P2P computing, and cluster computing, achieve good results on small and medium-sized data. The online monitoring data of power equipment, however, are characterized by large volume, many value types, frequent changes, and low value density [16, 17]. As a new computing framework, cloud computing is better suited to processing the massive online monitoring data of distribution equipment [18–20].

At present, although China has carried out research on the health status evaluation of distribution equipment, most domestic work focuses on high-voltage transformers. Health evaluation of distribution equipment still lacks systematic theories and tools, and the research has not been extended from the equipment level to the network level. How to evaluate the health status of a large amount of distribution equipment and of a complex, real-time, dynamic distribution network is both a new topic and a great challenge in the development of the modern smart grid. Reference [21] proposed an equipment fault prediction method based on a similar density array, which uses a logistic fast minimum error entropy algorithm to predict equipment fault risk while considering weather factors. To address the complex fault mechanism of secondary equipment in intelligent substations, Reference [22] combined the analytic hierarchy process with anti-entropy theory and proposed a weighting-based state evaluation method for secondary equipment. Reference [23] used the entropy weight method to normalize each index, calculate the index weights, and form a comprehensive evaluation model. Reference [24] proposed a state sensing method for the secondary system of intelligent substations based on FCE and a deep convolutional neural network (DCNN), but its training and optimization occupy more resources. In Reference [25], the condition evaluation standard is scored according to standards and expert experience, and the state is divided into three levels in the form of health index and maintenance task. Reference [26] comprehensively considered hardware information and human factors, combined a fuzzy iterative method with a weighted expert database, and proposed a multi-source information fusion state evaluation model. Reference [27] established a distribution network health index model, proposed combining the network equivalence method with the goal-oriented method, and used the GO method from reliability analysis to analyze and solve the model.

However, in the context of big data, the above methods often suffer from low accuracy, and big data management is time-consuming and has poor storage performance and low analysis efficiency. To address these problems, a state evaluation method for distribution equipment based on the health index in a big data environment is proposed. The innovations are as follows:
(1) A Hive-based relational online analysis method is proposed, and a distribution equipment condition monitoring data platform in the big data environment is designed, which improves the efficiency of the method when performing big data analysis on large-scale distributed clusters.
(2) HI is introduced as the indicator for evaluating the health status of distribution equipment, and the success-flow-based method is used to solve the model, which improves the accuracy of state evaluation.

2. Design of Distribution Equipment Condition Evaluation Architecture under Big Data

Traditional equipment condition evaluation mostly relies on conventional data storage and analysis methods, which cannot support complex analysis of the collected data. The proposed method integrates distributed data storage with big data analysis, which offers a new approach to the state evaluation of distribution equipment. The proposed framework is shown in Figure 1.

2.1. Data Acquisition

The data acquisition layer collects data through the state access controller (CAC) and sensors and transmits the collected data to the state access gateway (CAG) through web services. Because the distribution equipment data are huge in volume and come from many types of sources, the open-source tool Sqoop is used to extract, transform, and load (ETL) the required data, which are then associated, aggregated, and stored in a unified structure. After query, calculation, and statistical analysis are complete, Sqoop can also export the results to the external relational database MySQL for users to view. By sharing data across multiple information management systems and analyzing the massive multi-source heterogeneous data, state indicators with strong correlation, such as equipment defects, equipment faults, and key equipment health status, are mined and analyzed.

2.2. Data Storage

The data storage layer integrates HDFS and MySQL. Condition monitoring big data in a unified, standardized format are stored in the distributed file system HDFS. MySQL is mainly used to store the model information of distribution equipment condition monitoring and to manage Hive metadata.

2.3. Data Analysis

In the data analysis layer, a distributed ROLAP analysis method for distribution equipment condition monitoring data is designed. ROLAP services support larger user groups and data volumes and are often used where these capacities are in high demand. Hive provides a distributed ROLAP service: it completes operations such as roll-up and drill-down by decomposing them into MapReduce tasks, so parallel computing depends mainly on the MapReduce architecture. Figure 2 shows the MapReduce data analysis flow chart.
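To make the roll-up and drill-down operations concrete, the following minimal Python sketch aggregates a few monitoring records at different levels of a dimension hierarchy. The records, the hierarchy (feeder, equipment type, device), and the function names are hypothetical and chosen only for illustration; in the platform described here such aggregations would be expressed as Hive queries and executed as MapReduce jobs.

```python
from collections import defaultdict

# Hypothetical monitoring records: (feeder, equipment_type, device_id, defect_count).
RECORDS = [
    ("F4", "transformer", "T16", 2),
    ("F4", "transformer", "T17", 1),
    ("F4", "line", "L26", 3),
    ("F3", "line", "L20", 4),
]


def aggregate(records, level):
    """Sum defect counts grouped by the first `level` dimensions.

    level=1 groups by feeder, level=2 by (feeder, equipment_type),
    level=3 by (feeder, equipment_type, device_id).
    """
    totals = defaultdict(int)
    for record in records:
        key = record[:level]       # keep only the coarser dimensions
        totals[key] += record[3]   # defect_count is the measure
    return dict(totals)


def roll_up(records, level):
    """Move one step up the hierarchy (coarser view)."""
    return aggregate(records, level - 1)


def drill_down(records, level):
    """Move one step down the hierarchy (finer view)."""
    return aggregate(records, level + 1)
```

Starting from the (feeder, equipment type) view, `roll_up(RECORDS, 2)` gives totals per feeder, while `drill_down(RECORDS, 2)` gives totals per individual device.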

Taking the big data of distribution equipment as the research object, MapReduce processes the data in the following steps:

Step 1. The big data of distribution equipment is randomly divided into several data blocks, such as split0, split1, and split2. Each data block is assigned to a Map node in the form of <key1, value1> key-value pairs for parallel task processing, and these Map tasks run at the same time;

Step 2. After the Map tasks are completed, many intermediate results in the form of <key2, value2> key-value pairs are generated. The intermediate results are sent to the Shuffle process for aggregation. Key-value pairs with the same key are grouped together and transmitted to the Reduce node in the form of <key2, value2> key-value pairs;

Step 3. The Reduce task starts. For the <key2, value2> key-value pairs passed from the Shuffle process to the Reduce node, the final sorting operation is carried out on the pairs with the same key to form the <key3, value3> key-value pair results;

Step 4. The calculation results of the Reduce nodes are summarized and output as the final results.

To achieve better computing locality, the HDFS DataNode and the MapReduce TaskTracker are bound to the same slave node, so that data can be read locally for analysis and processing and the interpretation of the data can be completed within the data processing time.
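The four steps above can be sketched in plain Python on a single machine. This is only an illustration of the map, shuffle, and reduce data flow, not the Hadoop API: the record layout (device identifier, temperature reading), the round-robin split, and the choice of a per-device maximum as the reduce operation are all assumptions made for the example.

```python
from collections import defaultdict


def split_data(records, n_splits):
    """Step 1: divide the records into n_splits blocks (round-robin here)."""
    return [records[i::n_splits] for i in range(n_splits)]


def map_task(block):
    """Map: emit <key2, value2> pairs, here (device_id, reading)."""
    return [(device_id, reading) for device_id, reading in block]


def shuffle(mapped_blocks):
    """Step 2: group intermediate values that share the same key."""
    groups = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Step 3: reduce one group to <key3, value3>, here the maximum reading."""
    return key, max(values)


def run_job(records, n_splits=3):
    blocks = split_data(records, n_splits)
    mapped = [map_task(block) for block in blocks]  # parallel on a real cluster
    grouped = shuffle(mapped)
    # Step 4: collect the reducer outputs, sorted by key, as the final result.
    return sorted(reduce_task(key, values) for key, values in grouped.items())
```

For example, `run_job([("T16", 61.5), ("T17", 58.0), ("T16", 70.2)])` returns one pair per device containing its highest reading.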

2.4. Data Presentation

The data presentation layer integrates the functions of statistical query, auxiliary decision-making, data mining, and so on. At the same time, it also provides the analysis, evaluation and prediction functions of various distribution equipment status information and provides condition monitoring data for other related systems.

3. State Evaluation Method of Distribution Equipment

3.1. Comprehensive Multifactor Failure Rate Correction Model

Distribution network equipment is mainly divided into five categories: overhead line, cable, distribution transformer, disconnector, and circuit breaker, and the failure rate of the category equipment is expressed by . Through the different influence degree of different fault factors on the equipment state, combined with the operation state and working environment of the equipment, the correction coefficient matrix of various equipment fault rates is obtained [25]. The weights of three types of fault factors and the correction coefficients of equipment faults are shown in Table 1.

The weight matrix of various equipment fault factors obtained by analytic hierarchy process is shown in the following equation:where represents the weight of class equipment failure caused by the factor, .

The failure rate correction coefficient matrix of various equipment iswhere is the correction factor.

3.1.1. Overload Correction

Generally, the equipment is allowed to operate under heavy load and overload for a short time, but the longer and higher the degree of heavy load and overload operation, the higher the equipment failure rate. When the equipment load rate does not exceed the rated load rate , the visual equipment failure rate is zero. The definition is used to characterize the overload degree of the equipment. At the same time, combined with the definition of definite integral and exponential function, the overload correction coefficient is calculated as follows:where is the real-time load rate, is the maximum allowable load rate, and is the overload correction factor.

3.1.2. Failure Factors of Strong Wind and Heavy Rain

The correction factor for the fault factors of heavy wind and heavy rain of distribution equipment is , which can be obtained according to the average precipitation data of recent years counted by the local meteorological department. The calculation formula is as follows:where represents the month of the year to be evaluated, and , , and represent the monthly average precipitation.

3.1.3. Lightning Strike Correction

The correction factor of meteorological failure factors of lightning strike of distribution equipment is , which can be obtained according to the average lightning strike data in recent years counted by the local meteorological department. The calculation formula is as follows:where refers to the average monthly lightning stroke times in the month of the year to be evaluated, refers to the average monthly lightning stroke times in the month of the year to be counted, and refers to the average monthly lightning stroke times in the month of the year of the year to be counted.

According to the equipment failure factor weight matrix and equipment failure rate correction coefficient, a comprehensive multi factor failure rate correction matrix can be obtained:where represents the comprehensive correction coefficient.
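One plausible way to combine the weight matrix and the correction coefficients is a row-wise weighted sum, which yields one comprehensive coefficient per equipment class. The sketch below assumes this form, and also assumes that the factor weights of each equipment class sum to 1; the function name and the example numbers in the usage note are hypothetical and are not taken from Table 1.

```python
def comprehensive_correction(weights, corrections):
    """Return one comprehensive correction coefficient per equipment class.

    weights[i][j]     : assumed weight of fault factor j for equipment class i
    corrections[i][j] : assumed correction coefficient of factor j for class i
    Assumption: the comprehensive coefficient is the weighted sum over factors.
    """
    result = []
    for w_row, c_row in zip(weights, corrections):
        if len(w_row) != len(c_row):
            raise ValueError("each class needs one correction per weight")
        if abs(sum(w_row) - 1.0) > 1e-9:
            raise ValueError("factor weights of each class must sum to 1")
        result.append(sum(w * c for w, c in zip(w_row, c_row)))
    return result
```

With weights `[0.5, 0.3, 0.2]` for overload, wind and rain, and lightning, and correction coefficients `[1.2, 1.0, 1.5]`, the comprehensive coefficient under this assumption is 0.5·1.2 + 0.3·1.0 + 0.2·1.5 = 1.2.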

3.2. Equipment Health Index Model

The actual equivalent service life of matching equipment is calculated according to the service age fallback theory, and the Weibull distribution parameters are estimated according to the collected data to obtain the equipment failure rate . Then, using the weight matrix and correction matrix of equipment fault factors, the correction coefficient of corresponding equipment is calculated, and then corrected. The real-time health index of the equipment can be expressed as follows:where is the curvature coefficient, is the proportion coefficient, and is the comprehensive correction coefficient of distribution equipment failure rate. Because the statistical data of power companies in different regions are different, it is difficult to obtain the actual common failure rate data. The scale coefficient and curvature coefficient in this paper are determined by reference [26]. According to the above formula and the and of various equipment, the values of proportion coefficient and curvature coefficient of transformer, cable, overhead line, and other equipment can be obtained, and then the functional relationship between equipment HI and failure rate can be obtained.
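For reference, the failure rate (hazard function) of a two-parameter Weibull distribution has the standard closed form λ(t) = (β/η)·(t/η)^(β−1), where β is the shape parameter and η is the scale parameter. The sketch below evaluates this expression at an equivalent service age; the function name and the parameter values in the usage note are hypothetical, and the estimation of β and η from collected data is not shown.

```python
def weibull_failure_rate(age, shape, scale):
    """Weibull hazard rate at the given (equivalent) service age.

    age   : equivalent service age, in the same time unit as `scale`
    shape : Weibull shape parameter (beta); > 1 means an ageing component
    scale : Weibull scale parameter (eta)
    """
    if age <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("age, shape and scale must all be positive")
    return (shape / scale) * (age / scale) ** (shape - 1.0)
```

With a shape parameter of 1 the failure rate is constant and equal to 1/η; with a shape parameter greater than 1 it grows with age, which matches the deterioration behaviour the health index is meant to capture.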

3.3. Solving Model Based on Success Flow Method

The GO method reflects the logical and functional relationships between components and systems well. It is a success-oriented analysis method based on the system structure diagram, and it is mainly used for system simulation.

When the GO method is introduced into the reliability evaluation of distribution network, the fault rate data of equipment are preprocessed first, so that the fault rate data of all equipment appear in the form of probability. Then, with the help of success oriented technology, each user can delete various fault combinations according to the state dependency of the equipment and obtain the reliability index of the equipment. When the equipment failure rate is , the successful operation probability of the equipment is converted to .

The health index and failure probability satisfy a kind of exponential relationship, that is, the failure probability is

The health index of components is transformed by GO to obtain the probability of successful operation:where is the health index of the element in the branch feeder of the layer.
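As a hedged illustration of the two conversions described above, the sketch below assumes the commonly used exponential form λ = K·exp(C·HI) for the relation between health index and failure rate, and P = exp(−λ) for the probability of no failure during one unit of time under a constant failure rate. Both forms, the function names, and any values chosen for K and C are assumptions made for illustration.

```python
import math


def failure_rate_from_hi(hi, k, c):
    """Assumed exponential relation: failure rate = K * exp(C * HI)."""
    return k * math.exp(c * hi)


def success_probability(failure_rate):
    """Probability of no failure in one time unit at a constant failure rate."""
    if failure_rate < 0:
        raise ValueError("failure rate must be non-negative")
    return math.exp(-failure_rate)


def success_probability_from_hi(hi, k, c):
    """Chain both conversions: health index -> failure rate -> success probability."""
    return success_probability(failure_rate_from_hi(hi, k, c))
```

Under these assumptions, and with positive K and C, a larger health index (worse condition) gives a higher failure rate and therefore a lower probability of successful operation.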

According to the obtained system hierarchy diagram, each branch feeder is analyzed in series from the lowest level, and an equivalent successful operation probability is used to replace the successful operation probability of all equipment on the same line:

If some important users use double circuit lines, parallel calculation shall be adopted, that is,wherein is the successful operation probability of the first loop line in the double loop line, and is the successful operation probability of the second loop line.
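The series and parallel combinations can be sketched with the standard reliability formulas for independent components: a series path succeeds only if every element succeeds, and a double-circuit supply succeeds if at least one circuit succeeds. The function names are illustrative, and independence between elements is an assumption of this sketch.

```python
import math


def series_success(probabilities):
    """All elements on the same line must operate: product of probabilities."""
    return math.prod(probabilities)


def parallel_success(p_first, p_second):
    """Double-circuit line: success unless both circuits fail."""
    return 1.0 - (1.0 - p_first) * (1.0 - p_second)
```

For two elements with success probabilities 0.9 and 0.8, the series combination gives 0.72, whereas the parallel combination gives 0.98.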

Put the calculated result of the layer as an equivalent element into the layer, as shown in Figure 3.

The probability of successful operation of components is calculated as follows:where is the successful operation probability of equivalent components, is the proportion of the branch feeder in the total load of the layer, and .
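One way to read the load-proportion weighting is as a weighted average of the equivalent success probabilities of the branch feeders in a layer, with weights equal to each feeder's share of the layer's total load. The sketch below assumes exactly this form and assumes that the load shares sum to 1; the function name and the example values are hypothetical.

```python
def equivalent_success(branch_probabilities, load_shares):
    """Load-weighted average of branch success probabilities (assumed form).

    branch_probabilities[i] : equivalent success probability of branch feeder i
    load_shares[i]          : share of branch i in the layer's total load
    """
    if len(branch_probabilities) != len(load_shares):
        raise ValueError("one load share is required per branch")
    if abs(sum(load_shares) - 1.0) > 1e-9:
        raise ValueError("load shares must sum to 1")
    return sum(p * w for p, w in zip(branch_probabilities, load_shares))
```

For two branches with success probabilities 0.9 and 0.8 that carry 25% and 75% of the load, this gives 0.9·0.25 + 0.8·0.75 = 0.825, so the heavily loaded branch dominates the result.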

Repeat the above process until it reaches the top level of the network to obtain the successful operation probability , which is inversely transformed into the health index. The transformation formula is
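The inverse transformation can be sketched under the same kind of assumptions: if the success probability is P = exp(−λ) and the failure rate is λ = K·exp(C·HI), then solving for the health index gives HI = ln(−ln(P)/K)/C. These two forms, the function name, and the values of K and C used in the usage note are assumptions for illustration only.

```python
import math


def hi_from_success_probability(p_success, k, c):
    """Invert P = exp(-K * exp(C * HI)) for HI (assumed functional forms)."""
    if not 0.0 < p_success < 1.0:
        raise ValueError("success probability must lie strictly between 0 and 1")
    failure_rate = -math.log(p_success)    # undo P = exp(-lambda)
    return math.log(failure_rate / k) / c  # undo lambda = K * exp(C * HI)
```

As a consistency check, converting HI = 4 to a success probability with K = 0.01 and C = 0.5 and then applying this function recovers HI = 4 up to floating-point error.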

4. Experiment and Analysis

4.1. Construction and Configuration of Big Data Platform

The main process of building and configuring the big data platform of distribution equipment is as follows:
(1) Install Ubuntu on the server as a virtual Linux machine under the Windows environment, and determine the parameter configuration of the virtual machine according to the data analysis requirements of the distribution equipment;
(2) Download and install the JDK on Ubuntu, and configure JAVA_HOME and other relevant parameters as needed to establish a Java environment;
(3) Download and install Hadoop and create platform users;
(4) Configure SSH to ensure shared access between nodes and security during remote management;
(5) Modify the configuration files one by one, including the global configuration file (conf/core-site.xml) and the YARN configuration file (conf/yarn-site.xml), and create and modify the MapReduce configuration file (conf/mapred-site.xml) to complete the configuration of the analysis platform environment;
(6) Format the HDFS system, start the platform environment, verify the normal operation of each node of the platform through multiple tests, load the data to be analyzed, update the storage and output directories, and complete the integration of the data flow and the platform.
Figure 4 shows the construction and configuration process of the big data platform.

After the above main steps, the distribution equipment big data analysis platform is preliminarily built. The main software versions used are as follows: operating system: Ubuntu 12.04; Hadoop version: hadoop-2.7.2; JDK version: jdk-7u79-linux-i586.

4.2. Structure Diagram of Simulation System

The simulated radial distribution system structure IEEE RBTS-Bus2 feeder 4 system is shown in Figure 5. The distribution network includes 12 distribution lines, 19 equipment such as 7 transformers, 1 circuit breaker, and 7 terminal load points. The failure rate of circuit breaker is 0.002 times/year and that of distribution transformer is 0.015 times/(a.km). The failure rate of each load point is shown in Table 2. Among them, load points 16 and 22 are commercial loads, with an average load of 0.559 MW. Load points 17–19 are residential load, with an average load of 0.492 MW. Load points 20 and 21 are secondary loads of government departments, with an average load of 0.587 MW. The length of lines 26, 31, 33, and 36 is 0.9 km, the length of lines 27, 29, 32, and 35 is 0.75 km, and the length of lines 28, 30, and 34 is 0.65 km.

4.3. Influence of Device Health Index Change on Network Health Index

Equipment on the same layer and on different layers is selected, and the state of the equipment on each layer is simulated from good to bad (the corresponding health index changes from 1 to 8). Figure 6 shows the impact of equipment degradation at the same layer on the network health index, and Figure 7 shows the impact of equipment degradation at different layers on the network health index. Even when equipment has the same health index, equipment at different levels of the system affects the overall health index of the distribution network differently. The higher the level of the equipment, the greater the contribution of its health state to the health index of the distribution network, because the higher the level of the equipment in the system, the more serious the loss caused by its failure. If equipment 26 (overhead line) fails, it directly affects all load points connected to it, so it has the greatest impact on the network. By contrast, the failure of low-level equipment T21 affects only load point LP21. Therefore, the higher the level of a piece of equipment in the network, the greater the impact of its health on the overall network.

4.4. Comparison of Different Methods

Figure 8 shows the running time of the different methods as the amount of data changes. The running time differs for different amounts of data. When the amount of data is 10 GB, the method of Reference [26] takes 20.0 s and the method of Reference [27] takes 25.3 s, whereas the proposed state evaluation method of distribution equipment based on health index in big data environment takes only 8.5 s. When the data volume is 60 GB, the method of Reference [26] takes 73.6 s, the method of Reference [27] takes 82.5 s, and the proposed method takes only 30.0 s. This is because this paper divides the index types, modifies the failure rate model by integrating multiple factors, designs the distribution equipment condition monitoring data platform under the big data environment, and proposes a Hive-based relational online analysis method, which improves analysis efficiency when performing big data analysis on large-scale distributed clusters, so the state evaluation takes less time.

The accuracy rate is the proportion of correctly evaluated samples among all samples. To verify the accuracy of the proposed method, the methods of Reference [26] and Reference [27] are compared with it. Figure 9 shows the comparison of evaluation accuracy under the different methods. Under different data sets, this method has the highest evaluation accuracy. When the data volume of distribution equipment is 10 GB, the evaluation accuracy of the proposed method is 22.1%, that of the method of Reference [26] is 22.3%, and that of the method of Reference [27] is 10.0%. When the data volume of distribution equipment is 60 GB, the accuracy of the proposed method is 95.1%, that of the method of Reference [26] is 82.4%, and that of the method of Reference [27] is only 73.1%. The proposed method considers the influence of different fault factors on the state of equipment, introduces the health index as the indicator for evaluating the state of distribution equipment, and uses the success-flow-based method to solve the model, which improves the accuracy of state evaluation.
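The accuracy definition used here can be written as a short function. The state labels in the usage note are hypothetical examples; the sketch only illustrates the ratio of correctly evaluated samples to all samples.

```python
def evaluation_accuracy(predicted_states, true_states):
    """Proportion of samples whose evaluated state matches the true state."""
    if len(predicted_states) != len(true_states):
        raise ValueError("both sequences must have the same length")
    if not true_states:
        raise ValueError("at least one sample is required")
    correct = sum(1 for p, t in zip(predicted_states, true_states) if p == t)
    return correct / len(true_states)
```

For instance, if three of four evaluated states agree with the reference states, the function returns 0.75.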

5. Conclusion

To address the long processing time and low analysis efficiency of existing methods for distribution equipment condition evaluation in the context of big data, a distribution equipment condition evaluation method based on HI in a big data environment is proposed. A condition monitoring data platform for distribution equipment in a big data environment is designed, and a Hive-based relational online analysis method is proposed to improve the efficiency of big data analysis. HI is introduced as the evaluation index of the health state of distribution equipment, and the failure rate model is corrected by multiple factors and solved by the GO method. This method can effectively improve the efficiency of distribution equipment condition evaluation.

Limited by the current laboratory hardware and the scale of the experimental data obtained, the data set used in the experiment only reaches the GB scale. In the next step, we plan to carry out experimental research on parallel analysis of TB-scale data. In addition, when establishing the distribution network health index model, this paper considers only the widely used radial distribution network; follow-up research can take ring networks and other complex networks as the research object.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.