Abstract
During power grid system maintenance and overhaul, real-time detection of insulators and drop fuses is important for live working robots in the distribution network to plan their motion. The visual system of the robot needs object detection algorithms with high detection precision, fast speed, and robustness to changes in image brightness. In this paper, an improved YOLOv4 is proposed for detecting insulators and drop fuses. The improved YOLOv4 extracts features of power components through convolutional neural networks (CNN) and then performs feature fusion. After feature extraction and fusion, the algorithm generates prediction boxes based on anchor boxes that are clustered by the fuzzy C-means (FCM) algorithm instead of the K-means algorithm to detect the objects. Finally, the non-maximum suppression (NMS) algorithm is used to obtain the final prediction results. In order to detect small targets, a larger detection layer is added to the improved YOLOv4. To enhance the robustness of the algorithm, data augmentation methods are used to enrich the data set. Combining these improvements, the test results show that the improved YOLOv4 achieves higher accuracy and faster detection speed than the other detection algorithms based on deep learning. The mean average precision is 97.0%, and the average detection time is 0.012 s. Therefore, the improved YOLOv4 is suitable for live working robots in the distribution network to detect insulators and drop fuses fast and accurately.
1. Introduction
A reliable and stable power system ensures social stability and economic development [1]. The rapid growth of power components and transmission lines has increased the difficulty and workload of power grid system maintenance and overhaul [2–4]. In order to ensure a stable power supply and reduce economic losses, live work such as non-stop maintenance and component replacement is carried out on high-voltage live components. Fault detection is developing rapidly and is widely used to find faults in power components. However, at present, traditional manual live working is still the main method in China, and it is extremely dangerous for workers to solve the problems resulting from these faults [5]. Replacing manual live working with live working robots in the distribution network is therefore very important and has attracted the attention of a large number of researchers [6, 7]. Figure 1 shows the self-control flow chart of the live working robot. In the self-control phase, the robots need the classification and position information of the power components to realize autonomous obstacle avoidance and live work tasks. The visual system detects the power components to provide the classification information and the position information in the image coordinate system by using the detection algorithm. Combining the detection results with the reconstructed 3D scene, the position of the power components in the robot base coordinate system can be obtained. Therefore, a real-time and high-precision detection algorithm is the key technology for the live working robots to keep the operation and maintenance of the power grid system stable.

In recent decades, the applications of computer vision and artificial intelligence in live work have improved the quality of live work. Many researchers have contributed to the recognition and location of power components by extracting manual features such as the texture, color, and contour of power components. Li et al. [8] designed a pulse-coupled neural filter based on the color and parallel characteristics of transmission lines, combined with the improved Hough transform, to detect transmission lines in aerial images, but this method was sensitive to the environmental background. Lin et al. [9] extracted the common areas as the regional coordinates of the insulator after merging the color and contour features of the insulator; however, the color feature extraction was strongly affected by light intensity, which may limit the application of this method. Li et al. [10] derived contour characteristics from the vertical contour projection curve to search for the positions of the insulators and trained a support vector machine to recognize them. However, this method relied on the repeated shape patterns of the insulators, which were easily occluded by other power components. Oberweger et al. [11] proposed a detection algorithm based on discriminative training of local gradient-based descriptors and a subsequent voting scheme for localization. The algorithm required the insulator caps to be clearly visible, which is not met under severe conditions. In summary, traditional object detection algorithms need sufficient prior knowledge, low background interference, and suitable light intensity to identify and locate power components [12]. Moreover, in order to achieve higher classification accuracy, the manual feature extraction work was very complicated [13].
In recent years, deep learning has been widely used and has made remarkable breakthroughs in the field of object detection [14]. Excellent object recognition algorithms based on convolutional neural networks have emerged. These object detection algorithms are divided into two-stage and one-stage algorithms. The two-stage object detection algorithms first select regions of interest from the images through the selective-search algorithm. The regions are then input into a convolutional neural network to extract CNN features, which are fed into a support vector machine for classification. Finally, bounding-box regression refines the candidate box positions to obtain the recognition and location of the objects. Classic two-stage algorithms are the region-based convolutional neural network (R-CNN) [15], the fast region-based convolutional neural network (Fast R-CNN) [16], the faster region-based convolutional neural network (Faster R-CNN) [17], and so on. One-stage object detection algorithms realize object classification and localization through logistic regression on the CNN features extracted by a convolutional neural network. Classic one-stage algorithms are the "You Only Look Once" series (YOLOv1, YOLOv2, YOLOv3, and YOLOv4) [18–21] and the Single Shot MultiBox Detector (SSD) [22]. Anchor-based object detection algorithms such as YOLOv2, YOLOv3, and YOLOv4 need anchor boxes to provide prior knowledge about width and height for generating the bounding boxes that detect objects. Instead of choosing anchor boxes by hand, a clustering algorithm is run on the training set to obtain cluster centres as the anchor boxes.
Several researchers have applied deep learning to the recognition and positioning of power components. Guo et al. [23] proposed a method to detect insulators in UAV images based on deep learning, but it had a long training time and low detection accuracy. Lei and Sui [24] presented a method to locate insulators accurately based on the faster R-CNN, but the detection speed was not fast enough for the live working robots. Wang et al. [25] presented a method for detecting railway insulators based on an adaptive cascaded CNN with high precision but a long detection time. Jiang et al. [26] realized the accurate segmentation of bushings in infrared images through the mask R-CNN; however, this method would be less effective if the objects to be segmented belonged to multiple categories, and it was not suitable for RGB images with more complex background information. Ling et al. [27] combined the faster R-CNN and U-net to realize the accurate location of broken insulators. Sadykova et al. [28] proposed the in-yolo, which trained a YOLOv2 network with a data-augmented image data set in order to avoid overfitting. Although the in-yolo realized real-time detection, it required images with an uncluttered background. Liu et al. [29] introduced a method that applied the YOLOv3 to recognize and locate insulators for the first time and analyzed the detection results under different brightness conditions, target occlusion, and target cut-off; however, the YOLOv3 lacked the ability to detect small objects in the images. The above research shows that it is difficult for these methods to realize real-time detection while keeping high accuracy under complex background conditions. Compared with other popular object detection algorithms, the YOLOv4 shows higher accuracy and faster detection speed [21].
The suspension insulators and distant power components are small and easily obscured in the images. Daylight changes and foggy or snowy weather sometimes occur during power grid system maintenance and overhaul. In order to solve these problems, the improved YOLOv4 is proposed based on the YOLOv4 to detect power components such as drop fuses, suspension insulators, and rod insulators. The contributions of this paper are as follows. Firstly, one yolo layer with a larger-scale feature map is added to the improved YOLOv4 to detect small objects and improve the detection accuracy. Secondly, brightness changes, fog and snow effects, and random cropping are applied to the images to increase the richness of the data set and enhance the robustness to daylight changes and extreme weather. Finally, whereas most methods choose anchor boxes by running the K-means algorithm on the training set, in this paper the K-means, K-means++, and fuzzy C-means (FCM) clustering algorithms are applied to the selection of anchor boxes to improve the accuracy of the detection results.
The remaining parts of this paper are organized as follows: Section 2 introduces the improved YOLOv4 network structure and algorithm principle; Section 3 focuses on the implementation of power component recognition and location; Section 4 analyzes the advantages of the improved YOLOv4 through comparative experiments; and Section 5 summarizes the article and gives an outlook.
2. Methodology
This section introduces the principle of the YOLOv4, image acquisition and annotation, and the improvements to the YOLOv4.
2.1. Principle of the YOLOv4
The YOLOv4 combines many techniques such as the cross-stage partial connection (CSP) [30] module, the Path Aggregation Network (PANet) [31] module, the Mish activation [32], Distance-IoU Non-Maximum Suppression (DIoU-NMS), and the Complete IoU loss (CIoU loss) [33], making the YOLOv4 one of the best object detection algorithms. This part introduces the detection process, the network structure, and the postprocessing of the network output.
2.2. Detection Process of the YOLOv4
The network structure of the YOLOv4 consists of a backbone, a neck, and a head. The backbone is a CSPDarknet53 [30] structure, which extracts features from the images. The neck part, composed of SPP (spatial pyramid pooling) [34] and PANet, takes full advantage of the features extracted from different layers. The head part has three yolo layers; it generates bounding boxes based on the anchor boxes and obtains detection boxes containing the object size, object location, and probability of each class through logistic regression. The network structure and detection process of the YOLOv4 are shown in Figure 2. The detection process of the YOLOv4 after feeding images to the network is as follows: Step 1: The backbone extracts features from the images. Step 2: The neck part fuses features from different layers of the backbone. Step 3: The head part, including three yolo layers, divides each image into a grid. Each cell of the grid generates three bounding boxes based on the anchor boxes to detect the objects. Step 4: The DIoU-NMS algorithm filters the bounding boxes to obtain the final detection results.

2.2.1. Backbone
The backbone is a CSPDarknet53 structure, which contains 11 CBM blocks (convolution combined with batch normalization and Mish activation) and 5 CSP blocks. The Mish activation function is a highlight of the YOLOv4, improving the stability and final accuracy of model training. The Mish activation function is nonmonotonic and infinitely continuous and smooth, which stabilizes the network gradient flow and allows better information to penetrate into the network. The expression of the Mish activation function is shown as follows:

$$f_{\text{Mish}}(x) = x \cdot \tanh\left(\ln\left(1 + e^{x}\right)\right). \qquad (1)$$
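A minimal NumPy sketch of expression (1), using the numerically stable softplus identity $\ln(1 + e^{x}) = \operatorname{logaddexp}(0, x)$, is shown below; it is an illustration rather than part of the original implementation:

```python
import numpy as np

def mish(x: np.ndarray) -> np.ndarray:
    """Mish activation: x * tanh(softplus(x)); logaddexp avoids overflow of exp(x)."""
    return x * np.tanh(np.logaddexp(0.0, x))
```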
The CSPNet module is a very important part of the network structure. The module structure is shown in Figure 3. It integrates the gradient changes from beginning to end into the feature map, improving the learning ability of convolutional neural networks, reducing the amount of calculation, and greatly shortening the training time.

2.2.2. Neck
Connecting the backbone and the head, the neck part fuses feature maps from the shallow layers, which are full of edge and shape features, with those from the deep layers, which contain semantic information. The neck part consists of an improved SPP module and a PANet module. The improved SPP is used to increase the receptive field of the features and separate the most important context features. The YOLOv4 adds PANet in order to preserve more shallow features, which benefits the detection of small objects and reduces the amount of calculation. The PANet module is shown in Figure 4.

2.2.3. Head
The head part realizes multiscale detection to recognize and locate the power components. It consists of three yolo layers whose feature maps are downsampled 8, 16, and 32 times relative to the input; these prediction feature maps are produced by CBL blocks and convolution operations after feature fusion by the neck part. As shown in Figure 5, the prediction feature maps contain the predictions for the input images. Each yolo layer divides the image into an $N \times N$ grid, and each cell generates three bounding boxes, each containing the coordinates of the centre point $(x, y)$, the width and height of the bounding box $(w, h)$, the object confidence $p_{o}$, and the class confidences $p_{c}$. The size of each prediction feature map is shown as follows:

$$N \times N \times \left[3 \times (4 + 1 + C)\right], \qquad (2)$$

where $N$ is the grid size, 3 is the number of bounding boxes per cell, $4 + 1$ counts the box coordinates and the object confidence, and $C$ is the number of classes. According to expression (2), the number of bounding boxes generated from each yolo layer is $N \times N \times 3$.
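To make expression (2) concrete, the short sketch below computes the feature map shape and bounding box count for one yolo layer; a 416 × 416 input is assumed purely for illustration, while the three classes are those of this paper (DF, SI, and RI):

```python
def yolo_layer_stats(input_size: int, stride: int, num_classes: int,
                     boxes_per_cell: int = 3):
    """Feature map shape and bounding box count for one yolo layer."""
    n = input_size // stride                            # grid size N
    channels = boxes_per_cell * (4 + 1 + num_classes)   # coords + objectness + classes
    return (n, n, channels), n * n * boxes_per_cell

shape, boxes = yolo_layer_stats(416, 8, num_classes=3)  # assumed 416 x 416 input
print(shape, boxes)  # (52, 52, 24) 8112
```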

2.3. DIoU-NMS Algorithm
There are many wrong and repeated predictions among the large number of bounding boxes obtained from the yolo layers. The DIoU-NMS algorithm filters these predictions and keeps high-quality predictions as the detection results based on the DIoU value instead of the IoU value. The IoU value is the ratio of the overlapping area of two bounding boxes to the area of their union. The DIoU value considers not only the overlap of the two bounding boxes but also the distance between their centre points. A higher DIoU value indicates a higher similarity between the two bounding boxes. The schematic diagram of DIoU is shown in Figure 6, and the expressions of the IoU and DIoU values are shown as follows:

$$\mathrm{IoU} = \frac{\left|B \cap B^{gt}\right|}{\left|B \cup B^{gt}\right|}, \qquad \mathrm{DIoU} = \mathrm{IoU} - \frac{\rho^{2}\left(b, b^{gt}\right)}{c^{2}}, \qquad (3)$$

where $\rho\left(b, b^{gt}\right)$ represents the distance between the centre points of the two bounding boxes and $c$ is the diagonal length of the minimum bounding area of the two bounding boxes.
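The following minimal sketch implements the two values for boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def diou(a, b):
    """DIoU = IoU - rho^2 / c^2, penalizing centre point distance."""
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
         + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
       + (max(a[3], b[3]) - min(a[1], b[1])) ** 2       # enclosing-box diagonal squared
    return iou(a, b) - rho2 / c2
```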

Based on the DIoU value, the process of filtering bounding boxes by the DIoU-NMS algorithm is shown as follows: Step 1: Delete the bounding boxes whose object confidence is less than the confidence threshold. Step 2: Calculate the class confidence score for each category $i$ as $s_{i} = p_{o} \cdot p_{c_{i}}$ (4), sort the bounding boxes from highest to lowest according to the scores, and select the bounding box with the highest score as a correct prediction box. Step 3: Calculate the DIoU values between the remaining bounding boxes and the correct prediction box, and delete the bounding boxes with DIoU values greater than the NMS threshold. Step 4: For the remaining bounding boxes, repeat steps 1 to 3 until there are no bounding boxes left. The correct prediction boxes are the final predictions.
Using the DIoU values, the DIoU-NMS algorithm reduces the probability of deleting correct predictions that are very close to each other.
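A compact sketch of this greedy filtering loop, reusing the diou helper above (the thresholds are illustrative defaults, not the paper's tuned values), is:

```python
def diou_nms(boxes, scores, conf_thresh=0.25, nms_thresh=0.45):
    """Greedy DIoU-NMS for one class; boxes as (x1, y1, x2, y2)."""
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thresh),
                   key=lambda i: -scores[i])            # Steps 1-2: filter and sort
    keep = []
    while order:
        best, order = order[0], order[1:]
        keep.append(best)                               # highest score is kept
        order = [i for i in order                       # Step 3: drop near-duplicates
                 if diou(boxes[best], boxes[i]) <= nms_thresh]
    return keep
```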
2.4. CIoU Loss for Bounding Box Regression
In the training process, the detection algorithms use logistic regression to make the bounding boxes approximate the size and position of the ground truth. The YOLOv4 uses the CIoU loss instead of the mean square error of the offsets to perform bounding box regression, achieving better convergence speed and accuracy on the bounding box regression problem. The CIoU value takes the aspect ratio, the overlap area, and the central point distance into consideration and is shown as follows:

$$\mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^{2}\left(b, b^{gt}\right)}{c^{2}} - \alpha v, \quad v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}, \quad \alpha = \frac{v}{(1 - \mathrm{IoU}) + v}. \qquad (5)$$
Here, $\alpha$ is the weight function, and $v$ is used to measure the similarity of the aspect ratios. The definition of the CIoU loss is as follows:

$$\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{CIoU}. \qquad (6)$$
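Continuing the corner-coordinate convention of the earlier sketches, a minimal CIoU loss can be written as follows (the eps guard against division by zero is an added safeguard):

```python
import math

def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss = 1 - CIoU, where CIoU = DIoU - alpha * v (expressions (5)-(6))."""
    w_p, h_p = pred[2] - pred[0], pred[3] - pred[1]
    w_g, h_g = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(w_g / (h_g + eps))
                              - math.atan(w_p / (h_p + eps))) ** 2
    alpha = v / ((1 - iou(pred, gt)) + v + eps)     # weight function
    return 1 - (diou(pred, gt) - alpha * v)
```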
2.5. Image Acquisition and Annotation
The image data were collected at State Grid Intelligent Co., Ltd. on November 17, 2020. A DJI Mavic Air 2 drone equipped with a CMOS image sensor recorded video at a frame rate of 30 fps, shooting the power components from multiple angles and directions. The shooting distance was about 3 meters. A total of 1,500 RGB images were extracted from the videos by using OpenCV 3.4.
To improve the robustness of detection, image augmentation methods were used to enrich the data set. To enhance the ability of the detector to recognize and locate incompletely displayed objects, 200 pictures were randomly cropped, at two fixed crop sizes, to increase the number of cut-off objects in the data set. The brightness of the images was changed so that the robot control system can obtain high-precision detection results through the YOLOv4 when the lighting conditions are poor: we randomly selected 250 pictures and adjusted their brightness to 0.8, 0.7, 0.6, 0.5, 0.4, and 0.3 of the original. Foggy and snowy weather is a challenge for the detection of power components; as shown in Figure 7, we used Automold to add fog and snow effects to the images to simulate such weather. After data augmentation and deletion of useless images, there were 2,718 images in the data set, including 2,174 images in the training set, 418 images in the test set, and 126 images in the verification set. The objects in the images were annotated as the ground truth (gt) and saved as label files by using the labelImg annotation tool.
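The brightness and cropping steps can be reproduced with plain OpenCV/NumPy, as in the minimal sketch below; the crop size and file name here are illustrative assumptions, and the fog and snow effects come from the Automold library rather than this sketch. Note that the bounding-box annotations must be remapped to the cropped window as well.

```python
import cv2
import numpy as np

def adjust_brightness(img: np.ndarray, factor: float) -> np.ndarray:
    """Scale pixel intensities; factors from 0.8 down to 0.3 were used here."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def random_crop(img: np.ndarray, crop_w: int, crop_h: int) -> np.ndarray:
    """Crop a random window so that objects near the border end up cut off."""
    h, w = img.shape[:2]
    x = np.random.randint(0, w - crop_w + 1)
    y = np.random.randint(0, h - crop_h + 1)
    return img[y:y + crop_h, x:x + crop_w]

img = cv2.imread("example.jpg")        # hypothetical file name
dark = adjust_brightness(img, 0.5)     # half of the original brightness
crop = random_crop(img, 800, 600)      # illustrative crop size
```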

2.6. Improvements to the YOLOv4
2.6.1. Selection of Anchor Boxes
In the yolo layers, each cell generates three bounding boxes to detect the objects based on prior knowledge, namely, the width and height of the corresponding anchor boxes. Therefore, the prior knowledge should accurately describe the widths and heights of the objects in the user's own training set. The clustering algorithms are run on the training set to calculate cluster centres as the anchor boxes based on the widths and heights of the ground truth. The K-means algorithm and the K-means++ algorithm are the most popular clustering algorithms for calculating anchor boxes. However, the initial cluster centres have a huge impact on the final cluster centres. Both of them are hard clustering algorithms that measure the probability that a sample belongs to each cluster as 1 or 0. This hard measurement can easily assign samples to the wrong cluster. The FCM algorithm is more flexible than the hard clustering algorithms because it uses a membership matrix instead of 0 and 1 to measure the probability. The FCM algorithm for anchor box selection is shown as Algorithm 1, whose update rules are expressions (7)–(9). Instead of the Euclidean distance, the IoU loss of expression (8), $d\left(gt, \text{centre}\right) = 1 - \mathrm{IoU}\left(gt, \text{centre}\right)$, measures the distance between the ground-truth boxes and the cluster centres when the clustering algorithms are run. Neural network-based deep clustering methods such as the deep clustering network (DCN) [35] and the structural deep clustering network (SDCN) [36] are also employed to select anchor boxes.
(Algorithm 1: anchor box selection by the FCM algorithm.)
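As a concrete reference, the following sketch gives a standard FCM update (fuzzifier m = 2 assumed) combined with the IoU-based distance of expression (8), clustering the ground-truth (width, height) pairs; it reconstructs the usual FCM procedure and is not necessarily identical to Algorithm 1:

```python
import numpy as np

def iou_wh(wh: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Pairwise IoU of boxes described only by (w, h) and aligned at the origin."""
    inter = np.minimum(wh[:, None, 0], centers[None, :, 0]) \
          * np.minimum(wh[:, None, 1], centers[None, :, 1])
    union = wh[:, 0:1] * wh[:, 1:2] + (centers[:, 0] * centers[:, 1])[None, :] - inter
    return inter / union

def fcm_anchors(wh: np.ndarray, k: int = 12, m: float = 2.0,
                iters: int = 100, seed: int = 0) -> np.ndarray:
    """Fuzzy C-means on (w, h) samples with d = 1 - IoU as in expression (8)."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        d = 1.0 - iou_wh(wh, centers) + 1e-9            # IoU-based distance
        u = d ** (-2.0 / (m - 1.0))                     # soft memberships
        u /= u.sum(axis=1, keepdims=True)               # rows sum to 1
        um = u ** m
        centers = (um.T @ wh) / um.sum(axis=0)[:, None] # weighted centre update
    return centers[np.argsort(centers[:, 0] * centers[:, 1])]  # sorted by area
```

Unlike the hard 0/1 assignments of K-means, the membership matrix u lets every ground-truth box pull on every cluster centre, which is exactly the flexibility described above.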
The K-means algorithm, the K-means++ algorithm, the FCM algorithm, and the deep clustering methods are run on the training set to choose the cluster centres as anchor boxes. Sets of 6, 9, and 12 anchor boxes are obtained through these clustering algorithms, and the average IoU values between the ground truth and the corresponding cluster centres are shown in Table 1. The average IoU value of the anchor boxes chosen through the FCM algorithm is much higher than that of the anchor boxes chosen through the other algorithms. The deep clustering methods are not suitable for selecting anchor boxes on this training set because of their low average IoU values. This finding indicates that the FCM algorithm chooses more suitable anchor boxes to provide prior knowledge for the bounding boxes. Moreover, the average IoU values grow as the number of anchor boxes increases.
2.6.2. The Structure of the Improved YOLOv4
The raw images are resized to the network input scale, while the first yolo layer obtains an 8-times-downsampled feature map. Therefore, it is difficult for the first yolo layer to detect objects smaller than 8 × 8 pixels. The distribution map of the ground-truth widths and heights relative to the input image size is shown in Figure 8. According to the distribution map, a certain number of dots fall inside that square. Hence, the first yolo layer is not suitable for detecting the smaller objects.

In order to solve this problem, the improved YOLOv4 is proposed by fusing more feature maps from the shallow layers and adding a yolo layer with a larger-scale feature map. The improved YOLOv4 network structure is shown in Figure 9. Compared with the original YOLOv4 network structure, three parts have been added. The first part is the feature map from the backbone, containing more location and detail information about small objects. The second part is added to the PANet module to fuse the feature map from the shallow layer and to output feature maps at four scales for the prediction part. The third part is the yolo layer with the larger scale, which obtains the corresponding feature map from the PANet module to detect the small power components. As the network gets deeper, 16 CBL blocks in the PANet module are changed into 8 Res units in order to reduce the number of parameters and suppress vanishing gradients. Four yolo layers need 12 anchor boxes, and the anchor boxes calculated through the clustering algorithms are shown in Table 2.
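For a sense of scale, reusing the yolo_layer_stats helper sketched in Section 2.2.3 (with the input size again assumed to be 416, and the added layer assumed to halve the finest stride from 8 to 4, consistent with the description above), the four detection layers would produce:

```python
for stride in (4, 8, 16, 32):       # stride 4 is the added small-object layer
    shape, boxes = yolo_layer_stats(416, stride, num_classes=3)
    print(f"stride {stride:>2}: feature map {shape}, {boxes} boxes")
# stride  4: feature map (104, 104, 24), 32448 boxes
# stride  8: feature map (52, 52, 24), 8112 boxes
```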

3. Implementation and Analysis
This section focuses on the implementation of the improved YOLOv4 for detecting power components, the detection results, and the analysis of the improvements. DF, SI, and RI represent drop fuse, suspension insulator, and rod insulator in the following tables and figures.
3.1. Implementations of the Improved YOLOv4 for Detecting Power Components
The above algorithms were run under the following computer configuration: Ubuntu 18.04 operating system, Intel Core i5-8400 CPU (2.8 GHz), RTX 2070S (8 GB) graphics card, NVIDIA 450 driver, CUDA 11.0, and cuDNN v8.0.5. Before training the model, we set the number of epochs to 400 and the batch size to 8 and fixed the input image size. The loss function was optimized with the Adam algorithm. We chose anchor boxes obtained through the FCM algorithm, the K-means algorithm, the K-means++ algorithm, and the deep clustering methods to train the original YOLOv4 model and the improved YOLOv4 model. During the training process, every epoch cost around 2 minutes. The models were evaluated by precision, recall, average precision (AP) of each kind of power component, and mean average precision (mAP). The evaluation of the improved YOLOv4 at different initial learning rates is shown in Table 3, and the training loss curves are shown in Figure 10. As shown in Figure 10, the loss curve converged more slowly, and the final loss value tended to be lower, as the initial learning rate became smaller. According to the evaluation of the models, the improved YOLOv4 reached the best precision, recall, and mAP when the learning rate was 0.0015. Therefore, 0.0015 is the best initial learning rate for training the improved YOLOv4 to detect the power components.

As shown in Table 4, the original YOLOv4 model that adopted anchor boxes calculated through the FCM algorithm achieved 94.2% mAP with 85.6% precision. The original models that adopted anchor boxes from the deep clustering methods got higher precision, and the model that adopted anchor boxes from the SDCN achieved 94.2% mAP. Compared with the original models, the mAP values of the improved models increased except for the model with the anchor boxes calculated through the SDCN. With anchor boxes calculated through the FCM algorithm, the improved YOLOv4 model improved the precision, recall, and mAP by 2.4%, 2.6%, and 2.8%, respectively, compared with the original YOLOv4 model. The AP value of the improved model for suspension insulators increased by 6.4%, 6.3%, and 4.1% compared with the AP values of the other models for suspension insulators. The above findings suggest that the improved YOLOv4, with an extended PANet module and a yolo layer with a larger scale, can detect smaller power components more accurately than the original YOLOv4. In summary, the FCM algorithm and the improved YOLOv4 structure improve the detection accuracy significantly.
3.2. Detection Results
The live working robots in the distribution network have to work under poor daylight conditions such as cloudy weather. Figure 11 shows the detection results of the improved YOLOv4-based detector on brightness-changed images. Referring to Figure 11(a), the detector still maintained good detection results when the brightness was over 0.6. When the brightness became low, only one rod insulator at the bottom of the image failed to be detected. All of the drop fuses and insulators were successfully detected in Figure 11(b). The mAP under the different brightness levels is shown in Figure 12. Although the detection precision slightly decreased as the brightness decreased, the mAP was still over 0.95. The detection results indicate that changing the brightness of the images in the training set can enhance the robustness of the detector to daylight changes.


In a complex environment, many power components are not displayed completely because they are cut off or occluded, which makes them hard for the detector to detect. To address this, images in the training set were randomly cropped to increase the number of incompletely displayed power components. In this experiment, the power components were cut off by 25%, 50%, and 75% from top to bottom and from bottom to top. Figure 13 shows that when the cut-off degree was less than 50%, the detector could identify and locate the drop fuses accurately. When the cut-off degree was 75%, the drop fuses with their lower parts cut off were detected with over 90% recognition confidence, while those with their upper parts cut off failed to be detected. Therefore, thanks to the randomly cropped training images, the detector detected the power components accurately when the cut-off degree was less than 50%, while the detection accuracy was not high enough when the cut-off degree was more than 50%.

Cross-arms were artificially placed over the upper, middle, and lower parts of the rod insulators in the images to analyze the influence of the occlusion degree on the detection results. The test results are shown in Figure 14. According to the detection results, when the occlusion was less than 33.3%, the detection of rod insulators was correct. When the occlusion degree was 50.0%, rod insulators with occlusions on the upper and lower parts could be detected correctly, while rod insulators with occlusions on the middle parts were sometimes missed. More missed detections occurred when the occlusion degree was 66.7%. The above findings indicate that the improved YOLOv4 can correctly detect power components with light occlusions.

The detection results of the improved YOLOv4 in snowy and foggy weather are shown in Figure 15. Referring to Figure 15(a), the improved YOLOv4 was able to detect most of the power components but missed some small power components such as suspension insulators when the snow effect was enhanced. Figure 15(b) shows that the detection results for small power components became worse when the fog effect was enhanced. The above findings suggest that the improved YOLOv4 maintains accurate detection in snowy weather but lacks the ability to detect small power components in heavy fog.

4. Comparison with Other Object Detection Algorithms
One-stage and two-stage object detection algorithms are widely used in the field of detecting power components. In this experiment, the one-stage object detection algorithms include the YOLO series and SSD, and the two-stage object detection algorithms are the faster R-CNN, the cascade R-CNN, and the double-head R-CNN. This section presents the comparison with these algorithms to show the accurate and high-speed detection of the improved YOLOv4. The comparison focuses on the training process and the detection results.
4.1. Comparison of Training Process
The algorithms were trained under the same computer configuration used for the improved YOLOv4. The number of epochs was 400, the batch size was 3, and the learning rate was 0.001. The time cost (TC) per epoch of each algorithm during the training process is shown in Table 4. The improved YOLOv4, the YOLOv4, and the YOLOv5 cost around 2 minutes per epoch, much shorter than the time cost of the YOLOv3, SSD, and the two-stage algorithms. The training time cost per epoch shows that the improved YOLOv4 had the highest utilization of computing resources, while the YOLOv3, SSD, and the two-stage algorithms required substantial computational cost. Therefore, the improved YOLOv4 is suitable for training on a single affordable, low-profile GPU.
Figure 16 displays the change of the training loss during the training process. Although the faster R-CNN model reached the lowest training loss with the fewest epochs, the training loss curve presented a divergent trend after 130 epochs, so the model might not be stable. The rest of the algorithms reached basic convergence during the training process. Figure 16(a) shows that the training loss of the improved YOLOv4 was lower than that of the YOLOv4, making the detection of the improved YOLOv4 more accurate. Considering the time cost per epoch, the improved YOLOv4 reached basic convergence faster than the YOLOv3, SSD, and the two-stage algorithms. The training process suggests that the improved YOLOv4 spends much less time reaching basic convergence, while the faster R-CNN converges in the least time only with early stopping to avoid overfitting.

4.2. Comparison of Detection Results
The live working robots require that the object detection algorithms have fast detection speed with high precision to realize real-time and accurate detection of power components.
This experiment compared the model size, mAP, and average detection time (ADT) of these object detection algorithms, calculated on the test set. Table 5 shows that the improved YOLOv4 got the best mAP, 4.4% higher than that of YOLOv5, the latest version of YOLO. In contrast, the faster R-CNN got the worst mAP with the heaviest model. The average detection time of the improved YOLOv4 was 0.012 s, shorter than that of the YOLOv5 but slightly longer than that of the YOLOv4. Although the cascade R-CNN and the double-head R-CNN got high mAP, their detection times were 6 times and 9 times as long as that of the improved YOLOv4. The average detection times indicate that the improved YOLOv4 can realize real-time detection compared with these algorithms. In terms of the model size, the improved YOLOv4 model was smaller than the two-stage algorithms, the YOLOv4, and the YOLOv3. Referring to Figure 17, the improved YOLOv4 recognized and located the power components more accurately than the other algorithms. Figure 17(a) shows that the improved YOLOv4 detected the power components accurately, especially the small objects on the left side of the image. The YOLOv4 failed to detect two suspension insulators on the left side of Figure 17(b). According to the rest of the figures, the other algorithms failed to detect small power components, and the locations of the objects were not accurate. As shown in Figures 17(d) and 17(f), the background had an impact on the detection of small power components: parts of the cross arms were wrongly detected as rod insulators. That is because small rod insulators occupy fewer pixels, making their features indistinctive and similar to those of the cross arms. The improved YOLOv4 and the YOLOv4 avoided these wrong detections because of the larger detection layers and the fusion of more feature maps from shallow layers. These findings suggest that the improved YOLOv4 has a very high detection speed with higher precision and a lighter model compared with the other widely used object detection algorithms. The improved YOLOv4 is suitable for the live working robots to realize real-time detection of power components.

5. Conclusion
In this paper, the improved YOLOv4 has been proposed based on the YOLOv4 and realizes real-time detection of drop fuses, suspension insulators, and rod insulators. Changing the brightness of the images, adding snow and fog effects to the images, and random cropping are used to expand the data set and enhance the robustness of the detector to daylight changes, foggy or snowy weather, and incompletely displayed power components. The FCM algorithm allows the anchor boxes to provide more accurate prior knowledge. Adding another yolo layer with a larger scale enables the improved YOLOv4 to detect smaller power components. Combining the suitable anchor boxes and the improved network structure, the mAP of the improved YOLOv4 increases by 3.7%, and the average detection time reaches 0.012 s. The size of the improved YOLOv4 model decreases by 17 MB, which is quite light for limited hardware or mobile devices. Therefore, the improved YOLOv4 is suitable for the live working robots to obtain the positions of power components and update the motion planning of the robotic arm in real time to complete live work. The combination of the improved YOLOv4 and the live working robots in the distribution network has certain significance for the maintenance of the power grid system and the protection of the life safety of power practitioners.
Future work will focus on picking out the power components in the detection results through a semantic segmentation algorithm. We will then combine the detection results from the improved YOLOv4, the reconstructed 3D scene, and coordinate system transformations to obtain the 3D coordinates of the power components in the robot base coordinate system.
Data Availability
The data used to support the findings of this paper are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the National Key Research and Development Program of China (2018YFB1307400).