Abstract
Transmission lines carry a large number of insulators, and insulator damage has a major impact on power supply security. Image-based segmentation of the insulators on power transmission lines is a prerequisite and a critical task for power line inspection. In this paper, a modified conditional generative adversarial network for pixel-level insulator segmentation is proposed. The generator is built from encoder-decoder layers with asymmetric convolution kernels, which reduce network complexity and extract more kinds of feature information. The discriminator is a fully convolutional network based on patchGAN and learns the loss used to train the generator. Experiments verify that the proposed method achieves better mIoU and computational efficiency than Pix2pix, SegNet, and other state-of-the-art networks.
1. Introduction
Insulators are widely used in power transmission systems. A cracked insulator can cause a serious failure of the power grid, leading to significant economic loss and social disruption [1]. Therefore, detecting insulators is an essential part of power line inspection. With the continuous improvement of robotics and image processing technologies, manual inspection is being replaced by inspection robots or UAVs capable of autonomous inspection, which carry cameras as sensors for environment perception or defect detection. However, it is very difficult to extract and identify insulator components from the images, because insulators differ in colour, texture, resolution, and spectrum, and appear in various positions and postures [2]. In addition, the images usually have cluttered backgrounds, which makes the insulators difficult to recognize [3]. The images may also be blurred by jitter during the movement of the inspection robot [4].
Segmentation of insulators in aerial images is a basic problem of insulator inspection, and much research has focused on this area. Traditional methods usually rely on handcrafted features. Zhao et al. [5] localize insulators using shape points and an equidistant model. They use orientation angle detection and binary shape prior knowledge to detect different kinds of insulators. The method of [6] uses saliency and adaptive morphology, fusing colour and gradient features to detect the insulators, but it cannot locate insulators with inconspicuous colour. Zhai et al. [7] present a bunch-drop fault detection method that determines the coordinates of insulators, but it can only be used for glass and ceramic insulators. In [8], a multiscale and multifeature descriptor is proposed to represent local features. Spatial order features are obtained from the local features and then used to determine the insulator region. These methods share a disadvantage: they give poor results when the insulator is very close to the background or the background is complex.
Compared with traditional methods, machine learning approaches are more robust and accurate for target detection. Shang et al. [9] locate the insulators based on the maximum between-cluster variance and the Adaboost classifier, but this method requires the insulators to be independent of one another. The study in [10] extracts features based on the Local Directional Pattern (LDP) and integrates a Support Vector Machine (SVM) classifier into a sliding window framework for locating insulators. In [11], Binary Robust Invariant Scalable Keypoints (BRISK) and the Vector of Locally Aggregated Descriptors (VLAD) are adopted to detect the insulators, and the mixed features are classified by an SVM. However, this method is limited to infrared images. Yan et al. extract the histogram of oriented gradients (HOG) and local binary pattern (LBP) features and use a sliding window and an SVM to detect insulators [12]. These approaches are mostly designed for a specific type of insulator and therefore lack adaptability.
With the advance of deep learning, the above algorithms are gradually being replaced. Deep learning has achieved strong results in tasks such as detection, recognition, and segmentation. The study in [13] constructs a saliency area detection framework based on a generative adversarial network. However, it uses synthetic insulator samples for training and real images for testing, which limits its reliability. In [14], the single shot multibox detector (SSD) combined with a two-stage fine-tuning strategy is adopted for identifying insulators, but this method is only used for porcelain and composite insulators. Siddiqui et al. propose a rotation normalization and ellipse detection method, and their Convolutional Neural Network (CNN)-based detection framework detects 17 different types of insulators [15]. In [16], the authors improve the anchor generation method and nonmaximum suppression (NMS) in the region proposal network (RPN) of the Faster R-CNN model, which enhances accuracy and efficiency. However, these methods cannot achieve real-time detection. Arnab et al. point out that higher-order inconsistencies occur in CNN-based segmentation methods [17]. In [18], the authors show that GAN-based semantic segmentation can resolve the higher-order inconsistency problem.
In summary, current insulator segmentation methods all have deficiencies. Feature-based traditional methods cannot deal with various types of insulators with different scales or shapes. CNN-based segmentation networks suffer from higher-order inconsistencies and cannot be used in real-time situations. To address these issues, a more adaptive method is needed. In this paper, we use an end-to-end GAN to achieve pixel-level insulator segmentation. The trained model segments insulators without manually set parameters. Experiments verify that the network can produce high-quality pixel-level segmentation of insulators in real time on embedded devices during routine inspection.
The contributions of this paper are as follows. First, a lightweight end-to-end generator with asymmetric convolution kernels is devised to produce pixel-level segmentation of insulators with the original RGB image as input. Second, we explore the patchGAN classifier in the discriminator, which penalizes structure at the scale of image patches.
The rest of this paper is organized as follows: Section 2 describes the pipeline of our modified conditional generative adversarial network. Section 3 presents the establishment of the dataset. The experimental evaluation and discussion are presented in Section 4, and we conclude the paper in Section 5.
2. Modified Conditional Generative Adversarial Network
2.1. Modified Model
In this section, we describe the proposed network. As shown in Figure 1, the framework is a fully convolutional GAN consisting of two components: a lightweight generator based on an encoder-decoder network and a discriminator with a classification model based on patchGAN. The generator produces a fake segmentation result for a given image. The discriminator takes in both the fake segmentation images and the ground-truth real images and tries to tell them apart. During training, the generator is concurrently trained to generate more realistic images that are hard to distinguish from the ground truth.

2.2. Generator
The generator follows the encoder-decoder architecture, and the details are listed in Table 1. It is composed of 5 encoding layers and 5 decoding layers. Each encoding layer consists of a convolutional layer, batch normalization (BN), rectified linear units (ReLU), and a max pooling layer. BN is adopted to stabilize training, speed up convergence, and regularize the model [19]. Max pooling with a $2 \times 2$ window and a stride of 2 is inserted between two encoding layers, which subsamples the feature map by a factor of 2. Furthermore, we store the max pooling indices to capture the image's boundary information in the encoder feature maps. In particular, we use two asymmetric spatial filters of $3 \times 1$ and $1 \times 3$ instead of $3 \times 3$, which deepens the network structure and increases its degree of nonlinearity. In addition, the $3 \times 1$ and $1 \times 3$ filters reduce the number of parameters and yield a more compact generator model, which improves its computational efficiency [20]. The encoder layers produce both low-level and high-level feature maps, which have excellent feature expression capability.
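The parameter saving from asymmetric filters can be illustrated with a short count. The sketch below assumes the asymmetric pair factorizes a 3 × 3 kernel into a 3 × 1 and a 1 × 3 kernel; the channel count of 64 and the helper name `conv_params` are illustrative assumptions, not values taken from Table 1.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Number of trainable weights in a single 2D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# Illustrative channel count (an assumption, not taken from Table 1).
channels = 64

# One square 3 x 3 convolution.
square = conv_params(3, 3, channels, channels)

# A 3 x 1 convolution followed by a 1 x 3 convolution.
asymmetric = conv_params(3, 1, channels, channels) + conv_params(1, 3, channels, channels)

print("3x3:", square, "3x1 + 1x3:", asymmetric)
```

Under these assumptions the asymmetric pair needs roughly two thirds of the weights of the square kernel, while adding one more layer at which a nonlinearity can be applied.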
Each decoding layer has a corresponding encoder layer. An upsampling layer upsamples the input feature map using the max pooling indices. The indices stored by the corresponding encoder feature map are passed to the decoder feature maps, which preserves boundary details and leads to high segmentation accuracy; this is one of the most successful techniques in segmentation. BN is inserted between the deconvolution and the ReLU. The asymmetric spatial filters are also applied to each of these maps. Without the asymmetric spatial filters, the network would have more than 19M additional parameters, which would have a great impact on processing speed.
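The role of the stored pooling indices can be shown on a toy feature map. This is a minimal pure-Python sketch, assuming non-overlapping 2 × 2 windows and a map whose height and width are divisible by the window size; the function names are hypothetical and the code is not the implementation used in the experiments.

```python
def max_pool_with_indices(feature_map, size=2):
    """Pool non-overlapping size x size windows.

    Returns the pooled map and, for each window, the (row, col)
    position of its maximum in the original map.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled, indices = [], []
    for r in range(0, rows, size):
        pooled_row, index_row = [], []
        for c in range(0, cols, size):
            window = [(feature_map[i][j], (i, j))
                      for i in range(r, r + size)
                      for j in range(c, c + size)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def unpool_with_indices(pooled, indices, rows, cols):
    """Put each pooled value back at its recorded position; other cells stay zero."""
    output = [[0] * cols for _ in range(rows)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (i, j) in zip(pooled_row, index_row):
            output[i][j] = value
    return output


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
pooled, indices = max_pool_with_indices(feature_map)
restored = unpool_with_indices(pooled, indices, 4, 4)
```

The restored map is sparse: each maximum returns to its original location, which is how boundary positions survive the subsampling, and the following convolutions then densify the map.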
The generator is built as a lightweight network; the number of layers is a trade-off between time consumption and segmentation accuracy. The final output of the generator is a segmentation result, which is fed to the discriminator together with the input image.
2.3. Discriminator
The discriminator model structure is presented in Table 2. The concatenation of the generated image and the ground-truth real image is the input of the discriminator. The discriminator has 5 blocks built from convolutional layers, LeakyReLU, and BN. The convolutional layers use a stride of 2. BN is added to every block except the first, to accelerate network convergence. LeakyReLU is used to guarantee that neurons will not die when the input is less than 0.
It is well known that the $L_1$ loss produces blurry results in the generator but helps to enforce low-frequency correctness [21]. The $L_1$ loss can be defined as follows:

$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y}\left[\lVert y - x \rVert_1\right],$$

where $x$ is the predicted segmentation image produced by the generator $G$ and $y$ is the ground truth. Hence, the discriminator only needs to model the high-frequency structure. To this end, the patchGAN is adopted as the discriminator structure. Based on insulator segmentation experiments, we choose a $16 \times 16$ patch size instead of the $70 \times 70$ patch size in [22]; the effect of this choice is verified in the experimental section. The patchGAN maps the input image to an $N \times N$ array of outputs $X$, where each $X_{ij}$ signifies whether patch $ij$ in the image is real or fake. It is worth noting that the discriminator is used only during the training phase, so its efficiency is not a primary concern in the experiments.
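The patch size of a patchGAN discriminator is the receptive field of one output unit, which follows from the kernel sizes and strides of its convolutional layers. The sketch below computes it for two hypothetical layer stacks; the 4 × 4 kernels and the layer counts are assumptions borrowed from common patchGAN configurations, not values read from Table 2.

```python
def receptive_field(layers):
    """Receptive field of one output unit for a stack of conv layers.

    `layers` is a list of (kernel, stride) pairs from input to output.
    Working backwards from a single output unit:
        rf_in = (rf_out - 1) * stride + kernel
    """
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


# Hypothetical stacks with 4 x 4 kernels (an assumption, not from Table 2).
patch16_layers = [(4, 2), (4, 1), (4, 1)]
patch70_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

print(receptive_field(patch16_layers))  # patch size seen by each output unit
print(receptive_field(patch70_layers))
```

A stack with a smaller receptive field needs fewer layers, which is one reason a smaller patch size gives a discriminator with fewer parameters.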
2.4. Objective
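The objective function of the network can be defined as follows:

$$G^{*} = \arg\min_{G}\max_{D}\ \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G),$$

where $\lambda$ is the weight parameter, $\mathcal{L}_{cGAN}$ is the adversarial loss, and $\mathcal{L}_{L1}$ stands for the $L_1$ loss between the predicted segmentation image $x$ and the ground truth $y$.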
As the formula shows, the objective has two parts. First, the generator $G$ tries to minimize the adversarial term, that is, to reduce the accuracy of the discriminator $D$, while $D$ tries to maximize it. Second, the $L_1$ term means that the generator is trained both to fool the discriminator and to produce a more realistic image that is close to the ground truth in an $L_1$ sense.
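The two parts of the generator objective can be made concrete on toy values. This is a minimal sketch, not the training code: it assumes the common practice of scoring the generator with a cross-entropy against the "real" label on the discriminator's patch outputs, and the weight `lam = 100` is an illustrative assumption, since the value of the weight parameter is not stated here.

```python
import math


def binary_cross_entropy(predictions, target):
    """Mean cross-entropy of patch scores in (0, 1) against a constant label."""
    eps = 1e-7
    total = 0.0
    for p in predictions:
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    return total / len(predictions)


def l1_loss(predicted, ground_truth):
    """Mean absolute difference between two flattened masks."""
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)


def generator_loss(patch_scores, predicted, ground_truth, lam):
    """Adversarial term (fool the discriminator) plus lam times the L1 term."""
    adversarial = binary_cross_entropy(patch_scores, 1.0)
    return adversarial + lam * l1_loss(predicted, ground_truth)


# Toy values: two patch scores and a four-pixel mask with one wrong pixel.
patch_scores = [0.5, 0.5]
predicted = [1.0, 0.0, 1.0, 0.0]
ground_truth = [1.0, 1.0, 1.0, 0.0]
loss = generator_loss(patch_scores, predicted, ground_truth, lam=100.0)
```

With a large weight, the $L_1$ term dominates early in training and pulls the output towards the ground truth, while the adversarial term penalizes patch-level structure that the $L_1$ term blurs.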
3. Establishment of Our Dataset
3.1. Data Collection UAV System
To accomplish this task, a UAV data acquisition system is designed, as shown in Figure 2. The system consists of a Zenmuse pan-tilt camera, a DJI M200 UAV platform, and the proposed insulator segmentation method. The camera captures images of insulators on the transmission line, including various types such as porcelain insulators and composite insulators.

3.2. Datasets and Implementation Details
The insulator datasets are acquired in two ways: from the UAV data acquisition system and from the Internet. Samples are augmented by random rotation, mirroring, colour perturbation, and blurring and are resized to a fixed size before training. The datasets consist of 6000 insulator images covering more than 6 types. Each image contains 1 to 10 insulators, with an average of 4 insulators per image, for a total of 24,000 insulators. The images are divided into a training set of 5000 images, a validation set of 500 images, and a test set of 500 images. It is worth mentioning that a whole strip of connected domains covering the insulator is used as the insulator label, ignoring its edge details, because insulator identification does not require the exact shape to be marked. This labeling method also reduces network complexity and improves processing efficiency.
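The 5000/500/500 division can be reproduced with a seeded shuffle. The sketch below is only an illustration: how the images were actually assigned to the three sets is not stated, so the random assignment, the seed, and the integer stand-ins for image file names are all assumptions.

```python
import random


def split_dataset(items, n_train, n_val, n_test, seed=0):
    """Shuffle once with a fixed seed, then cut into disjoint train/val/test lists."""
    if n_train + n_val + n_test != len(items):
        raise ValueError("split sizes must add up to the dataset size")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


image_ids = list(range(6000))  # stand-ins for image file names
train, val, test = split_dataset(image_ids, 5000, 500, 500)
```

Fixing the seed keeps the test set identical across runs, so models trained under different settings are compared on the same images.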
4. Experiments
In this section, we carry out several experiments to validate the proposed method. First, we describe the evaluation metrics used in the experiments. Next, we demonstrate the improvement in segmentation accuracy and efficiency by comparing our model with state-of-the-art methods. Then, we conduct experiments to verify the capacity of our generator and compare the segmentation results of different patch sizes in the discriminator. Furthermore, the influence of the number of training images is evaluated. Finally, we analyse the segmentation results for insulators of different sizes and for noisy images.
All the networks are implemented in the Keras framework with the TensorFlow backend and run on an NVIDIA Tesla V100 server. During training, we use a batch size of 8 and the Adam optimizer with a learning rate of 0.0001.
4.1. Evaluation Metrics
Mean Intersection over Union (mIoU) is a standard measure of segmentation accuracy: it averages, over all classes, the overlap between the predicted and ground-truth regions divided by their union. mIoU can be formulated as

$$\mathrm{mIoU} = \frac{1}{k}\sum_{i=1}^{k}\frac{p_{ii}}{\sum_{j=1}^{k} p_{ij} + \sum_{j=1}^{k} p_{ji} - p_{ii}},$$

where $k$ is the number of dataset classes, $p_{ij}$ is the number of pixels of class $i$ predicted as class $j$, $p_{ii}$ is the number of pixels of class $i$ predicted as class $i$, and $p_{ji}$ is the number of pixels of class $j$ predicted as class $i$.
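The metric can be computed directly from a confusion matrix of pixel counts. The following is a minimal sketch with a hypothetical two-class matrix (background and insulator); the counts are made up for illustration and every class is assumed to occur at least once, so no union is zero.

```python
def mean_iou(confusion):
    """mIoU from a square confusion matrix.

    confusion[i][j] is the number of pixels of true class i predicted as class j.
    """
    k = len(confusion)
    total = 0.0
    for i in range(k):
        true_positive = confusion[i][i]
        row_sum = sum(confusion[i])                       # all pixels of class i
        col_sum = sum(confusion[j][i] for j in range(k))  # all pixels predicted as i
        union = row_sum + col_sum - true_positive
        total += true_positive / union
    return total / k


# Hypothetical pixel counts: rows are true classes, columns are predictions.
confusion = [
    [50, 10],  # background
    [10, 30],  # insulator
]
score = mean_iou(confusion)
```

Here the per-class values are 50/70 for the background and 30/50 for the insulator, and mIoU is their mean.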
The average segmentation time of the different models is also compared in this paper, since it is very important for real-time performance.
4.2. Analysis of Architecture
To verify the superiority of the modified network, we compare our method with Pix2pix [22], SegNet [23], Unet [24], and FCN [25]. FCN uses a fully convolutional network to map image pixels to pixel categories for semantic segmentation; the segmentation-equipped VGG16 net [26] is adopted as its front-end structure in this experiment. Figure 3 illustrates the segmentation performance of the five models, and Table 3 shows the quantitative comparison results. Unet performs as well as SegNet and has the lowest time consumption among the baseline networks. FCN has a slightly higher mIoU, but it has the most parameters and the longest processing time. Pix2pix performs relatively well owing to its use of a GAN, as in our model: the GAN can correct the higher-order inconsistencies between the generated segmentation image and the ground truth. Our method is superior to the other methods, with the highest mIoU, the fewest parameters, and the lowest time consumption. This shows that our model with asymmetric spatial filters and patchGAN boosts the performance.

4.3. Influence of Generator Architecture
To show the time consumption and segmentation accuracy of our model, we compare several generator models. In this experiment, the same discriminator model with a $16 \times 16$ patch size is used throughout. For convenience, we call the model that uses $3 \times 3$ spatial filters 33 patch16. The asymmetric spatial filters $3 \times 1$ and $1 \times 3$ are adopted in our model. In addition, we build a model that uses the same generator as the Unet network, which we call Unet patch16. The difference between Unet patch16 and Pix2pix is the patch size. The comparison results are shown in Table 4. This experiment demonstrates that our method has a slight advantage in mIoU and far fewer parameters than the other models. It can be seen that the encoder-decoder architecture with asymmetric spatial filters in the generator plays an important role in this result.
4.4. Comparison of Patch Size in the Discriminator
The patch size of our discriminator influences the segmentation performance. Table 5 shows the comparison results. A patchGAN with a $16 \times 16$ patch size is used in all our other experiments. A $1 \times 1$ patch corresponds to PixelGAN, and a patch covering the whole image corresponds to an ordinary GAN. The PixelGAN and GAN obtain results that are not very satisfactory. A larger patch size performs as well as the $16 \times 16$ patch size but has more parameters.
4.5. Influence of the Training Set Image Number on Segmentation Results
To evaluate the influence of the number of training images, 1000, 2000, 3000, 4000, and 5000 images are randomly selected to constitute different training datasets. We train the model on each of these datasets and evaluate its performance on the same test dataset. Figure 4 shows the mIoU results. The more training images are used, the higher the mIoU, but mIoU grows slowly once the training set reaches 3000 images or more.

4.6. Analysis of Segmentation Results of Insulators in Different Sizes
To verify the ability of our model to detect insulators of different scales in the images, Figure 5 shows the segmentation results. The results demonstrate that even when objects in the background are larger than the insulators, our model can still segment the insulators with high quality. Our model can segment both near and distant insulators during the actual detection process.

4.7. Influence of Noise on Segmentation Results
To simulate different weather conditions, we add salt and pepper noise to the insulator images. In this experiment, three kinds of training datasets are designed: an all-noisy dataset, a half-noisy and half-noise-free dataset, and a noise-free dataset. For convenience, the three trained models are called model noise, model half noise, and model no noise, respectively. We then evaluate the segmentation performance on the same test dataset, which consists of images with salt and pepper noise. Table 6 shows the quantitative comparison results, and Figure 6 illustrates the segmentation performance. Using noisy data in the training process boosts the segmentation performance on noisy images. Therefore, the diversity of the training datasets has an important impact on the segmentation results.
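Salt and pepper noise replaces a fraction of the pixels with pure black or pure white. The sketch below shows one simple way to generate it for a grayscale image stored as rows of 0 to 255 integers; the noise density of 0.1, the seed, and the function name are illustrative assumptions, since the noise level used in the experiments is not stated.

```python
import random


def add_salt_and_pepper(image, amount, seed=None):
    """Return a noisy copy of a grayscale image.

    Each pixel is replaced with probability `amount`; a replaced pixel
    becomes 255 (salt) or 0 (pepper) with equal probability.
    """
    rng = random.Random(seed)
    noisy = [row[:] for row in image]  # copy so the input stays untouched
    for row in noisy:
        for c in range(len(row)):
            if rng.random() < amount:
                row[c] = 255 if rng.random() < 0.5 else 0
    return noisy


image = [[128] * 8 for _ in range(8)]  # flat grey test image
noisy = add_salt_and_pepper(image, amount=0.1, seed=42)
```

For an RGB image the same replacement would be applied per pixel across all channels; a half-noisy training set, as in the experiment above, would apply the function to half of the training images.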

5. Conclusion
In this paper, we introduce a pixel-level insulator segmentation network based on a modified conditional generative adversarial network. Asymmetric spatial filters are adopted in the generator to reduce network parameters and improve computing efficiency. In addition, we explore the patchGAN classifier in the discriminator to model the high-frequency structure. The network produces high-quality segmentation of insulators with higher mIoU and lower time cost than existing end-to-end segmentation methods. Furthermore, the number of trainable parameters is kept small, which makes the proposed network applicable to real-time segmentation on embedded devices in the future. The approach can also be applied to other detection tasks in power inspection.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This project was supported by the National Key Research and Development Plan (2017YFC0806501) and the National Natural Science Foundation (U1713224).