Abstract
The performance of photovoltaic modules (PVMs) degrades due to faults such as discoloration, snail trail, burn marks, delamination, and glass breakage. The resulting loss in power output has raised the need to improve PVM performance. Automatic inspection and condition monitoring of PVM components can address performance-related issues, especially at installations where no trained personnel are available on site. This paper describes a deep learning-based technique that uses convolutional neural networks (CNNs) to extract features from aerial images acquired by unmanned aerial vehicles (UAVs) and to classify various types of faults using cloud computing and the Internet of things (IoT). The algorithm performs binary classification with high accuracy by comparing each individual fault with the good condition. The results obtained demonstrate efficient and effective fault detection.
1. Introduction
Demand for clean energy has grown across the globe due to advancements in technology, the energy needs of a growing population, and elevated pollution levels caused by fossil fuel usage. Consequently, among the various renewable energy sources available, solar energy is considered the prime option for handling the challenges the world faces. In recent times, power generation using photovoltaics (PVs) has attracted considerable attention due to its versatile applications and socioeconomic benefits. Solar energy is converted into electricity with the aid of photovoltaic modules (PVMs). The International Energy Agency states in its annual reports that annual global PV installations have seen a growth of 45%, up from 36%, with a total capacity of 770 GW by the end of 2022. The growing PV market demands an uninterrupted power supply, thereby necessitating efficient PVM operation. In general, PVMs are operated outdoors in harsh climatic conditions that promote the occurrence of faults. PVM faults can reduce the operational life span and reliability of the modules, so timely and adequate monitoring of PVMs is necessary [1]. PVM faults occur as a consequence of thermal stresses, physical damage, moisture interference, short circuits, soiling, corrosion, and partial shading. The presence of such faults gives rise to a scenario termed potential-induced degradation (PID) that hinders the performance and life span of PVMs [2]. To preserve PVM performance and ensure prolonged operation, early detection and continuous monitoring are therefore essential. Conventionally, fault diagnosis was performed through visual inspection by skilled professionals. More recently, numerous PVM fault diagnosis techniques have been adopted, namely, outdoor thermography, photoluminescence, electrical measurements, fluorescence imaging, and electroluminescence imaging [3].
However, such inspections are time consuming, fatigue prone, capital intensive, and labor intensive, and they are not applicable to large farms. These drawbacks paved the way for adopting novel fault diagnosis techniques that are less time consuming, feasible, efficient, and accurate.
The scientific community and investors have shifted their focus towards the application of unmanned aerial vehicles (UAVs) for diagnosing faults in PVMs. The prime reasons for this choice are minimized human interference, low time consumption, and the nondestructive nature of the inspection [4]. The usage of UAVs in diagnosing PVM faults has been showcased in several studies. Digital cameras, along with various onboard sensors affixed to a UAV, can be used to acquire PVM images. The acquired image data are stored on a local computer and transferred to a web server over the Internet. For further processing and classification, the images stored on the web server are fed as input to the CNN architecture located in the cloud [5]. Once processing is complete, the results can be retrieved by users around the globe. Figure 1 represents the process of the online fault detection system using cloud computing technology.

Hotspot localization in a PVM has been performed with the aid of thermal imaging cameras accompanied by digital cameras [6]. The major drawback of thermal imaging cameras is that they are limited to hotspot detection. Alternatively, various efforts were made to apply UAVs in diagnosing visual faults that occur in a PVM. An image mosaicing strategy was adopted by the authors in [7] to diagnose multiple PVM faults from acquired UAV images. In another study, algorithms based on pattern recognition were adopted to distinguish between two visual PVM faults, namely, snail trail and dust shading [8]. Image quality and resolution can influence the significance of the image features extracted. Overall, to enhance the efficiency of multiple fault diagnosis and cut capital loss, an accurate and advanced technique to diagnose PVM faults is vital [9]. Convolutional neural networks (CNNs) are currently an area of growing interest among researchers due to their versatile application areas, namely, speech recognition, picture classification, and text classification. The wide range of CNN applications is feasible due to their self-learning capability and robust compatibility [10]. Selvaraj et al. adopted fine-tuned pretrained CNN models to classify faults in PVMs using thermal images. AlexNet, GoogleNet, and SqueezeNet were used in the study, among which SqueezeNet performed better than the other networks [11]. In another study, a chaotic extension neural network was used to diagnose faults in PVMs using various operating parameters such as maximum power point tracking, current, voltage, and temperature [12]. Furthermore, the performance of different solar cells was investigated by Gaur and Tiwari to determine the best performing PVM [13]. A supervised learning methodology based on CNNs could therefore be chosen to discriminate among different faults in the acquired UAV images.
A novel method for fault detection and classification that applies deep learning to acquired aerial images of PVMs is presented in this paper. Additionally, equipping the proposed method with cloud computing helps in enhancing the performance of the system. Cloud computing has been widely adopted in present-day scenarios for the following reasons: (i) cost efficiency, (ii) higher speeds, (iii) ease of access, (iv) excellent data backup and restoration, (v) elimination of robust infrastructure, (vi) automated resource management, (vii) strategic edge, (viii) reliability, mobility, and lack of hardware, and (ix) unlimited storage. Thus, cloud computing can be a good alternative for startups and investors. The following technical contributions are made in this work:
(1) A CNN-based technique is used to classify the fault conditions present in a PVM using images acquired from a UAV.
(2) A binary classification of images that compares individual faults with the good condition is demonstrated.
(3) A cloud computing strategy is adopted to further increase the performance and reduce the computational complexity of the proposed technique.
The classification accuracy shows that the CNN-based technique is accurate and efficient in distinguishing between fault and good conditions. The paper is organized as follows: Section 2 defines the UAV-based monitoring platform and its specifications; Section 3 describes the basic working of CNNs; Section 4 presents the CNN architecture used in fault detection; Section 5 presents the results of binary classification of the various faults against no-fault images; and finally, Section 6 concludes the work.
2. UAV-Based Monitoring Platform and Specification
UAVs are widely used in recent times due to their prolonged operation and enhanced accessibility in remote areas. A number of fields apply UAVs, including logistics, surveillance, inspection, and photography. Their nondestructive nature and limited time consumption during operation have promoted the use of UAVs in the inspection of solar farms [14]. The general process of a UAV-based monitoring system combined with the PVM fault detection technique is shown in Figure 2. The aerial images of the PVM are acquired using a digital camera mounted on the UAV and are transmitted to the ground unit via a wireless communication network. The transmitted images are stored in a data storage system and then used as the input to the CNN-based condition monitoring technique. The CNN extracts the features of the defects in the PVM and classifies the defects accordingly. In this work, a lightweight UAV (DJI Mavic 2 Zoom) is used for inspecting PVMs. Table 1 provides the complete specification and adopted settings of the UAV. The complete processing of aerial images is carried out on the Google Colaboratory cloud platform, using a Python 3.7 notebook with TensorFlow enabled. Additionally, the aerial images were stored on the Google Drive cloud server for further processing. The results obtained can be verified by any user across the globe who has been granted access.

Due to prolonged outdoor operation under continuously changing environmental conditions, photovoltaic modules are susceptible to different faults that can degrade their operational life and output performance [15]. The most common visual faults in a PVM include discoloration, snail trail, glass breakage, delamination, and burn marks, which are illustrated in Figure 3. The loss of adhesion between the layers of a PVM causes delamination [16]. Accelerated delamination paves the way for moisture penetration, resulting in the formation of oxides. Continuous accumulation of oxides leads to burn marks, which damage the internal parts of the PVM [17]. Discoloration is the yellowing or browning of cells in a PVM, which can reduce the output power [18]. Physical damage induced during transportation and installation can lead to glass breakage. Additionally, varying climatic conditions, hail storms, and prolonged outdoor operation can induce thermal stresses on modules, resulting in glass breakage. This can lead to the formation of hotspots, which affect the efficiency of the PVM [19]. Snail trails indicate microcracks in photovoltaic cells and are found in panels operating for more than a year [20, 21]. All the listed faults have a direct impact on the performance and reliability of a PVM. To ensure long-term operation and consistent performance of a PVM, accurate and timely detection of faults is necessary [22].

[Figure 3, panels (a)-(e): common visual faults in a PVM]
3. Outline of Convolutional Neural Networks (CNNs)
CNN is a supervised learning technique based on deep learning algorithms. A CNN takes an input image, assigns significance (learnable weights and biases) to different features in the image, and is able to distinguish one from another. Three groups of layers constitute a CNN: convolution, pooling, and fully connected layers. A simple CNN structure is shown in Figure 4.

Any CNN structure built from the aforementioned layer groups works on the following principle:
(1) The input layer is the first layer of the CNN and holds the pixel values of the image.
(2) The convolution layer computes the output of neurons connected to local regions of the input by calculating the scalar product of their weights and the input region. The rectified linear unit (ReLU) then applies an elementwise activation function, max(0, x), to the output of the preceding layer.
(3) The pooling layer downsamples the spatial dimensionality of its input after convolution, further decreasing the number of parameters in the activations.
(4) The fully connected layers generate class scores from the activations, which are used for classification. ReLU can be applied between these layers to boost performance.
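As a minimal sketch of the activation step in item (2), the snippet below applies ReLU elementwise to a small feature map. The function name `relu` and the sample values are hypothetical and serve only as an illustration; they are not taken from the authors' implementation.

```python
def relu(feature_map):
    """Apply max(0, x) to every element of a 2D feature map (list of rows)."""
    return [[max(0.0, value) for value in row] for row in feature_map]


# Hypothetical 2x3 convolution output, used only for illustration.
conv_output = [[-1.5, 0.0, 2.0],
               [3.25, -0.5, 1.0]]
activated = relu(conv_output)
print(activated)  # negative entries become 0.0, the rest pass through
```

Negative responses are clipped to zero while positive responses are unchanged, which is what makes the function cheap to evaluate.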
A CNN formulates the class scores for classification and the correlation coefficient for regression by transforming the input data across several layers of convolution and pooling. Hence, the overall architecture of a CNN cannot be chosen at random. Every CNN model requires considerable time and care to train and to assign proper hyperparameters in order to deliver enhanced performance. A brief description of the various layers that constitute a CNN is provided as follows.
3.1. Convolution Layer
The convolution layer plays a key role in the way a CNN works. The parameters of the layer center on the use of learnable kernels. These kernels are small in spatial dimensionality but extend through the entire depth of the input. When the data reach a convolution layer, each filter is convolved across the spatial dimensions of the input to create a 2D activation map. As the kernel slides over the input, the scalar product between the kernel and the input values it covers is calculated at every position. From this, the network learns kernels that trigger when they see a particular feature at a given spatial location of the input (Figure 5). These responses are usually referred to as activations. The center element of the kernel is positioned over an input pixel, and the weighted sum of that pixel and its neighboring pixels is computed and stored in its place.
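The sliding scalar product described above can be sketched in a few lines of plain Python. This is an illustrative single-channel version without padding; the function name `convolve2d`, the 4 × 4 image, and the edge-detecting kernel are assumptions chosen for the example. As in most deep learning frameworks, the kernel is not flipped, so the operation is strictly a cross-correlation.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (both lists of rows, no padding) and
    return the activation map of scalar products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (len(image) - k_rows) // stride + 1
    out_cols = (len(image[0]) - k_cols) // stride + 1
    activation_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0
            # Scalar product of the kernel with the patch it currently covers.
            for u in range(k_rows):
                for v in range(k_cols):
                    total += image[i * stride + u][j * stride + v] * kernel[u][v]
            row.append(total)
        activation_map.append(row)
    return activation_map


# Hypothetical 4x4 single-channel image containing a vertical edge.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# Hypothetical kernel that responds to a dark-to-bright vertical transition.
kernel = [[-1, 1],
          [-1, 1]]
activation_map = convolve2d(image, kernel)
print(activation_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The kernel produces a nonzero activation only in the column where the edge sits, which illustrates how a learned kernel "triggers" on a feature at a specific spatial location.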

Convolution layers can also reduce model complexity considerably through the optimization of their output. Three hyperparameters control this: zero-padding (adding zeros around the border of the input image), stride (the step by which the filter moves in one direction), and depth (the number of filters).
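The effect of zero-padding and stride on the spatial size of the output follows the standard relation floor((n + 2p - k) / s) + 1 for input size n, kernel size k, padding p, and stride s, while the depth of the output equals the number of filters. The helper below is a small sketch of that relation; the function name and the sample values are illustrative assumptions.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Illustrative values only: a 7-wide kernel over a 240-pixel dimension.
no_padding = conv_output_size(240, 7)                     # 234
with_padding = conv_output_size(240, 7, padding=3)        # 240, size preserved
strided = conv_output_size(240, 7, padding=3, stride=2)   # 120
print(no_padding, with_padding, strided)
```

Padding by (k - 1) / 2 keeps the spatial size unchanged at stride 1, and doubling the stride roughly halves the output size.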
3.2. Pooling Layer
Pooling layers reduce the dimensionality of the representation, thereby shrinking the computational complexity and the number of parameters involved. The pooling layer operates on each activation map and uses the “MAX” function to scale down the dimensions of the convolved image. Since pooling layers are destructive by nature, only two forms of max pooling are commonly used. The filter size and the stride of the max pooling layer are both fixed at 2 (i.e., a 2 × 2 window), which allows the layer to cover the entire spatial extent of the input while halving each spatial dimension.
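A minimal sketch of max pooling with a 2 × 2 window and a stride of 2 is given below. The function name `max_pool2d` and the 4 × 4 feature map are hypothetical and chosen only to make the downsampling visible.

```python
def max_pool2d(feature_map, pool_size=2, stride=2):
    """Keep the maximum of each pool_size x pool_size window."""
    out_rows = (len(feature_map) - pool_size) // stride + 1
    out_cols = (len(feature_map[0]) - pool_size) // stride + 1
    pooled = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            window = [feature_map[i * stride + u][j * stride + v]
                      for u in range(pool_size)
                      for v in range(pool_size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 activation map.
feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [6, 2, 3, 4]]
pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 5], [6, 7]]
```

The 4 × 4 map shrinks to 2 × 2, so only a quarter of the activations are passed on to the next layer.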
3.3. Fully Connected Layer
Fully connected layers form the final layers of the CNN. The input to the fully connected layer is the output of the preceding convolution or pooling layer. The activations from the last convolution or pooling layer must be flattened prior to being fed into the fully connected layer. The final layer uses an activation function such as sigmoid or softmax to output the probability that a given input image belongs to a particular class [23].
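The flatten, dense, and softmax steps can be sketched as follows. All names, weights, and biases here are hypothetical values picked for readability; in a trained network the weights would be learned.

```python
import math


def flatten(volume):
    """Flatten a nested list (height x width x channels) into one vector."""
    return [value for row in volume for pixel in row for value in pixel]


def dense(vector, weights, biases):
    """One fully connected layer: a weighted sum plus bias per output neuron."""
    return [sum(w * x for w, x in zip(neuron_weights, vector)) + b
            for neuron_weights, b in zip(weights, biases)]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 1 x 2 x 2 volume left over after the last pooling layer.
volume = [[[1.0, 2.0], [3.0, 4.0]]]
vector = flatten(volume)                      # [1.0, 2.0, 3.0, 4.0]
weights = [[0.5, 0.0, 0.0, 0.5],              # illustrative, not learned
           [0.25, 0.25, 0.25, 0.25]]
biases = [0.0, 1.0]
scores = dense(vector, weights, biases)       # [2.5, 3.5]
probabilities = softmax(scores)
print(probabilities)
```

The class with the larger score receives the larger probability; for a two-class problem, a single sigmoid output carries the same information as a two-way softmax.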
4. Architecture of CNN-Based Solution for Fault Detection
This section describes a simple CNN structure to diagnose PVM faults. In this method, the CNN extracts features from aerial images and performs binary classification between fault and no-fault conditions in a PVM. Being a feed-forward network, the CNN preserves the spatial correlation of the image and thereby captures its characteristics. Initially, alternating convolution and subsampling operations are performed, followed by a multilayer network. The output of the convolutional stage is flattened for the fully connected layer, and sigmoid is used for binary classification of the PVM. The key reason for using the sigmoid function is that its output lies between 0 and 1. It is therefore well suited to binary classification models in which a probability must be predicted as the output. Since a probability can only take values between 0 and 1, sigmoid is the right option [10]. Figure 6 represents the CNN architecture proposed in this work, which contains 2 learned layers (one convolution and one fully connected layer).
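The role of the sigmoid output can be illustrated with a short sketch. The 0.5 decision threshold and the convention that label 1 denotes a fault are assumptions made for this example; the text does not state either.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function; the result lies in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_label(z, threshold=0.5):
    """Map a raw score to 1 (fault) or 0 (good); the labels are assumed."""
    return 1 if sigmoid(z) >= threshold else 0


print(sigmoid(0.0))          # 0.5
print(predict_label(2.0))    # 1
print(predict_label(-2.0))   # 0
```

Because the output is bounded by 0 and 1, it can be read directly as the predicted probability of the positive class.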

In the proposed architecture, each aerial image is reshaped to a size of (240, 240, 3) and fed as input to the CNN structure. The reshaped image is passed through a (2, 2) zero-padding layer, which surrounds the image borders with zeros. The padded image is sent to the convolution layer for feature extraction with 32 filters of size (7, 7) and stride 1 [1]. The output of this convolution layer is connected to a batch normalization layer followed by a rectified linear unit (ReLU) activation layer [24]. Two max pooling layers are placed along the architecture to minimize network complexity. The resulting three-dimensional matrix is converted into a one-dimensional vector by the flatten layer. Finally, a dense fully connected layer with one neuron and a sigmoid activation, commonly used for binary classification, forms the output. The adopted parameters of the CNN architecture are tabulated in Table 2.
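To make the layer stack concrete, the sketch below traces the tensor shape and parameter count through the layers named above. The input size, padding, filter count, kernel size, and stride come from the description; the pooling window (2 × 2 with stride 2) is an assumption borrowed from Section 3.2, since the values of Table 2 are not reproduced here. The batch normalization count assumes the common convention of four values per channel (scale, shift, and two running statistics). The function name `trace_architecture` is hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output size of an unpadded sliding window along one dimension."""
    return (size - kernel) // stride + 1


def trace_architecture(height=240, width=240, channels=3, pad=2,
                       filters=32, kernel=7, conv_stride=1,
                       pool_size=2, pool_stride=2, num_pools=2):
    """Return a list of (layer name, output shape, parameter count)."""
    layers = []
    h, w, c = height, width, channels
    layers.append(("input", (h, w, c), 0))

    h, w = h + 2 * pad, w + 2 * pad
    layers.append(("zero_padding", (h, w, c), 0))

    conv_params = kernel * kernel * c * filters + filters  # weights + biases
    h, w = conv_out(h, kernel, conv_stride), conv_out(w, kernel, conv_stride)
    c = filters
    layers.append(("conv", (h, w, c), conv_params))

    # Assumed convention: scale, shift, running mean, running variance.
    layers.append(("batch_norm", (h, w, c), 4 * c))
    layers.append(("relu", (h, w, c), 0))

    # Pool size and stride are assumptions, not values taken from Table 2.
    for index in range(num_pools):
        h, w = conv_out(h, pool_size, pool_stride), conv_out(w, pool_size, pool_stride)
        layers.append((f"max_pool_{index + 1}", (h, w, c), 0))

    flat = h * w * c
    layers.append(("flatten", (flat,), 0))
    layers.append(("dense_sigmoid", (1,), flat + 1))  # one neuron + bias
    return layers


layers = trace_architecture()
for name, shape, params in layers:
    print(f"{name:14s} {str(shape):18s} {params}")
```

Under these assumptions, the only layers with learned weights are the convolution and the final dense layer, which matches the description of two learned layers; a different pooling window would change the flattened size and the dense parameter count.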
5. Experimental Results and Analysis
In the following section, the experimental assessment of the proposed CNN-based method for the binary classification of PVM faults is carried out. The experiments were performed with the obtained aerial images of PVM. The image dataset contains a total of 600 image samples of six different conditions (100 sample images for each condition) involving five different types of defects (discoloration, snail trail, glass breakage, delamination, and burn marks). Each and every defect is individually compared with good conditions, and the results of the binary classification of each defect condition are obtained.
5.1. Experimental Setup
The whole dataset of sample images is split into three subsets, namely, a training dataset (70% of the dataset), a validation dataset (15% of the dataset), and a testing dataset (15% of the dataset). The training and validation datasets are used while the model learns the features, whereas the test dataset is used to measure the performance of the trained model. In the experiments, each fault condition is assigned an individual fault ID and is compared with the good condition. All the PVM conditions are listed in Table 3. The present work attempts to demonstrate the performance of the proposed CNN model with a minimal input dataset. All experiments were performed on the Google Colaboratory cloud platform with a TensorFlow backend.
The abovementioned dataset is balanced across classes so that the results obtained are unbiased. If the classes were of unequal size, the end results could be biased towards the class that contains more data. Using a larger amount of data helps the model learn more effectively. Acquiring images of PVMs of the same make and power output will enhance the accuracy of the model. These considerations must be followed so that errors in feature extraction are avoided.
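The 70/15/15 split of a balanced dataset, and the pairing of each fault with the good condition, can be sketched as follows. The condition names, file names, random seed, and label convention (1 = fault, 0 = good) are hypothetical placeholders; the fault IDs of Table 3 are not reproduced here.

```python
import random

# Hypothetical identifiers for the six PVM conditions.
CONDITIONS = ["good", "discoloration", "snail_trail", "glass_breakage",
              "delamination", "burn_marks"]


def split_condition(samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle one condition's samples and split them 70/15/15."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Placeholder file names: 100 images per condition, 600 in total.
dataset = {c: [f"{c}_{i:03d}.jpg" for i in range(100)] for c in CONDITIONS}
splits = {c: split_condition(files) for c, files in dataset.items()}


def binary_task(fault, split_index):
    """Build (file, label) pairs for one fault-versus-good task.

    split_index: 0 = training, 1 = validation, 2 = testing.
    """
    good = [(name, 0) for name in splits["good"][split_index]]
    faulty = [(name, 1) for name in splits[fault][split_index]]
    return good + faulty


train_pairs = binary_task("snail_trail", 0)
test_pairs = binary_task("snail_trail", 2)
print(len(train_pairs), len(test_pairs))  # 140 30
```

Splitting each condition separately keeps every subset balanced, so each binary task sees the same number of fault and good images.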
5.2. Training and Validation of the Model
The proposed CNN-based architecture is trained for 20 cycles (epochs), which takes about 5 minutes and reflects the low computation time required. Both the training and validation accuracies saturate after 15 epochs, so the number of epochs is limited to 20. The validation dataset is used to make an initial assessment of the CNN model before it is applied to detecting fault conditions in PVMs. Figure 7 depicts the training of the proposed model in terms of overall accuracy versus the number of epochs. The graph shows that the proposed CNN method produces a minimal amount of error, which reflects the computing capability of the proposed network architecture. From Figure 7, it is evident that the overall accuracy of the fault detection model improves as the number of epochs increases. The overall fault detection accuracy reaches up to 98.7% after the completion of the training process.

The model also exhibits lower computational complexity and higher accuracy when working with minimal dataset size. The proposed CNN method is assessed for various fault occurrences in a PVM, and test results are obtained by using the confusion matrix. A binary classification is carried out involving five different fault images compared with good condition images. The confusion matrix describes the classification accuracy and errors due to misclassification as shown in Figure 8.
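For a binary task, the confusion matrix and the accuracy derived from it can be computed as in the sketch below. The labels are made-up illustrative values and are not the results reported in this paper; the convention 1 = fault and 0 = good is likewise an assumption.

```python
def confusion_matrix(true_labels, predicted_labels):
    """Count TP, TN, FP, FN for labels where 1 = fault and 0 = good."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, prediction in zip(true_labels, predicted_labels):
        if truth == 1 and prediction == 1:
            counts["tp"] += 1
        elif truth == 0 and prediction == 0:
            counts["tn"] += 1
        elif truth == 0 and prediction == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Fraction of samples on the diagonal of the confusion matrix."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


# Illustrative labels only; these are not the paper's test results.
true_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
counts = confusion_matrix(true_labels, predicted_labels)
print(counts)            # {'tp': 4, 'tn': 4, 'fp': 1, 'fn': 1}
print(accuracy(counts))  # 0.8
```

The off-diagonal entries (false positives and false negatives) are the misclassification errors that the confusion matrix makes visible alongside the accuracy.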

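[Figure 8, panels (a)-(e): confusion matrices for the binary classification of each fault condition against the good condition]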
Hence, it can be seen that the features extracted for these two faults (delamination and snail trails) are not sufficient, and more features are required for improved performance. After training, the performance of the developed CNN model is evaluated further using 90 image samples (i.e., 15 image samples for every PVM condition). Table 4 presents the performance of the model in detecting PVM conditions.
The results in Table 4 display high accuracy (above 95%) for detecting each fault condition with respect to good condition. The overall mean accuracy of the proposed model is calculated to be 98.66%. Lower computational time and simple structure are the major advantages of the proposed model.
5.3. Analysis of Model Performance
The overall performance of the developed CNN model is analyzed based on diverse facets, e.g., comparing the model performance with different dataset sizes and evaluating the developed model performance with other existing pretrained models through a cycle of experiments.
Case 1. Comparison of performance with different dataset sizes
In this case, the performance of the proposed model is compared for different sizes of the dataset varying between 120 sample images and 600 sample images. The performance results are tabulated in Table 5. Based on the derived results, it can be seen that there is a variation in the trend of accuracy for change in the size of the dataset. The overall performance of the model improved with an increase in the size of the dataset. Precisely, once the size of the dataset reaches 480 sample images (i.e., 80 sample images per PVM condition), a significant rise in the accuracy of the proposed model is observed. Also, the accuracy of the model can be improved by expanding the amount of collected aerial images.
Machine learning-based techniques require a large amount of data to train the model. The amount of data directly affects the performance of the technique. In this experiment, an attempt is made to achieve maximum accuracy with a minimum number of image samples. Table 5 lists the model accuracy for different dataset sizes. The results show that the performance of the model increases with the size of the dataset: as the dataset grows, the model learns more effectively and achieves better accuracy. The accuracy increases gradually from 88.67% to 98.66% as the dataset grows from 120 images to 600 images, and the overall model accuracy exceeds 95% for datasets above 480 sample images.
Case 2. Comparison of performance with existing pretrained models
Transfer learning is a well-established method in machine learning that focuses on knowledge transfer. Precisely, the knowledge gained while solving one problem is stored and then applied to solve a different problem in the same domain. In this experiment, the transfer learning approach is used to compare the performance of the proposed CNN model with available pretrained models. The present study uses well-established deep learning models, namely, VGG-16 and ResNet-50, for comparison [25, 26]. These models have delivered exceptional results in image classification problems. Table 6 compares the results for identifying various PVM faults on the acquired image dataset. The VGG-16 model performs better in detecting faults such as discoloration, delamination, and glass breakage, with above 80% accuracy.
In certain cases, the model accuracy declines, as is evident for faults such as snail trail and burn marks, for which the accuracy was below 60%. For the ResNet-50 model, the detection accuracy for burn marks and snail trails is higher than that of VGG-16. However, its performance across the complete set of PVM faults is not accurate and reliable enough to meet inspection standards. The results in Table 6 confirm that the proposed solution outperforms the pretrained models in all evaluated fault diagnosis cases, with high accuracy (98.66%). The poor classification accuracy of the adopted pretrained models points to their higher complexity and to overfitting on this specific dataset. Models with such high complexity require high-end systems to keep computation manageable. On the whole, the proposed CNN model delivers quick solutions with minimal computational time, reflecting its reduced complexity. A simple fault detection model that can run on low-end systems with high accuracy is thus demonstrated.
Case 3. Comparison with state-of-the-art techniques
In the present work, two-class (binary) classification is carried out rather than multiclass classification. The reasons for opting for binary classification are as follows: (i) binary problems yield faster classification with instantaneous results, (ii) there is minimal confusion among image patterns, (iii) it is well suited to real-time application, and (iv) it demands low computational power with minimal hardware requirements. The performance of the proposed technique is compared with various state-of-the-art techniques from the literature. The proposed technique exhibits more accurate classification results than the other techniques. Table 7 presents the comparison between the techniques adopted in the literature and the proposed technique.
Table 7 shows that the proposed model delivers more accurate results than the other state-of-the-art techniques. Additionally, the overall computational time for training the model was found to be 140 seconds on a minimal hardware system with 8 GB RAM and no graphics card. Such simple convolutional models can also aid real-time application.
6. Conclusion
This paper presented a simple CNN-based fault detection model for identifying the operating condition of PVMs from aerial images acquired by a UAV, using cloud computing technology and the Internet of things. The proposed solution was assessed extensively for its fault detection performance, and the results were analyzed through a comparative study against existing solutions. Typical fault conditions, including discoloration, snail trail, glass breakage, delamination, and burn marks, were compared with the good condition. A binary classification was performed for each of the fault conditions, and the numerical results confirm that all the faults are identified with high accuracy. The abovementioned fault detection technique can be performed on a real-time basis with a well-trained model. Furthermore, integration of the proposed method into UAV platforms can help in the inspection of large PV farms. Automated inspection with UAVs can minimize human interference, eliminate manual errors, and reduce time consumption. Apart from these advantages, certain challenges exist in the present work: (i) acquisition of data is challenging, (ii) the method provides insight on the occurrence of faults but not the type, and (iii) the application is narrow. Several directions are suggested for future work. Instantaneous results can be acquired by implementing the proposed model on the onboard diagnostic system of the UAV. Further assessment of the model through broad field evaluation can deliver enhanced results. Beyond binary classification, a multiclass classification of the abovementioned faults can become a reliable solution for detecting multiple faults at a time. The performance of the model can be improved with the aid of hybrid validation techniques.
Data Availability
The data used to support the findings of this study are included within the article. The data are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.