Abstract

Bearings are critical components of the induction machines used throughout modern manufacturing, and they are a frequent source of failure. Detecting bearing faults early can reduce repair costs. To achieve efficient and accurate fault detection, we explore vibration-based analysis. Traditional methods rely on manual feature extraction, which is time-consuming and requires diagnostic expertise. In contrast, our work leverages deep learning, particularly convolutional neural networks (CNNs), to extract fault features automatically from raw data. We investigate several image sizes (16 × 16, 32 × 32, 64 × 64, 128 × 128, and 256 × 256 pixels) and compare their performance in bearing fault diagnosis. The CNN-based approach is also compared with traditional methods such as support vector machines, nearest neighbors, and artificial neural networks. The results demonstrate the superior performance of the proposed data-driven fault diagnosis using convolutional neural networks.

1. Introduction

Rolling bearings are among the most widely used components of rotating machinery in industry. Modern rotating machines have become larger, more complex, and more precise with the development of technology, so rolling bearings frequently run under high-speed, heavy-load operating conditions. Bearing failure results in significant breakdown time, elevated repair cost, and a potential decrease in productivity [1]. Rolling bearings are among the most important mechanical parts of rotating machinery and are a main cause of failure in basic industrial equipment [2]. Therefore, condition monitoring (CM) and fault diagnosis are of utmost importance for the reliability and safety of industrial manufacturing. In the era of the Internet of Things, huge volumes of data are collected in real time from bearing health monitoring systems. Mining features from these raw data and correctly identifying the health state of a machine with newly developed, advanced methods has become a new subject in machine health monitoring and fault diagnosis [3]. Signal-based diagnosis includes time domain analysis [4], frequency domain analysis [5], and time-frequency domain analysis [6]. Model-based fault diagnosis algorithms measure the consistency between actual and predicted outputs. Knowledge-based fault diagnosis includes quantitative symbolic intelligence methods and qualitative methods based on machine learning [7]. Qualitative methods include both unsupervised learning techniques such as principal component analysis (PCA) and supervised learning techniques such as artificial neural networks (ANNs) [8] and support vector machines (SVMs) [9].

In traditional machine learning methods, feature extraction is the most difficult task: it relies on signal processing techniques and diagnostic expertise, and its fault diagnosis capability is limited by shallow architectures [10]. Deep learning, a branch of machine learning, can instead extract features directly from raw data and has become a promising tool for fault diagnosis [11]. Deep learning has many applications, such as pedestrian re-identification [12] and face alignment [13]. The CNN is a deep learning method that has emerged as an effective tool for fault diagnosis. CNNs consist of different layers, namely, convolutional layers, pooling layers, and a final fully connected layer, and they can readily recognize specific structures and patterns. Feature extraction by continuous wavelet transform, combined with a convolutional neural network and support vector machine, has been used for bearing fault diagnosis [14]. The study in [15] proposed bearing fault diagnosis based on the short-time Fourier transform and a convolutional neural network. A deep learning mechanism combining cyclic spectral coherence and a CNN was reported in [16] to enhance the diagnosis of rolling bearings. The conversion of vibration signals to grey scale images for classification with a convolutional neural network model is reported in [17]. A signal-to-image mapping method that converts the one-dimensional vibration signal into a two-dimensional grey scale image, combined with a convolutional neural network for fault feature extraction, is proposed in [18]. A deep convolutional neural network (DCNN) achieved high accuracy under noisy conditions by adopting an end-to-end learning method [19]. A CNN applied to roller bearing and gearbox datasets achieved accuracies of up to 99% [20]. Deep learning technology is thus widely used in the detection and diagnosis of bearing faults. An improved deep convolutional neural network with multiscale information is proposed in [21]. An interesting study using thermal images for bearing fault diagnosis has also been carried out [22]. Finally, a fault diagnosis method that transforms the vibration signal into a symmetrized dot pattern image in polar coordinates and applies a convolutional neural network is proposed in [23].

In this paper, we propose a fast, simple, and accurate motor fault detection and condition monitoring system using a 2-D CNN. The CNN-based fault diagnosis approach converts segments of the raw vibration signal directly into two-dimensional (2-D) grey scale images, from which the fault features are extracted automatically. Different image sizes are considered, compared, and analyzed, with proper training, for the diagnosis of bearing faults in an induction machine. The diagnosis method based on the optimum image size is also compared with traditional fault detection techniques such as support vector machines, nearest neighbors, and artificial neural networks, which require manual extraction of time-domain features.

In summary, the contributions of this paper are to obtain the optimum image size with an increased number of fault conditions, to design a simple architecture with a small number of filters that reduces the training time, and to present a comparative study of the developed deep learning method against classic classification methods.

The rest of the paper is organized as follows: Section 2 provides a brief introduction to motor faults. The motor fault diagnosis dataset is outlined in Section 3. The proposed 2-D CNN and its parameters are presented in Section 4. Performance is evaluated in Section 5. Finally, Section 6 concludes the paper with future directions.

2. Introduction to Motor Fault

Induction motors are reliable, rugged, low-cost, low-maintenance, suitably sized, and reasonably efficient. Despite these advantages, they are also subject to faults, which can originate in the bearings, rotor, stator, windings, etc. Induction motors often operate in harsh environments, which can cause such faults. Insufficient cooling, insufficient lubrication, high vibration, overloading, and frequent switching can all lead an induction motor to failure.

According to the Electric Power Research Institute (EPRI), 41% of faults are due to bearings; 37% are stator faults, including stator winding faults caused by mechanical, electrical, thermal, and environmental stresses; 13% are rotor faults, including rotor mass unbalance, broken rotor bars, bowed rotors, and rotor winding faults; and 10% are other faults [24] (see Figure 1).

Bearings are among the most common elements in the electrical machines used across industries, including textiles, manufacturing, power plants, oil refineries, pumping stations, construction, and renewable energy. They permit rotary motion in machines, reduce friction between machine parts, and improve performance, thereby saving energy. A bearing consists of two races, the inner race and the outer race (see Figure 2).

Spalling and pitting are single-point defects normally caused by fatigue and operational wear and are called localized defects. These defects start at the microlevel and grow with time; they mostly produce impulsive vibrations and lead to failure. On the other hand, contamination, improper lubrication, misalignment, and corrosion cause distributed defects. These defects spread over the whole contact area, produce continuous vibrations, and result in motor failure [25].

3. Motor Fault Data Preparation

Vibration data of rolling bearings collected at the Case Western Reserve University bearing data center are used to test and verify the proposed method. The data were collected using a 2 hp Reliance Electric motor with SKF and NTN bearings. The test rig provides drive end (DE) and fan end (FE) bearing data with faults of 0.007, 0.014, 0.021, and 0.028 inches in diameter, with motor loads of 0, 1, 2, and 3 hp. The faults were seeded using electro-discharge machining (EDM) and were introduced separately in the inner raceways, the outer raceways, and the balls. Data were collected with an accelerometer mounted at the 12 o'clock position of the motor housing for both the FE and DE bearings, and the digital data were sampled at 12,000 and 48,000 samples per second [26].

In this study, DE bearing data for the normal, inner race fault, outer race fault, and ball fault conditions are gathered for classification, with fault diameters of 0.007, 0.014, 0.021, and 0.028 inches. The dataset contains one normal state and fifteen fault states under a load of 1 hp at a motor speed of 1772 RPM (see Table 1).

The dataset in this research therefore has one normal condition and fifteen fault conditions. The sixteen conditions, described in Table 1, comprise single-point drive end defects: four ball faults, four inner race faults, and seven outer race faults, where the outer race fault conditions are defined relative to the load zone. Data for the different fault conditions are selected from the larger database and are then combined and arranged according to their respective classes in matrix form, as shown in Table 1.

3.1. Segmentation

A raw signal is divided into segments whose length depends on the image size: the segment length is M × M samples, where M is the image size in pixels. For a 32 × 32 pixel image, the length of each segment is therefore 1024 samples. Signals are divided into segments of length 4096 to obtain 64 × 64 pixel images, 16,384 to obtain 128 × 128 pixel images, and 65,536 to obtain 256 × 256 pixel images.

Different segment lengths are thus used for the different image sizes (16 × 16, 32 × 32, 64 × 64, 128 × 128, and 256 × 256). The segmentation process is shown in Figure 3, and the number of segments obtained from each signal is given in Table 2.
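The following Python/NumPy sketch (not the authors' original code; the synthetic signal merely stands in for one CWRU record) illustrates how a record can be cut into non-overlapping segments of length M × M:

```python
import numpy as np

def segment_signal(signal, image_size):
    """Split a 1-D vibration signal into non-overlapping segments of
    length image_size * image_size (one segment per image)."""
    seg_len = image_size * image_size            # e.g. 32 * 32 = 1024 samples
    n_segments = len(signal) // seg_len          # the incomplete tail is dropped
    return signal[:n_segments * seg_len].reshape(n_segments, seg_len)

# Synthetic signal standing in for one CWRU record (~10 s at 12 kHz)
signal = np.random.default_rng(0).standard_normal(120_000)
segments = segment_signal(signal, 32)
print(segments.shape)                            # (117, 1024)
```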

3.2. Signal-to-Image Conversion

From each segment, the data are converted into a 2-D grey scale image using the mapping given in [27]:

$P(i, j) = \mathrm{round}\left(\dfrac{L((i-1)\times M + j) - \min(L)}{\max(L) - \min(L)} \times 255\right), \quad i, j = 1, \ldots, M,$

where $L$ is the segmented signal, $M$ is the image size, $P(i, j)$ is the pixel intensity of the grey scale image, $\mathrm{round}(\cdot)$ is the rounding function, and 255 is the maximum pixel intensity of the grey scale image.
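A minimal sketch of this mapping, assuming the standard min–max normalization to the 0–255 range (illustrative only):

```python
import numpy as np

def segment_to_image(segment, image_size):
    """Normalize a segment of length M*M to the 0-255 range and reshape it
    into an M x M grey scale image, following the mapping above."""
    seg = np.asarray(segment, dtype=float)
    scaled = (seg - seg.min()) / (seg.max() - seg.min())   # map amplitudes to [0, 1]
    pixels = np.round(scaled * 255).astype(np.uint8)       # map to integer pixels [0, 255]
    return pixels.reshape(image_size, image_size)

# Synthetic 1024-sample segment standing in for one real segment
segment = np.random.default_rng(0).standard_normal(32 * 32)
image = segment_to_image(segment, 32)
print(image.shape, image.min(), image.max())               # (32, 32) 0 255
```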

3.3. Creation of the Image Data Store

An image data store is created to manage the collection of image files, where each individual image fits in memory. The subfolders of the image data store are treated as labels. In this research, four classes are used (normal, inner race, outer race, and ball), which are also the labels of the data. The image data store is then used as the input to the CNN for fault diagnosis. Figure 4 shows the flowchart of the CNN-based approach.
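For illustration only, the same folder-per-class idea can be sketched with torchvision's ImageFolder; the folder names below are hypothetical and become the class labels (this is not the tooling used by the authors):

```python
# Hypothetical folder layout, one subfolder per class (the labels):
#   images/normal/*.png, images/inner_race/*.png,
#   images/outer_race/*.png, images/ball/*.png
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # keep a single grey scale channel
    transforms.ToTensor(),                        # pixel values scaled to [0, 1]
])
dataset = datasets.ImageFolder("images", transform=transform)
loader = DataLoader(dataset, batch_size=64, shuffle=True)
print(dataset.classes)                            # subfolder names become the labels
```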

4. Proposed 2-D Convolutional Network

4.1. Overview of CNNs

The convolutional neural network (CNN) is inspired by biological processes: the connectivity pattern of its neurons resembles the organization of the visual cortex. It requires little preprocessing compared with other image classification algorithms because it uses variations of multilayer perceptrons [28]. CNNs became popular for image classification in 2012, when the network proposed by Alex Krizhevsky won the ImageNet competition and set a record by reducing the classification error rate from 26% to 15% [29]. This remarkable achievement opened new avenues for the use of CNNs in image classification tasks. Following these successes, Internet companies such as Google and Facebook started using CNNs and their variants for AI tasks such as photo tagging and speech recognition; similarly, Google and Amazon use deep convolutional networks for photo search and product recommendation, respectively. The use of CNNs has given a great boost to technologies such as image processing and natural language processing [30, 31].

4.2. CNN Architecture

The convolutional layer convolves the input with filters of a specific size that contain learnable weights. As a filter slides over the input, an element-wise multiplication and summation is performed, and the output of this convolution detects curves, lines, and other useful features in the input images. In a CNN, the filter slides over all positions of the input to produce an output (Equation (2)). In this case, the input image is of size 32 × 32 (i.e., 1024 pixels). The depth of the filter must be the same as the depth of the receptive field; if the receptive field is 5 × 5, the filter size is also 5 × 5. The second argument is the number of filters, which are in effect neurons connected to the same region of the input; 16, 32, and 64 filters are used in the three convolutional layers, respectively. Padding and stride are important parameters of the convolutional layer. The CNN layers are illustrated in Figure 5.

Padding ensures that the output size is the same as the input size; Equation (3) gives the formula for calculating the padding size. The stride controls how the filter moves across the input volume. Details of the parameters are given in Table 3.
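For reference, the standard output-size relation (presumably the relation behind Equation (3)) can be checked with a few lines of Python; the specific numbers below are illustrative:

```python
def conv_output_size(input_size, filter_size, padding, stride):
    """Spatial output size of a convolution/pooling layer: (W - F + 2P) / S + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# 'Same' padding for a 5 x 5 filter with stride 1: P = (F - 1) / 2 = 2
print(conv_output_size(32, 5, 2, 1))   # 32 -> the output keeps the input size
# A 2 x 2 max-pooling layer with stride 2 halves each spatial dimension
print(conv_output_size(32, 2, 0, 2))   # 16
```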

The batch normalization layer speeds up network training and reduces the sensitivity to network initialization; it also makes training an easier optimization problem. Because convolution is a linear operation, a ReLU activation is applied after almost every convolution. ReLU is a pixel-wise operation that maps all negative values in the feature map to zero, and its main purpose is to introduce nonlinearity into the network.

To reduce the computational complexity and to down-sample the features, a pooling layer such as max pooling is used; it reduces the size of the feature maps produced by the convolutional layer. Typically, a 2 × 2 filter with a stride of 2 is used. In max pooling, the largest value within each pooling window is selected and passed to the next layer in order to reduce the number of features.

The fully connected layer is the final layer of the CNN and plays an extremely important role, as it contains the information used directly for decision-making. The output size of the fully connected layer equals the number of target classes; in this study, there are four output classes (normal, inner race, outer race, and ball). The softmax activation function outputs positive numbers that sum to one, which can be interpreted as class probabilities and used by the classification layer.
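A minimal PyTorch sketch of such an architecture, assuming 32 × 32 grey scale inputs and the 16/32/64 filter counts mentioned above (the authors' exact layer configuration is not reproduced here):

```python
import torch
import torch.nn as nn

class BearingCNN(nn.Module):
    """Illustrative CNN: three conv blocks with 16/32/64 filters, batch
    normalization, ReLU and 2 x 2 max pooling, then a fully connected
    layer for the four bearing classes."""
    def __init__(self, num_classes=4):
        super().__init__()
        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=5, padding=2),  # 'same' padding
                nn.BatchNorm2d(c_out),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),             # halves H and W
            )
        self.features = nn.Sequential(block(1, 16), block(16, 32), block(32, 64))
        self.classifier = nn.Linear(64 * 4 * 4, num_classes)       # 32x32 input -> 4x4 maps

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))  # logits; softmax is applied in the loss

model = BearingCNN()
logits = model(torch.randn(8, 1, 32, 32))            # a batch of eight 32 x 32 images
print(logits.shape)                                   # torch.Size([8, 4])
```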

5. Performance Evaluation

Vibration signals are divided into segments of different lengths according to the image size; the number of segments for each image size is shown in Table 2. Images of the different sizes are obtained from the segmented data. From this large set of images, samples of the normal, inner race fault, outer race fault, and ball fault conditions are shown in Figure 6 for each image size. Because the faults differ, each image shows a different pattern.

Network training is performed five times for each image size; randomly selected results of the training process for each image size are discussed, and the optimum image size is determined on an average basis. The data are divided into training and validation sets: at least 80% of the data are used for training and the remainder for validation, as detailed in Table 4 for each image size.
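An illustrative split along these lines (the dataset below is a random stand-in, and the exact ratios per image size are those in Table 4, not the 80/20 split assumed here):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Random stand-in dataset: 1,000 grey scale 32 x 32 images with four classes
images = torch.randn(1000, 1, 32, 32)
labels = torch.randint(0, 4, (1000,))
dataset = TensorDataset(images, labels)

n_train = int(0.8 * len(dataset))                     # roughly 80 % for training
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
print(len(train_set), len(val_set))                   # 800 200
```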

Vibration signals are divided into segments of length 256 to obtain 16 × 16 pixel images. The confusion matrix in Figure 7 shows the four conditions for the output and target classes. In this case, the prediction accuracy for the ball fault is 98%, the inner race fault 96%, the normal condition 100%, and the outer race fault 97.3%, giving an overall accuracy of 97.8%; the remaining 2.2% of samples are predicted incorrectly. The image samples used for the analysis are shown in Figure 6.

To obtain 32 × 32 pixel images, the vibration signals are divided into segments of length 1024. The confusion matrix in Figure 8 shows the four conditions for the output and target classes. In this case, the prediction accuracy for the ball fault is 100%, the inner race fault 100%, the normal condition 100%, and the outer race fault 99.0%, giving an overall accuracy of 99.7%; the remaining 0.3% of samples are predicted incorrectly.

For 64 × 64 pixel images, the signals are divided into segments of length 4096. The confusion matrix in Figure 9 shows the four conditions for the output and target classes. In this case, the prediction accuracy for the ball fault is 100%, the inner race fault 94.7%, the normal condition 100%, and the outer race fault 100%; the overall accuracy is 98.8% and the error is 1.2%.

Segments of length 16,384 are created to obtain 128 × 128 pixel images. The confusion matrix in Figure 10 shows the four conditions for the output and target classes. In this case, the prediction accuracy for the ball fault is 50%, the inner race fault 66.7%, the normal condition 100%, and the outer race fault 57.1%, giving an overall accuracy of 66.7%; the remaining 33.3% of samples are predicted incorrectly.

Signals are divided into segments of length 65,536 to obtain 256 × 256 pixel images. The average accuracy obtained with the 256 × 256 image size is 67.43%, which is lower than that of all other image sizes.

Relying on a single training run to determine the optimum image size is not reliable. The training process is therefore repeated several times, and the average prediction accuracy is calculated for each image size to obtain the optimum image size for the bearing dataset. The results in Table 5 show that the 32 × 32 image size has the highest average accuracy (99.756%) and is therefore the optimum size.

5.1. Comparison with Other Methods

Manual feature extraction is necessary for the traditional (non-deep-learning) methods. The time-domain features extracted after the segmentation process are listed below.

5.1.1. Root Mean Square

The root mean square (RMS) measures the overall level of a discrete signal, $\mathrm{RMS} = \sqrt{\tfrac{1}{N}\sum_{i=1}^{N} x_i^2}$, where $N$ is the number of discrete points in the segment and $x_i$ are the sample values.

5.1.2. Mean

The mean of a segment indicates the average amplitude of the segment and is the first moment of the data.

5.1.3. Peak Value

The peak value is the maximum acceleration amplitude in the signal and is measured in the time domain.

5.1.4. Crest Factor

The crest factor is the ratio of the peak acceleration to the RMS value; it detects acceleration bursts even when the signal RMS has not changed.

5.1.5. Skewness

Skewness is the third moment of the distribution and measures the asymmetry of the probability distribution about its mean.

5.1.6. Kurtosis

Kurtosis is a scaled version of the fourth moment and measures the tailedness of the probability distribution.

5.1.7. Variance

Variance is the second central moment of the distribution and measures the spread of the data about the mean of the segment.

5.1.8. Standard Deviation

The standard deviation is the positive square root of the variance and measures the variation of the data.

5.1.9. Clearance Factor

The clearance factor is maximum for healthy bearings and decreases as defects develop in rotating machinery.

5.1.10. Impulse Factor

The impulse factor is the ratio of the peak value to the mean of the absolute value of the signal; it highlights impulsive components in the vibration signal.

5.1.11. Shape Factor

The shape factor is the RMS divided by the mean of the absolute value of the signal; it depends on the signal shape but is independent of the signal dimensions.
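For illustration, the eleven features above can be computed with standard textbook definitions (the paper's exact formulas may differ slightly); this sketch is not the authors' code:

```python
import numpy as np

def time_domain_features(seg):
    """Eleven time-domain features of one segment (standard definitions)."""
    seg = np.asarray(seg, dtype=float)
    abs_seg = np.abs(seg)
    mean, std = seg.mean(), seg.std()
    rms = np.sqrt(np.mean(seg ** 2))
    peak = abs_seg.max()
    return {
        "rms": rms,
        "mean": mean,
        "peak": peak,
        "crest_factor": peak / rms,
        "skewness": np.mean((seg - mean) ** 3) / std ** 3,
        "kurtosis": np.mean((seg - mean) ** 4) / std ** 4,
        "variance": seg.var(),
        "std": std,
        "clearance_factor": peak / np.mean(np.sqrt(abs_seg)) ** 2,
        "impulse_factor": peak / abs_seg.mean(),
        "shape_factor": rms / abs_seg.mean(),
    }

# Random stand-in for one 1024-sample segment
features = time_domain_features(np.random.default_rng(0).standard_normal(1024))
print(len(features))   # 11 predictors per segment
```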

Principal component analysis (PCA) is a statistical procedure and the most widely used technique for dimension reduction. Ten features are shown in Figure 11 with their individual and cumulative explained variance. PCA helps select features from a large feature set; the transformed features are referred to as principal components. Because the explained variance of the top three components exceeds 90%, three components are selected for model training in this part of the research.

The selected features separate the fault classes well according to their categories. Figure 12 shows the first and second principal components for the different classes.
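A sketch of this PCA step using scikit-learn, with a random matrix standing in for the (segments × 11) time-domain feature table:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random stand-in for the (segments x 11) time-domain feature matrix
X = np.random.default_rng(0).standard_normal((1000, 11))
X_scaled = StandardScaler().fit_transform(X)              # zero mean, unit variance

pca = PCA(n_components=3)                                 # keep the top three components
X_pca = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_.cumsum())             # cumulative explained variance
```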

5.1.12. SVM Model Results

A classifier is required to determine which data samples are classified correctly and which are misclassified. The SVM model is trained on 16,384 observations with eleven predictors and four response classes, using 5-fold cross-validation. The classes are as follows:
(i) Class 1: Normal
(ii) Class 2: Inner race
(iii) Class 3: Outer race
(iv) Class 4: Ball

The SVM model trained for bearing fault diagnosis is shown in Figure 13 and achieves an accuracy of 98.3%. Figure 14 shows the confusion matrix of the SVM classifier, which predicts all four classes: class 1 (normal) at 100%, class 2 (inner race) at 97%, class 3 (outer race) at 98%, and class 4 (ball) at 99%.
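An illustrative reproduction of this setup with scikit-learn (random stand-in data; the authors' kernel choice and hyperparameters are not specified, so an RBF kernel is assumed):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Random stand-ins for the PCA features and the four class labels (1-4)
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
y = rng.integers(1, 5, size=1000)

svm = SVC(kernel="rbf")                                   # kernel choice is an assumption
scores = cross_val_score(svm, X, y, cv=5)                 # 5-fold cross-validation
print(scores.mean())
```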

5.1.13. Nearest Neighbor Model Results

The nearest neighbor (NN) model trained for bearing fault diagnosis is shown in Figure 15 and achieves an accuracy of 97.4%. The NN model is trained on 16,384 observations with eleven predictors and four response classes, using 5-fold cross-validation. Figure 16 shows the confusion matrix of the NN classifier, which predicts all four classes: class 1 (normal) at 100%, class 2 (inner race) at 96%, class 3 (outer race) at 96%, and class 4 (ball) at 98%.
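A corresponding nearest neighbor sketch under the same assumptions (k = 5 is an assumption; the stand-in data mirror the previous sketch):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))                        # stand-in PCA features
y = rng.integers(1, 5, size=1000)                         # stand-in class labels

knn = KNeighborsClassifier(n_neighbors=5)                 # k = 5 is an assumption
print(cross_val_score(knn, X, y, cv=5).mean())            # same 5-fold protocol as the SVM
```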

5.1.14. Artificial Neural Network (ANN) Results

From a total of 16,384 samples:
(i) 70% are used for training (11,468 samples)
(ii) 15% for validation (2,458 samples)
(iii) 15% for testing (2,458 samples)

The training data are presented to the network during training; the network is trained with the default settings and 10 neurons in the hidden layer. The validation set is used to measure network generalization, while the test data have no effect on training and independently measure the network's performance. The confusion matrices for training, validation, and testing are shown separately in Figure 17, and the last confusion matrix combines them to show the overall predictions of the network. The prediction accuracy for class 1 is 100%, class 2 is 98%, class 3 is 97%, and class 4 is 99%, giving an overall prediction accuracy of 98.9%.
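A rough equivalent of this shallow network in scikit-learn, assuming a single hidden layer of 10 neurons and a 70/15/15 split (stand-in data; the original training tool and settings are not reproduced):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))                        # stand-in PCA features
y = rng.integers(1, 5, size=1000)                         # stand-in class labels

# 70 / 15 / 15 split into training, validation and test sets
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=0)

ann = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
ann.fit(X_train, y_train)
print(ann.score(X_val, y_val), ann.score(X_test, y_test))
```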

Finally, a comparison is made among all the methods used in this study for the detection of bearing faults. Figure 18 shows that the deep learning (CNN) model has the highest prediction accuracy (99.75%), compared with the ANN (98.9%), SVM (98.3%), and NN (97.4%).

6. Conclusion and Future Work

In summary, the field of condition monitoring and fault diagnosis is crucial for ensuring the reliability and safety of industrial manufacturing processes. This study is focused on fault detection and diagnosis methods for induction motors, with a particular emphasis on a CNN-based approach that was tested under sixteen different conditions. The results demonstrated that, on average, a 32 × 32 pixel image size yielded the highest prediction accuracy of 99.756%, surpassing the performance of SVM (98.3%), NN (97.4%), and ANN (98.9%).

Looking ahead, there are numerous avenues for further exploration. Future research will involve in-depth analysis of specific mechanisms, novel applications, and alternative approaches. Moreover, the network’s capabilities can be enhanced to predict the remaining useful life of bearings. To ensure that the system meets all requirements and specifications while fulfilling its intended purpose, hardware validation of the current network represents a key direction for future development.

Data Availability

The data are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.