Abstract

To ensure the operational reliability of machinery, rolling bearings exposed to complex and harsh conditions should be monitored in real time. Traditional bearing fault diagnosis methods depend on signal analysis and feature extraction, which are complex and time-consuming. Deep learning methods are good at extracting fault features, but their performance is limited by noise pollution and insufficient sample data during training. In this study, a new sparse enhancement neural network based on the generalized minimax-concave penalty and a convolutional neural network is proposed to capture fault features automatically. To this end, the generalized minimax-concave penalty is first employed to expand the dataset by denoising the polluted data and sparsely enhancing the insufficient samples. Second, the expanded dataset is employed to train the fault classifier. On the drive end and fan end datasets from Case Western Reserve University (CWRU), the proposed method achieves a high prediction accuracy in fault diagnosis for rolling bearings.

1. Introduction

With the rapid development of modern industry, rotating machinery has been widely used in rail transit, wind power generation, aerospace, etc. The bearing, one of the key components in mechanical equipment, usually operates under high-intensity working conditions [1, 2]. Owing to the complex and harsh working conditions, the bearing is prone to wear, cracking, and even fracture, which may result in huge economic losses [3, 4]. In order to improve the reliability and economic benefits of mechanical equipment, it is critical to identify bearing faults in time.

Fault diagnosis technology, which helps to identify the health status of equipment, can prevent catastrophic accidents caused by bearing faults. Traditional fault diagnosis techniques, such as wavelet analysis [5], variational modal decomposition [6], and singular value decomposition [7], have been widely used in fault identification, but they are time-consuming and laborious and have a low recognition rate. With the advent of the era of big data, bearing fault diagnosis algorithms driven by massive data have become a research hotspot in recent years, especially through the transfer of deep learning from image processing to fault identification [8, 9]. The procedure can be summarized in the following two steps: (1) a two-dimensional time-frequency diagram is first constructed from the acquired one-dimensional vibration signal by employing signal processing methods, such as a wavelet packet [10] or discrete wavelet transform [11], and (2) the processed vibration signal is reconstructed and further divided into a training set and a test set, which are then imported into the deep neural network after normalization. Bearing fault classification is then performed by using a convolutional neural network (CNN) model [12]. Guo et al. [13] obtained time-frequency feature maps from vibration signals by using a continuous wavelet transform scalogram. The feature maps were then used as the input of a LeNet-5 model for bearing fault diagnosis. Xu et al. [14] used the continuous wavelet transform to transform the time-domain vibration signals into two-dimensional grayscale images with rich fault information. On this basis, a CNN model based on LeNet-5 was constructed to automatically diagnose bearing faults in the images. These studies show a good ability in bearing fault diagnosis, but they are restricted by long training times and complex network structures.

Bearing fault signals exhibit obvious nonlinear and nonstationary properties because they are mixed with various noises. Some scholars have noted that it is hard to directly extract fault features from the acquired original signal and that the model training time is too long. In order to improve the diagnosis effectiveness, Che et al. [15] proposed an intelligent diagnosis model based on the stack denoising autoencoder (SDAE) and CNN according to the vibration characteristics of rolling bearings under strong noise interference, wherein the SDAE was used to process time series data coupled with multidimensional noisy interference. Qiao et al. [16] used a one-dimensional convolution layer to suppress the background noise of the rotating machine before extracting the multiscale features. In order to eliminate the interference of bearing fault noise, Jiang et al. [17] extracted fault features by exploiting the sensitivity of spectral kurtosis to impulses. It should be noted that the constructed one-dimensional CNN has many complex structural layers, so the number of network parameters increases sharply. However, the added network complexity did not translate into a large improvement in computational efficiency.

At present, the convolutional layer in deep learning methods often acts as a (denoising) filter when processing sequence signals. In order to fundamentally handle fault signals disturbed by strong interference or noise, feature extraction based on sparse representation has been used in mechanical fault diagnosis owing to its good ability in extracting fault features [18, 19]. Selesnick [20] designed the generalized minimax-concave (GMC) penalty to overcome the underestimation of the amplitude components in traditional algorithms. Meanwhile, the GMC penalty can be effectively used to induce sparsity while keeping the cost function convex, which makes it well suited to reducing noise interference in mechanical fault diagnosis [21–23]. Cai et al. [23] proposed a new reweighted generalized minimax-concave sparse regularization to extract the repetitive transients according to the characteristics of faulty rotating machinery. However, it is hard for the GMC to automatically identify weak faults, and a long postprocessing procedure is required.

In this study, a new sparse enhancement neural network combining the GMC penalty and the deep CNN is proposed to identify rolling-bearing faults. The main work of this study is summarized in the following two steps: (1) the GMC algorithm is used to filter and denoise the interference or noise signals in advance. The denoising procedure is equivalent to a convolutional layer, which helps to identify the fault features and to sparsely enhance the insufficient samples so as to expand the dataset, and (2) the deep learning network is redesigned to improve diagnosis accuracy and to reduce network depth and training time. The remainder of this study is organized as follows: Section 2 is dedicated to the theory of the sparse algorithm (GMC) and the CNN. In Section 3, the deep learning model is designed for bearing fault diagnosis. In Section 4, the proposed method is validated in bearing fault diagnosis by comparing it with other methods. Conclusions are drawn in Section 5.

2. Theoretical Background

Here, an intelligent diagnosis method for rolling bearing fault is proposed. First, the sparse GMC algorithm is used to denoise the original vibration signal. Second, feature vectors of the sequence signals are extracted by one-dimensional convolutional neural networks (1DCNN) to classify and identify bearing faults.

2.1. The GMC Algorithm

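During the operation of rolling bearings, the measured data are seriously contaminated by external noise. According to the transient characteristic components and noise characteristics of the vibration signal $y$, the GMC model can be established as follows:

$$y = Ax + n,$$

where $x$ and $n$ indicate the pure data and the noise interference, respectively. The matrix $A$ represents the dictionary library. The GMC is one of the nonconvex penalties for sparse-regularized linear least squares; it maintains the convexity of the cost function to be minimized and promotes stronger sparsity than the L1 norm [24]. The cost function can be formulated as

$$F(x) = \frac{1}{2}\|y - Ax\|_2^2 + \lambda\,\psi_B(x),$$

where $\lambda$ ($\lambda > 0$) is a regularization parameter. $\psi_B$ is the GMC penalty, which can be defined as

$$\psi_B(x) = \|x\|_1 - \min_{v}\left\{\|v\|_1 + \frac{1}{2}\|B(x - v)\|_2^2\right\}.$$

When $B^{T}B \preceq \frac{1}{\lambda}A^{T}A$ is satisfied, the optimization problem can be rewritten as a saddle-point problem,

$$(x^{\mathrm{opt}}, v^{\mathrm{opt}}) = \arg\min_{x}\max_{v}\left\{\frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1 - \lambda\|v\|_1 - \frac{\lambda}{2}\|B(x - v)\|_2^2\right\}.$$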

Here, the GMC algorithm is used to denoise the acquired sequence signal in advance and to highlight the sample features. Each randomly intercepted signal segment contains 4096 sample points, as shown in Figure 1. To demonstrate the sparse enhancement effect of the GMC algorithm, the intercepted segments differ in both their overlap and their spacing. It can be easily discerned that the GMC contributes greatly to noise reduction and fault feature enhancement.
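The denoising step can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes an identity dictionary (A = I) instead of the tunable Q-factor wavelet transform, it assumes the common choice B = sqrt(gamma/lambda) A, and it uses the forward-backward saddle-point iteration from Selesnick's GMC formulation [20]. The function names `soft` and `gmc_denoise` and the toy input are hypothetical.

```python
import math

def soft(t, thresh):
    """Soft-thresholding: shrink t toward zero by thresh."""
    return math.copysign(max(abs(t) - thresh, 0.0), t)

def gmc_denoise(y, lam, gamma=0.8, n_iter=500):
    """GMC-regularised denoising of y, ASSUMING the dictionary A is the identity.

    Forward-backward saddle-point iteration for B = sqrt(gamma / lam) * A,
    with 0 < gamma < 1 so that the cost function stays convex.
    """
    rho = max(1.0, gamma / (1.0 - gamma))  # ||A^T A||_2 equals 1 for A = I
    mu = 1.9 / rho                         # step size; must satisfy 0 < mu < 2 / rho
    x = [0.0] * len(y)
    v = [0.0] * len(y)
    for _ in range(n_iter):
        # Gradient-type steps on x and on the auxiliary variable v
        w = [xi - mu * (xi + gamma * (vi - xi) - yi) for xi, vi, yi in zip(x, v, y)]
        u = [vi - mu * gamma * (vi - xi) for xi, vi in zip(x, v)]
        # Both variables are then soft-thresholded with threshold mu * lam
        x = [soft(wi, mu * lam) for wi in w]
        v = [soft(ui, mu * lam) for ui in u]
    return x

noisy = [0.0, 5.0, 0.1, -4.0, 1.1]          # toy data: two large impulses plus small values
denoised = gmc_denoise(noisy, lam=1.0)      # GMC penalty
l1_only = [soft(t, 1.0) for t in noisy]     # plain L1 (soft-threshold) solution for A = I
```

With lambda = 1 and gamma = 0.8, the two large impulses keep their full amplitudes (5.0 and -4.0) under the GMC penalty, whereas plain soft thresholding shrinks them to 4.0 and -3.0. This is the amplitude underestimation that the GMC penalty is designed to avoid.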

2.2. Deep Learning Model Framework

At present, the 1DCNN is widely used for processing time series data; it mainly contains convolution, pooling, and fully connected layers [25]. The network structure is shown in Figure 2. The convolution operation is first executed on the input sample data by employing the weight sharing of the convolution kernel. The corresponding feature vectors are then extracted, and the mathematical model can be written as

$$y_j = \sum_i x_i * k_{ij} + b_j,$$

where $y_j$ and $x_i$ indicate the corresponding feature vectors of the output and the input sample data, respectively. $k_{ij}$ and $b_j$ are the convolution kernel and the offset term, respectively. $*$ is the convolution operation sign.
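As a rough illustration of the convolution layer for a single input and output channel, the sketch below slides one shared kernel over the sequence and adds the offset term. It assumes "valid" padding and the cross-correlation convention (no kernel flip) that CNN libraries commonly use; the name `conv1d` and the sample values are hypothetical.

```python
def conv1d(x, kernel, bias=0.0):
    """Valid 1-D convolution as used in CNNs (cross-correlation, no kernel flip).

    The same kernel weights are reused at every position (weight sharing).
    """
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

# A difference-like kernel applied to a short ramp signal
features = conv1d([1.0, 2.0, 3.0, 4.0], kernel=[1.0, 0.0, -1.0], bias=0.5)
# features == [-1.5, -1.5]
```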

By reducing the output dimension of the previous layer while retaining the important information, the pooling layer reduces the required GPU memory and the calculation time. The pooling layer function can be expressed as

$$p_j = \operatorname{pool}_{i \in R_j}\left(a_i\right), \qquad R_j = \{(j-1)(w-d)+1, \ldots, (j-1)(w-d)+w\},$$

where $\operatorname{pool}$ is the max pooling or the average pooling operation. $R_j$ is the $j$th pooled region, and $w$ is the width of each pooled region. $a_i$ indicates the feature value in the convolution layer. $d$ is the overlap between the adjacent pooled regions $R_j$ and $R_{j+1}$.
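The pooling operation can be sketched as follows. The parameters `width` and `stride` are illustrative assumptions; adjacent regions overlap by `width - stride` values when the stride is smaller than the width.

```python
def pool1d(a, width, stride, mode="max"):
    """Max or average pooling over 1-D regions of the given width."""
    if mode == "max":
        reduce_fn = max
    else:
        reduce_fn = lambda region: sum(region) / len(region)
    # Only full-width regions are kept; a trailing partial region is dropped
    return [reduce_fn(a[s:s + width])
            for s in range(0, len(a) - width + 1, stride)]

a = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]
max_pooled = pool1d(a, width=2, stride=2)               # [3.0, 5.0, 4.0]
avg_pooled = pool1d(a, width=2, stride=2, mode="avg")   # [2.0, 3.5, 2.0]
overlapping = pool1d(a, width=3, stride=2)              # regions overlap by one value
```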

Features are extracted by a series of convolutional and pooling layers, and the acquired feature vectors are flattened and input into the fully connected layer. The formula is expressed as follows:

$$y = f(Wx + b),$$

where $x$ and $y$ are the input and output vectors of the layer, and $f$ and $W$ are the activation function and the weight matrix between adjacent layers, respectively. $b$ is the offset term.
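A fully connected layer of this form can be sketched in a few lines. The rectified linear unit is assumed as the activation here, and the weights, biases, and the names `relu` and `dense` are hypothetical.

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)

def dense(x, weights, bias, activation=relu):
    """Fully connected layer: activation(W x + b), with W given as a list of rows."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

out = dense([2.0, 3.0], weights=[[1.0, -1.0], [0.5, 0.5]], bias=[0.0, -1.0])
# out == [0.0, 1.5]
```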

3. The Proposed Network Framework of the GMC-CNN Model

To extract the fault features of rolling bearings accurately and efficiently, a new network framework combining the GMC and the CNN is proposed, known as the GMC-CNN model. This section describes how to construct the GMC-CNN model, how to perform sparse data enhancement for bearing fault signals, and how to solve the problem of insufficient samples.

3.1. Design of the Deep Learning Model

The GMC algorithm has exhibited excellent performance in fault feature extraction [21–23]. Based on this, a new network framework, the GMC-CNN model, is constructed as shown in Figure 3, wherein the GMC is equivalent to a convolutional layer that extracts fault features in advance. At the same time, the depth of the neural network and the training time are greatly reduced. Herein, we use the GMC algorithm to denoise the signal, and the extracted features are then imported into the designed CNN for training. Bearing faults are classified by the Softmax classifier. In order to easily distinguish the fault features, the rectified linear unit is used as the activation function. On this basis, the output of the convolution layer is taken as the input of the activation function to carry out nonlinear mapping, which facilitates fault feature learning. To prevent overfitting of the model, we added a batch normalization layer and a dropout layer.

In order to retain as many data texture features as possible and to accelerate the model convergence, the max pooling layer is added after the convolution layer, and the output feature vector of the convolutional layer is reduced through the pooling operation to reduce the number of parameters. Before the fully connected layer, a mean pooling layer is used to ensure the integrity of data information and to avoid overfitting. Meanwhile, the dropout layer is also used to prevent overfitting. In order to optimize the model, the cross entropy is selected as the loss function, and the formula is as follows:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\log\left(p_{ic}\right),$$

where $N$ and $M$ are the total numbers of samples and categories, respectively. $y_{ic}$ indicates the real label, and $p_{ic}$ indicates the predicted probability that sample $i$ belongs to category $c$.
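The Softmax classifier and the cross-entropy loss can be illustrated with a small sketch. It assumes integer class labels (equivalent to one-hot real labels) and averages the loss over the samples; the function names and the example logits are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into class probabilities (shifted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, labels):
    """Mean cross-entropy over N samples; labels are integer class indices.

    With one-hot labels, only the log-probability of the true class contributes.
    """
    return -sum(math.log(p[c]) for p, c in zip(probs, labels)) / len(labels)

probs = [softmax([0.0, 0.0]), softmax([2.0, 0.0, 0.0])]
loss_uniform = cross_entropy(probs[:1], [0])    # equals ln(2) for a 50/50 prediction
loss_confident = cross_entropy(probs[1:], [0])  # smaller, since class 0 is favoured
```

The loss decreases as the predicted probability of the true class approaches 1, which is what the optimizer exploits during training.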

3.2. The Principle of Bearing Fault Diagnosis

To address a series of urgent problems, including the scarcity of meaningful engineering data and the long feature extraction time of traditional diagnosis methods, a new diagnosis method based on the GMC-CNN is proposed. Figure 4 shows the flow chart of the proposed method, which can be summarized in the following five steps:
Step 1: an experimental platform is set up to collect vibration signal data of the faulty bearing;
Step 2: the GMC is employed to denoise the acquired sequence data;
Step 3: segments of length 4096 are randomly intercepted from the denoised data. This procedure provides data enhancement, compensates for the scarcity of meaningful failure data, and expands the sample set. The intercepted data are further divided into a training set and a test set;
Step 4: the processed dataset is imported into the GMC-CNN model for training. The test set is then input into the trained model to validate its accuracy and efficiency in fault identification;
Step 5: the numerical results obtained by methods in the literature and by the GMC-CNN model are compared, so that the training time and prediction accuracy can be used to evaluate the proposed method.

3.3. The Description of GMC-CNN Model Parameters

The model parameters in this study are divided into two parts: the first part belongs to the GMC algorithm, and the second part belongs to the 1DCNN model. Firstly, the tunable Q-factor wavelet transform is used as the dictionary for sparse noise reduction of the fault signal. The matrix $B$ with $B = \sqrt{\gamma/\lambda}\,A$ is set to satisfy the convexity of $F$. In general, the parameter $\gamma$ is set between 0.5 and 0.8. Herein, the value of the parameter $\gamma$ is determined. The $\lambda$ in the GMC algorithm was adjusted by an adaptive strategy. The root mean square error and signal-to-noise ratio were both selected as the evaluation indexes, and the optimal $\lambda$ is determined.
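A short check shows why a scaled copy of the dictionary keeps the cost function convex. It assumes the form $B = \sqrt{\gamma/\lambda}\,A$ and the convexity condition $B^{T}B \preceq \frac{1}{\lambda}A^{T}A$ from Selesnick's GMC formulation [20]:

```latex
B^{T}B = \frac{\gamma}{\lambda}\,A^{T}A
\quad\Longrightarrow\quad
\frac{1}{\lambda}A^{T}A - B^{T}B = \frac{1-\gamma}{\lambda}\,A^{T}A \succeq 0
\quad\text{whenever } 0 \le \gamma \le 1 .
```

Any $\gamma$ between 0.5 and 0.8 therefore satisfies the condition. Setting $\gamma = 0$ gives $B = 0$, in which case the GMC penalty reduces to the L1 norm, while values of $\gamma$ closer to 1 make the penalty more strongly nonconvex.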

In the CNN model, the parameters to be adjusted mainly include the size of the convolution kernel, the optimization algorithm, the loss function, the size of the pooling layer, the learning rate, the batch size, the number of training epochs, and the dropout rate. The sizes of the convolution kernel and the pooling layer, as well as the dropout rate, are shown in Figure 3. The Adam algorithm is selected as the optimizer, with the learning rate and other parameters left at their default values. The batch size and the number of training epochs are 128 and 30, respectively.

4. Validation of the Proposed Deep Learning Network Model

To verify the flexibility and utility of the constructed deep learning network model in bearing fault diagnosis, the public bearing dataset provided by the CWRU Bearing Data Center [26] is used. In this dataset, vibration signals are collected from a 2 hp electric motor, on which accelerometers are installed at the drive end and the fan end, respectively. Herein, the two sets of signal data are both used and analyzed.

4.1. Dataset A: The Drive End Data

Three types of faults, namely inner race fault, ball fault, and outer race fault, occur at the drive end of the bearing. To account for the influence of the damage degree, damage sizes of 0.18 mm, 0.36 mm, and 0.54 mm are considered for each fault type. Therefore, ten health conditions, including one normal condition and nine fault conditions, are investigated in the dataset. Specific conditions are shown in Table 1.

In order to obtain enough experimental samples and to ensure that the segmented data are highly random and general, 500 data groups are randomly intercepted from a sample signal of length 96000, wherein the length of each data group is equal to 4096. This procedure helps to ensure that each vibration sample contains enough feature information. Figure 5 compares the original data and the data denoised by the GMC method, which are represented by a blue line and a red line, respectively. It can be seen that the essential information and fault characteristics of the data are both retained to the greatest extent during the denoising procedure. In general, a higher fault prediction accuracy requires more sample data, which undoubtedly increases the model training time; however, the training time remains acceptable for the sparse enhancement data, even if the dataset increases to 20000.
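The random interception can be sketched as below, using the figures quoted above (a signal of length 96000, 500 groups, 4096 points per group). The placeholder signal, the fixed seed, and the 80/20 split ratio are assumptions for illustration only; the split ratio is not stated in this study.

```python
import random

def random_segments(signal, n_segments=500, length=4096, seed=0):
    """Randomly intercept (possibly overlapping) fixed-length segments from a signal."""
    rng = random.Random(seed)
    last_start = len(signal) - length
    if last_start < 0:
        raise ValueError("signal is shorter than one segment")
    starts = [rng.randint(0, last_start) for _ in range(n_segments)]
    return [signal[s:s + length] for s in starts]

def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle and split samples; the 0.8 fraction is a hypothetical choice."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder standing in for one GMC-denoised vibration record of length 96000
signal = [float(i % 100) for i in range(96000)]
segments = random_segments(signal)
train_set, test_set = train_test_split(segments)
```

Because the start positions are drawn at random, segments may overlap, which is what allows 500 samples of 4096 points to be drawn from a record that holds fewer than 24 non-overlapping segments.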

The randomly segmented training set and test set are separately imported into the GMC-CNN model, and the number of iterations is set to 30 for both. Figure 6(a) compares the prediction accuracy, wherein the blue line and the red line are the accuracy on the training set and the test set, respectively. The prediction accuracies on the training set and the test set are 100% and 99.98%, respectively, which may be attributed to the GMC sparsity enhancement combined with the expanded sample set. It can be concluded that the problem of insufficient samples is solved when low-amplitude noise is considered. A further comparison of the loss rate is shown in Figure 6(b). The loss rate on the training set is highly consistent with that on the test set. In addition, the proposed method shows little sign of overfitting or accuracy degradation.

The confusion matrix is shown in Figure 7(a). Only one misclassification can be found. Figure 7(b) is a two-dimensional graph of the ten health conditions in the dataset. The clustering of each damage type in bearing faults can be easily discerned, and each cluster of the data represents a fault feature. In other words, the fault type can be determined with a high precision by the GMC-CNN model if the data are clean and clear.
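For reference, the relationship between a confusion matrix and the overall accuracy can be sketched as follows. The labels below are toy values chosen for illustration, not results from this study, and the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows index the true class, columns index the predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        cm[t][p] += 1
    return cm

def accuracy_from_confusion(cm):
    """Overall accuracy: correctly classified samples (diagonal) over all samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Toy example with three classes and one misclassified sample
cm = confusion_matrix([0, 0, 1, 1, 2], [0, 0, 1, 2, 2], n_classes=3)
acc = accuracy_from_confusion(cm)
```

Every off-diagonal entry corresponds to a judgment error, so a single nonzero off-diagonal count means exactly one misclassified test sample.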

4.2. Dataset B: The Fan End Data

Herein, the same ten health conditions as in the drive end are also considered for the fan end. In other words, three damage types combined with three damage sizes are considered in the example. The data processing procedure described in Section 4.1 is also applied to the fan end data. The waveforms of the sequence signals before and after denoising are shown in Figure 8, wherein the red and blue lines represent the vibration data and the data denoised by the GMC method, respectively. It can be seen that the noise or interference components have been removed from the sampled signal.

Overfitting on the training samples, one of the most important evaluation indexes for the designed model, has been paid special attention. Therefore, the average pooling layer and the dropout layer are set up to prevent overfitting. It can be easily seen from Figure 9(a) that overfitting is effectively controlled, and the training accuracy is very close to the test accuracy. When the iteration step is equal to 30, the prediction accuracies on the training set and the test set are 100% and 99.97%, respectively. In addition, the loss rate for the fan end, as shown in Figure 9(b), is small enough to be ignored.

The confusion matrix is shown in Figure 10(a). The proposed method achieves a high fault prediction accuracy, and only one error in the test set can be discerned. Figure 10(b) shows the distributions of the ten damage types, and each color represents one type of damage. From the figure, each fault feature is well clustered, which supports fault classification by the proposed method.

4.3. A Comparison of the Theoretical Methods

The traditional CNN model, whose network structure is relatively simple, is shown in Figure 11(a). By employing this one-dimensional CNN model, it can be seen in Figure 11(b) that the fault recognition accuracies on the drive end and fan end datasets are 96.79% and 96.11%, respectively. For comparison, fault recognition by the proposed GMC-CNN method is also shown in the figure. The GMC-CNN model shows a higher accuracy in bearing fault diagnosis, which effectively alleviates the problems of small samples and noisy data. In detail, the classification accuracy of the proposed method increases by more than 3 percentage points compared with the traditional CNN method: the prediction accuracies of the GMC-CNN method increase to 99.98% on the drive end data and 99.97% on the fan end data. At the same time, the GMC penalty function not only retains the convexity of the cost function but also promotes sparsity. In addition, sparse enhancement of the data allows a shallower CNN model and saves training time.

To verify the data enhancement by the GMC layer, a two-dimensional CNN model is also used for comparison. Figure 12 shows the structural framework of each layer. Two-dimensional sliced data, whose dimension is 64 × 64, are imported into an identical CNN structure. The experiments were run in Python 3.8 on an AMD Ryzen 5 3600 6-core processor (3.60 GHz) with an NVIDIA GeForce GTX 1050 Ti GPU.
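One plausible way to obtain the 64 × 64 slices from a 4096-point segment is a row-major reshape, sketched below. The row-major ordering and the name `to_image` are assumptions; the study does not state how the two-dimensional data are arranged.

```python
def to_image(segment, side=64):
    """Stack a 1-D segment of side*side points into side rows of side values each."""
    if len(segment) != side * side:
        raise ValueError("segment length must equal side * side")
    return [segment[r * side:(r + 1) * side] for r in range(side)]

image = to_image(list(range(4096)))
# image[0] holds samples 0..63, image[1] holds samples 64..127, and so on
```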

The GMC-2DCNN model is further studied for comparison, where the GMC is also used for sparse data enhancement and noise cleaning. The numerical results shown in Figures 13(a) and 13(c) indicate that the prediction accuracies on the drive end data and the fan end data are 99.97% and 99.85%, respectively. In addition, the GMC-2DCNN model shows little overfitting, and identical accuracy can be found on the training set and the test set when the iteration step is equal to 15. Compared with the traditional one-dimensional model shown in Figure 11(b), the accuracy of the model after data enhancement and data cleaning is significantly improved.

Figure 14 compares the precision and efficiency of the GMC-2DCNN model and the proposed GMC-CNN model. In Figure 14(a), the two methods show little difference in accuracy. The advantage of the proposed method is therefore illustrated from the perspective of time cost, in which the suffixes DE and FE denote the drive end data and the fan end data, respectively. It can be seen from Figure 14(b) that the training times required by the GMC-2DCNN model on the drive end data and the fan end data are 782 s and 785 s, respectively, whereas the training times required by the proposed GMC-CNN model on the drive end data and the fan end data are 152 s and 151 s, respectively. It can be concluded that the proposed GMC-CNN model greatly reduces the training time and the operation cost. Moreover, the GMC used for data enhancement also improves the accuracy of fault classification to a certain extent.
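The size of the saving follows directly from the reported training times. The short calculation below uses only the figures quoted above and the 30 training iterations used in this study; the variable names are hypothetical.

```python
# Training times in seconds, as reported for Figure 14(b)
train_time_s = {
    "GMC-2DCNN": {"DE": 782, "FE": 785},
    "GMC-CNN": {"DE": 152, "FE": 151},
}
n_iterations = 30  # number of training iterations used in this study

# Ratio of 2-D to 1-D training time for each dataset
speedup = {end: train_time_s["GMC-2DCNN"][end] / train_time_s["GMC-CNN"][end]
           for end in ("DE", "FE")}
# Average time per training iteration for the proposed model
per_iteration_s = {end: train_time_s["GMC-CNN"][end] / n_iterations
                   for end in ("DE", "FE")}
```

The one-dimensional model trains roughly five times faster on both datasets (about 5.1 times on the drive end data and about 5.2 times on the fan end data), at about 5 s per iteration.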

For a further comparison of the accuracy, some other methods [27–31], such as the D-CNN model [29] and the CNN-SVM model [30], are considered. Table 2 lists the classification accuracy and calculation time. In these studies, the CNN-SVM model and the D-CNN model have 20 layers and 11 layers, respectively. The training time and classification accuracy of the CNN-SVM in bearing fault diagnosis are 35.82 s and 98.75%, respectively. The D-CNN model improves the fault classification accuracy to a certain extent, but at the cost of a much lower calculation efficiency. In detail, its fault classification accuracy is 98.83%, whereas its training time increases to 216.32 s, which is about six times that of the CNN-SVM model. This may be attributed to the fact that transforming the analysis data into a two-dimensional time-frequency graph by the wavelet transform undesirably increases the training time. Therefore, processing the dataset as one-dimensional data can improve the accuracy and reduce the time cost. To sum up, the proposed method has three advantages: first, it can sparsely enhance the data and expand the sample set; second, it has a relatively low operation cost, reduces the training time, and can quickly respond to the fault type; third, it has a high accuracy in predicting the fault type under the same conditions.

In order to present the accuracy more vividly, a bar chart is used as shown in Figure 15. It can be seen that the proposed GMC-CNN method improves the diagnosis speed and accuracy by taking advantage of the noise reduction ability of the GMC, which retains the convexity of the cost function and promotes sparsity, as well as the excellent feature extraction of the CNN. In detail, the accuracy of the proposed model is 99.98%, and the time cost is nearly 5 s in each iteration step. In other words, 30 iteration steps need about 150 s, which greatly improves the solving efficiency. In addition, the CNN without the GMC is also considered for comparison. In conclusion, the proposed deep learning method achieves a higher accuracy than other models in fault identification at a low time cost.

5. Conclusions

In this study, a new sparse enhancement neural network is proposed to diagnose bearing faults with high accuracy and efficiency. First, the GMC is used to denoise the sequence signals and sparsely enhance the data derived from random signal slices. Second, the enhanced data are imported into the proposed model for training. Finally, the fault location and severity of the rolling bearing are identified in a timely manner with a high accuracy. The diagnostic accuracy of the method reaches 99.98% on the drive end data and 99.97% on the fan end data. Comparisons with the standard CNN model and the GMC-2DCNN model indicate that the proposed model has a better ability in fault feature extraction and training time reduction. Meanwhile, it also identifies faults with a high precision.

Data Availability

The raw/processed data required to reproduce these findings cannot be shared at this time as the data also form part of an ongoing study.

Disclosure

This article was previously published as a preprint [32] at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4092632.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Yong Zhang wrote the original draft, developed the methodology, and visualized the study. Junjie Ye developed the methodology, supervised the study, and reviewed and edited the manuscript. Wenhu developed the methodology and curated the data. Jinwang Shi visualized the study and curated the data. Wangpeng He conceptualized and supervised the study. Gaigai Cai validated the study and curated the data.

Acknowledgments

The authors would like to thank the National Natural Science Foundation of China, China (Nos. 52175112, 52075406, and 51805398), the National Natural Science Foundation of Shaanxi Province, China (No. 2018JZ5005), Fundamental Research Funds for the Central Universities (No. JB210421), and the 111 Project, China (No. B14042), for their support and cooperation.