Abstract

To address the problem of early fault diagnosis of rolling bearings, an early fault detection method based on a multiscale convolutional neural network and a gated recurrent unit network with an attention mechanism (MCNN-AGRU) is proposed. The method first applies multiscale data processing to the rolling bearing vibration signals and feeds the resulting signals at multiple time scales into convolutional neural networks for feature extraction; a gated recurrent unit network with an attention mechanism is then added to give the model predictive ability. Finally, the reconstruction error between the actual value and the predicted value is used to detect the early fault. Only normal data is needed to train the method, which addresses early fault detection in operating condition monitoring and performance degradation assessment of rolling bearings. Multiscale data processing makes the features extracted by the CNN more robust, and the GRU network with an attention mechanism keeps the predictive ability of the method from being affected by the length of the data. Experimental results show that the proposed MCNN-AGRU method can effectively detect early faults of rolling bearings and identify the type of fault.

1. Introduction

As one of the key parts of rotating machinery, the rolling bearing mainly bears stress and transfers load in the system. Because it operates for long periods under high-speed, high-load working conditions, the rolling bearing is the most easily damaged part of mechanical equipment [1]. Once the rolling bearing is damaged, the mechanical equipment is seriously affected, so it is of great significance to study the failure mechanisms of rolling element bearings.

The typical life curve of rolling bearings is shown in Figure 1. There are four stages: (1) running-in stage, (2) normal operation stage, (3) early weak fault occurrence and healing stage, and (4) severe fault stage. Early faults are too weak to detect easily, and once an early failure occurs, the rolling bearing deteriorates rapidly after a short “healing stage,” which can lead to serious consequences. If the fault can be detected and remedied at an early stage, bigger safety problems can be avoided and losses reduced. Therefore, the early fault detection of rolling bearings is very important [2–5]. Two problems must be faced in the detection of early faults: (1) early faults are weak, so their features are difficult to extract; (2) there is little early fault data, which is not enough to train a model.

The current methods for diagnosing rolling bearing faults can be roughly divided into two categories [6]. The first category is model-based fault diagnosis methods, which mainly use expert knowledge to analyse the fault frequency [7–9] or establish a degradation model [10, 11] to isolate early faults. However, these methods rely on subjective human choices and on the accuracy of the model, and they require considerable experience. Each model is designed only for a specific field, which limits the scope of application to a certain extent. The second category is data-based fault diagnosis methods [12–14]. As the field of fault diagnosis enters the era of “big data,” a series of data-based methods have emerged. Data-based methods rely on the neural network to extract the features of the data on its own, eliminating subjectivity and dependence on human experience, which is better suited to monitoring in today’s large-scale industry.

Several statistical and machine learning methods have been used to isolate early faults, mainly including principal component analysis (PCA) [6], autoregressive models [15], neural networks with multiple hidden layers [16, 17], support vector machine (SVM) [18, 19], the K nearest neighbour (KNN) algorithm [20], and the local outlier factor (LOF) algorithm [21, 22]. But these methods rely on well-selected features to classify faults.

In recent years, with its rapid development and wide application, deep learning has become the focus of fault diagnosis. Typical networks include deep neural network (DNN) [23], deep belief network (DBN) [24], autoencoder (AE) [25], convolutional neural network (CNN) [26], and recurrent neural network (RNN) [27]. Although the accuracy of DBN and DNN is improved compared with shallow artificial neural networks, they still require manual extraction of time series features and ignore the temporal characteristics of the data. AE belongs to unsupervised learning and is mainly used for data dimensionality reduction or feature extraction. It usually needs to be combined with other models before being applied to the field of load forecasting. CNN is a deep neural network built on convolution operations. Convolution and pooling are used to extract data features, which reduces the error caused by manual feature extraction. It is widely used in image, voice, and other fields. However, it is difficult for a single CNN to extract the weak features of early faults, so a multiscale convolutional neural network (MCNN) is introduced to extract more comprehensive features. But it is difficult for an MCNN model alone to learn the temporal dynamics of the data. RNN introduces a cyclic structure into the network so that it can model dynamic time series data better than other neural networks [28]. Gated recurrent unit (GRU) is a special RNN. GRU and long short-term memory network (LSTM) [29–31] address the problem of vanishing gradients in RNN. They can consider the long-term and short-term dependence in time series more completely. Compared with LSTM, GRU converges faster with comparable accuracy. However, when the input time series is long, RNN-type networks such as LSTM and GRU are prone to losing sequence information and have difficulty modelling the structural information between data, which affects the accuracy of the model [32]. The attention mechanism is a resource allocation mechanism that assigns different weights to input features, so that the features containing important information do not disappear as the step size increases. It highlights the influence of the more important information and makes it easier for the model to learn long-distance interdependence in the sequence [33].

However, although data-based early fault detection methods exist, early fault detection still faces the following challenges: (1) how to extract comprehensive and robust features from early fault signals; (2) how to take the timing characteristics of bearing vibration signals into account when detecting anomalies; and (3) how to avoid losing information when the input data is too long.

To address the above problems, this paper proposes an MCNN-AGRU early fault diagnosis method. The method uses MCNN to extract the features of the rolling bearing at different time scales and filters out some of the noise in the multiscale calculation, yielding more robust and comprehensive bearing features. The GRU network with an attention mechanism learns the long-term dependence characteristics of the data, and the features containing important information do not disappear as the step size increases. This highlights the influence of the more important information and makes it easier for the model to learn long-distance interdependence in the sequence [34], which solves the problem of information loss caused by overly long data. Finally, a large amount of normal operating data of rolling bearings is used to construct a predictive model of the normal operating state. Through training, the model learns the distribution of normal data, and the reconstruction error between the predicted value and the true value is used to measure the operating state of the rolling bearing and raise an early alarm.

The main contributions of this paper are as follows: (1) proposing an early fault diagnosis method that needs only normal bearing data to train the model, which solves the problem of scarce early bearing fault data; (2) using multiscale data processing to make the features extracted by the CNN more robust; and (3) using a GRU network with an attention mechanism to make the predictive ability of the model independent of the length of the data.

The rest of the paper is organized as follows: Section 2 introduces the basic theory. Section 3 proposes the MCNN-AGRU method. Section 4 verifies the performance of the method through experiments. Section 5 concludes the paper.

2. Fundamental Theory

2.1. Multiscale Data Processing

At present, most feature extraction methods feed the raw vibration signals of rolling bearings directly into the neural network, but the features extracted from data at a single time scale are not comprehensive [35], so multiscale data processing is used. As shown in Figure 2, this paper uses a multiscale data processing layer to process the original vibration signal and obtains vibration signals at multiple time scales by moving a non-overlapping window over the signal and calculating the arithmetic average in each window. Preprocessing the data in this way filters out high-frequency disturbance and random noise to a certain extent. The specific operation is as follows: a segment of vibration signal $x = \{x_1, x_2, \ldots, x_N\}$ is given, where $N$ is the length of the original input data, $s$ is the scale of the multiscale processing, and $x_n$ is the $n$th vibration value of the original signal. If the multiscale output signal is denoted $y^{(s)}$, the multiscale processed data is calculated as

$$y_j^{(s)} = \frac{1}{s} \sum_{n=(j-1)s+1}^{js} x_n, \quad 1 \le j \le \frac{N}{s}.$$

The data length after multiscale data processing is $N/s$. The range of $s$ selected in this paper is 1∼4.
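The non-overlapping window averaging described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `coarse_grain` and `multiscale` are hypothetical, and dropping trailing samples that do not fill a complete window is an assumption, since the text does not say how a remainder is handled.

```python
import numpy as np

def coarse_grain(x, s):
    """Average non-overlapping windows of length s (time scale s)."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) // s  # assumption: an incomplete trailing window is dropped
    return x[: n_windows * s].reshape(n_windows, s).mean(axis=1)

def multiscale(x, scales=(1, 2, 3, 4)):
    """Return one coarse-grained view of x per scale (1 to 4, as in the text)."""
    return {s: coarse_grain(x, s) for s in scales}

signal = np.arange(1.0, 13.0)  # toy signal: 1, 2, ..., 12
views = multiscale(signal)
for s, y in views.items():
    print(s, len(y), y)
```

For the 12-point toy signal, the four views have lengths 12, 6, 4, and 3, which matches the length N/s stated above; scale 1 returns the signal unchanged.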

2.2. Convolutional Neural Network

The convolutional neural network is a multilevel neural network that includes a filtering stage and a classification stage. The filtering stage extracts the features of the input signal, the classification stage classifies the learned features, and the parameters of the two stages are obtained through joint training [36]. The filtering stage includes a convolutional layer and a pooling layer and applies an activation function to introduce nonlinearity. The convolutional layer uses a convolution kernel to perform convolution operations on local areas of the input signal and generate the corresponding features. The most important property of the convolutional layer is weight sharing; that is, the same convolution kernel traverses the input once with a fixed stride. Weight sharing reduces the number of network parameters in the convolutional layer and avoids the overfitting caused by too many parameters. The main purpose of the pooling layer is to reduce the number of parameters of the neural network and to perform a second round of feature extraction on the output of the convolutional layer. The one-dimensional convolution process is shown in Figure 3. The convolution kernel moves along the input signal with the given stride to extract features, and the obtained features are then pooled to obtain higher-level features.
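To make the convolution and pooling steps concrete, the following sketch applies one shared kernel to a short signal, passes the result through ReLU, and max-pools it. The kernel values, the input signal, and the function names are illustrative assumptions; as in most deep learning libraries, the "convolution" here is a sliding dot product without kernel flipping.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0, stride=1):
    """Slide one shared kernel over x (no padding) and return the feature map."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    m = len(kernel)
    starts = range(0, len(x) - m + 1, stride)
    return np.array([np.dot(kernel, x[i:i + m]) + bias for i in starts])

def relu(z):
    return np.maximum(z, 0.0)

def max_pool1d(c, pool):
    """Keep the maximum of each non-overlapping window of length pool."""
    n = len(c) // pool
    return c[: n * pool].reshape(n, pool).max(axis=1)

x = np.array([0.0, 1.0, 3.0, 2.0, 0.0, -1.0, -3.0, -2.0])
kernel = np.array([1.0, 0.0, -1.0])          # hypothetical kernel weights

feature_map = relu(conv1d_valid(x, kernel))  # -> [0, 0, 3, 3, 3, 1]
pooled = max_pool1d(feature_map, pool=2)     # -> [0, 3, 3]
print(feature_map, pooled)
```

Because the same three weights are reused at every position, the layer has three parameters (plus a bias) regardless of the input length, which is the weight-sharing property described above.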

2.3. Gated Recurrent Unit Network

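Gated recurrent unit network (GRU) is a variant of the recurrent neural network (RNN). RNN is a type of neural network that takes sequence data as input and performs recursion along the evolution direction of the sequence, with all recurrent units connected in a chain [37]. As shown in Figure 4, the GRU network consists of an update gate and a reset gate. The main function of the update gate is to control the extent to which the state information from the previous moment is brought into the current state. The larger the value of the update gate, the more state information from the previous moment is brought in [38]. The main function of the reset gate is to determine the degree to which previous information is discarded. The smaller its value, the more information is ignored. The GRU expression is as follows:

$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right),$$
$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right),$$
$$\tilde{h}_t = \tanh\left(W_{\tilde{h}} \cdot [r_t \odot h_{t-1}, x_t]\right),$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t.$$

In the above formulas, $x_t$ is the input at time $t$, $h_{t-1}$ and $h_t$ are the hidden states at the previous and current moments, $z_t$ and $r_t$ are the update gate and the reset gate, $\tilde{h}_t$ is the candidate state, $\odot$ denotes element-wise multiplication, and $\sigma$ represents the Sigmoid activation function. The parameters in the formulas are $W_z$, $W_r$, and $W_{\tilde{h}}$.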
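A single GRU time step can be sketched as follows. The sketch assumes the bias-free formulation with three weight matrices acting on the concatenation of the previous hidden state and the current input, and it uses the convention in which a larger update gate keeps more of the previous state, in line with the description above; some references swap the roles of the gate and its complement. All names and sizes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step; each W has shape (hidden, hidden + input), no bias terms."""
    concat = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ concat)                      # update gate
    r = sigmoid(W_r @ concat)                      # reset gate
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))
    # Larger z keeps more of the previous state (assumed convention).
    return z * h_prev + (1.0 - z) * h_cand

rng = np.random.default_rng(0)
hidden, n_in = 3, 2                                # hypothetical sizes
W_z, W_r, W_h = (rng.normal(size=(hidden, hidden + n_in)) for _ in range(3))

h = np.zeros(hidden)
for x_t in rng.normal(size=(5, n_in)):             # run over a 5-step toy sequence
    h = gru_step(x_t, h, W_z, W_r, W_h)
print(h)
```

When the update gate saturates at 1, the step returns the previous hidden state unchanged, which is how the unit carries information across many time steps.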

2.4. Attention Mechanism

The attention mechanism is a resource allocation mechanism that simulates the attention of the human brain. At a given moment, the human brain focuses its attention on the areas that need it and reduces or even ignores its attention to other areas, so that it captures the details of the information that matters and suppresses useless information. Its core idea is to reallocate attention reasonably, ignoring irrelevant information and amplifying the required information. The attention mechanism allocates sufficient attention to key information through probability allocation, highlights the impact of important information, and improves the accuracy of the model. The structure of the attention mechanism is shown in Figure 5, where $x_t$ ($t \in [1, n]$) represents the input of the GRU network, $h_t$ ($t \in [1, n]$) is the hidden layer output of the GRU for each input, $a_t$ ($t \in [1, n]$) is the attention probability distribution value that the attention mechanism assigns to the GRU hidden layer output, and $y$ is the output value of the GRU with the attention mechanism.
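The probability allocation over hidden states can be illustrated with a small NumPy sketch. It assumes an additive scoring function of the form u · tanh(w h + b) followed by a softmax, which is one common choice; the parameter shapes, the random values, and the name `attention_pool` are hypothetical.

```python
import numpy as np

def attention_pool(H, w, b, u):
    """Weight the hidden states H (n steps x d units) and return (weights, context).

    Scores use an additive form u . tanh(w h_t + b); this form is an assumption.
    """
    scores = np.tanh(H @ w.T + b) @ u       # one score per time step
    scores = scores - scores.max()          # stabilise the softmax numerically
    a = np.exp(scores)
    a = a / a.sum()                         # attention probabilities, sum to 1
    return a, a @ H                         # context = weighted sum of hidden states

rng = np.random.default_rng(1)
n_steps, d, k = 6, 4, 3                     # hypothetical sizes
H = rng.normal(size=(n_steps, d))           # stand-in for GRU hidden outputs
w = rng.normal(size=(k, d))
b = rng.normal(size=k)
u = rng.normal(size=k)

a, context = attention_pool(H, w, b, u)
print(a.round(3), context.round(3))
```

If all scores are equal (for example, when `u` is zero), the weights become uniform and the context reduces to the plain average of the hidden states; training moves the weights away from this baseline toward the informative steps.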

2.5. Support Vector Data Description

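Support vector data description (SVDD) is a one-class classification algorithm that can distinguish target samples from nontarget samples. At present, the SVDD algorithm is mainly used for abnormal state detection and fault identification, where only the normal working state space is defined in order to judge whether the working state is normal. Given a training sample set $\{x_i\}_{i=1}^{n}$, the goal of SVDD is to determine a hypersphere of minimum volume that encloses all training samples. Assuming that $a$ and $R$ are the center and radius of the hypersphere, respectively, the SVDD optimization problem can be expressed as follows:

$$\min_{R, a, \xi} \; R^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad \|\phi(x_i) - a\|^2 \le R^2 + \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n.$$

$C$ is the constant used to control the degree of punishment for misclassified samples, $\xi_i$ is the relaxation factor, and $\phi$ is the mapping from sample space to feature space.

The Lagrange multiplier method is used to solve the above optimization problem, and the following dual form can be obtained:

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i \langle \phi(x_i), \phi(x_i) \rangle - \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \langle \phi(x_i), \phi(x_j) \rangle \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i = 1,$$

where $\alpha_i$ is the Lagrange multiplier, $0 \le \alpha_i \le C$.

To improve the adaptability of the algorithm, the Gaussian kernel function is introduced to replace the inner product operation on $\phi$, which improves the generalization ability of SVDD. The Gaussian kernel function is as follows:

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right),$$

where $\sigma$ is the Gaussian kernel parameter, which has a great impact on the detection performance of SVDD.

Solving the above maximization problem gives the solution set $\{\alpha_i\}$; the center and minimum radius of the sphere can then be obtained by the following formulas:

$$a = \sum_{i=1}^{n} \alpha_i \phi(x_i),$$
$$R^2 = K(x_s, x_s) - 2 \sum_{i=1}^{n} \alpha_i K(x_i, x_s) + \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(x_i, x_j),$$

where $x_s$ is an arbitrary support vector.

For a test sample $Z$, the decision value is

$$D^2 = K(Z, Z) - 2 \sum_{i=1}^{n} \alpha_i K(x_i, Z) + \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(x_i, x_j).$$

When $D^2 \le R^2$, the sample is a normal sample; otherwise, it is an abnormal sample.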
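The SVDD decision rule can be tried out with a small self-contained sketch. Several simplifications are assumptions made here for brevity: the penalty constant is taken large enough that no sample is allowed outside the sphere (hard margin), the dual is solved approximately with a Frank-Wolfe iteration instead of a quadratic programming solver, the kernel uses the 2σ² normalisation, and the squared radius is taken as the largest training distance, which is the value a boundary support vector attains at the exact hard-margin optimum. Function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def distance2(model, Z):
    """Squared feature-space distance from each row of Z to the sphere center."""
    Kz = gaussian_kernel(Z, model["X"], model["sigma"])
    # K(z, z) = 1 for the Gaussian kernel.
    return 1.0 - 2.0 * (Kz @ model["alpha"]) + model["const"]

def fit_svdd(X, sigma=1.0, n_iter=2000):
    """Approximate hard-margin SVDD: minimise alpha^T K alpha on the simplex."""
    K = gaussian_kernel(X, X, sigma)
    alpha = np.full(len(X), 1.0 / len(X))
    for k in range(n_iter):
        i = np.argmin(K @ alpha)       # Frank-Wolfe: best simplex vertex
        step = 2.0 / (k + 2.0)
        alpha *= 1.0 - step
        alpha[i] += step
    model = {"X": X, "alpha": alpha, "sigma": sigma, "const": alpha @ K @ alpha}
    model["R2"] = distance2(model, X).max()   # radius encloses all training samples
    return model

def is_normal(model, Z):
    return distance2(model, Z) <= model["R2"] + 1e-12

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))                  # toy "normal" samples
model = fit_svdd(X, sigma=2.0)
print(is_normal(model, np.array([[0.0, 0.0], [100.0, 100.0]])))
```

In the method described in this paper the SVDD inputs are reconstruction errors rather than two-dimensional points; the toy data only illustrates that samples far from the training set fall outside the learned sphere.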

3. MCNN-AGRU Early Fault Detection Method

Most fault diagnosis methods based on deep learning learn and classify serious faults, and few address the early faults of bearings. The MCNN-AGRU method proposed in this paper targets this problem. MCNN extracts data features at different scales, which enlarges the data set, and filters out part of the noise in the process, so that more robust features are extracted. The GRU network with the attention mechanism addresses the information loss that occurs when a single GRU network receives an overly long input sequence, as well as the difficulty of accounting for the relationships within the data. Therefore, the MCNN-AGRU early fault detection method improves on previous methods in feature extraction and in the processing of temporal information. The experiments show that it detects the early fault of the bearing accurately and quickly.

3.1. MCNN-AGRU Fundamental

The structure of the MCNN-AGRU model proposed in this paper is shown in Figure 6. It is mainly divided into three parts: the multiscale input layer, the multiscale feature extraction layer, and the prediction layer. First, the original vibration data is transformed into data at four time scales by multiscale preprocessing, as shown in Figure 7. Then, the data at these four scales is input into the CNN to extract features, and the features extracted at the four scales are concatenated to obtain the comprehensive feature. Finally, the comprehensive feature is passed through the fully connected layer into the GRU network with the attention mechanism, and the output is obtained after weighting.

The algorithm flow chart of the model is shown in Figure 8; it includes offline modelling and online monitoring. The offline modelling phase uses historical normal data to train the MCNN and the GRU network with an attention mechanism. During training, the GRU acts as a sequence generator, and the network outputs a prediction sequence with the same dimensions as the input data. During online monitoring, real-time data is input to the MCNN-AGRU model, which outputs the forecast data for the next moment. The reconstruction error between the forecast data and the real data at the next moment is evaluated by support vector data description (SVDD) to determine the current operating status of the rolling bearing. If the result is normal, monitoring continues; otherwise, an alarm is raised.
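The online monitoring loop can be outlined as below. This is only a structural sketch: `predict_next` and `is_normal` are hypothetical stand-ins for the trained MCNN-AGRU model and the SVDD detector, the mean squared error is an assumed form of the reconstruction error, and the persistence predictor and fixed threshold in the demo exist purely to make the example runnable.

```python
import numpy as np

def monitor(cycles, predict_next, is_normal):
    """Return (index, error, alarm) for every cycle that has a forecast."""
    results = []
    for t in range(len(cycles) - 1):
        forecast = predict_next(cycles[t])                 # forecast for cycle t + 1
        error = float(np.mean((cycles[t + 1] - forecast) ** 2))  # assumed error form
        results.append((t + 1, error, not is_normal(error)))
    return results

# Toy data: four identical "healthy" cycles followed by one shifted cycle.
base = np.sin(2 * np.pi * np.arange(600) / 600)
cycles = np.stack([base] * 4 + [base + 1.0])

results = monitor(
    cycles,
    predict_next=lambda cycle: cycle,       # stand-in: repeat the last cycle
    is_normal=lambda err: err <= 0.5,       # stand-in for the SVDD decision
)
alarms = [alarm for _, _, alarm in results]
print(alarms)
```

Keeping the predictor and the detector as injected callables mirrors the split between offline modelling, which produces them, and online monitoring, which only applies them.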

Each layer in the model is described as follows:

(1) Multiscale input layer: The input layer processes the original data at multiple scales to obtain inputs at four different scales and feeds them into four different convolutional neural networks. The original data is $x = \{x_1, x_2, \ldots, x_N\}$.

(2) Multiscale feature extraction layer: In this layer, two pairs of convolutional and pooling layers are used for feature extraction on the data of each scale, and the extracted features are concatenated to form a comprehensive feature. The input of the first convolutional layer is a signal of length $L$, and a convolution kernel of length $m$ is moved over the data to extract features. Therefore, the output of the $i$-th node in the feature map is

$$c_i = f\left(w^{T} x_{i:i+m-1} + b\right),$$

where $w$ represents the weight matrix, $b$ is the bias, $x_{i:i+m-1}$ represents the subsignal of length $m$ starting from the $i$-th point of the input data $x$, and $f$ represents the activation function. ReLU activation is used here; it prevents the gradient from vanishing and speeds up convergence. Sliding the $j$-th convolution kernel from the beginning to the end of the signal, the $j$-th feature map can be written as

$$c^{j} = \left[c_1^{j}, c_2^{j}, \ldots, c_{L-m+1}^{j}\right].$$

After that, the pooling layer further extracts the features obtained by the convolutional layer. Max-pooling with a pooling length of $q$ is adopted to calculate the local maximum over the input feature map:

$$p_i^{j} = \max\left\{c_{(i-1)q+1}^{j}, \ldots, c_{iq}^{j}\right\}.$$

The features after the pooling layer are expressed as $p^{j} = [p_1^{j}, p_2^{j}, \ldots]$, and the $k$ features of one scale are then concatenated to get

$$F^{(s)} = \left[p^{1}, p^{2}, \ldots, p^{k}\right].$$

Finally, the features obtained from the four scales are concatenated to obtain the comprehensive feature:

$$F = \left[F^{(1)}, F^{(2)}, F^{(3)}, F^{(4)}\right].$$

(3) Prediction layer: The prediction layer is composed of the GRU layer, the attention mechanism layer, and the output layer. The GRU layer learns the feature vectors extracted by the multiscale feature extraction layer. A single-layer GRU structure is built so that the extracted features are fully learned and their internal patterns of change are captured. The output of the GRU layer is denoted as $H$, and the output at step $t$ is expressed as

$$h_t = \mathrm{GRU}\left(h_{t-1}, F_t\right).$$

The input of the attention mechanism layer is the output vector $H$ of the GRU network layer. The probability corresponding to each feature vector is calculated according to the weight distribution principle, and the weight parameter matrix is iteratively updated toward better values. The weight coefficients of the attention mechanism layer can be calculated as

$$e_t = u \tanh\left(w h_t + b\right),$$
$$a_t = \frac{\exp(e_t)}{\sum_{j=1}^{n} \exp(e_j)},$$
$$s_t = \sum_{j=1}^{n} a_j h_j,$$

where $a_t$ represents the attention probability distribution value determined by the output vector $h_t$ of the GRU network layer at time $t$; $u$ and $w$ are weight coefficients; $b$ is the bias coefficient; and the output of the attention layer at time $t$ is represented by $s_t$. Finally, the input of the output layer is the output of the attention mechanism layer. The output layer calculates the output $Y = [y_1, y_2, \ldots, y_m]$ with a prediction step length of $m$ through the fully connected layer. The prediction formula can be expressed as

$$y_t = \mathrm{Sigmoid}\left(w_o s_t + b_o\right).$$

Here, $y_t$ represents the predicted output value at time $t$; $w_o$ is the weight matrix; and $b_o$ is the deviation vector. The activation function is Sigmoid.
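For the multiscale feature extraction layer in item (2), it can help to track how long the feature vector of each scale branch becomes after two convolution and pooling pairs. The sketch below does only this bookkeeping. The input length of 600, the kernel length of 5, and the pooling length of 2 are hypothetical values chosen for illustration (the actual hyperparameters are those of Table 2), and unpadded convolution with stride 1 is assumed.

```python
def conv_pool_length(n, kernel, pool, stride=1):
    """Length after one unpadded convolution followed by non-overlapping pooling."""
    conv_len = (n - kernel) // stride + 1
    return conv_len // pool

def branch_feature_length(n_raw, scale, kernel, pool, n_pairs=2):
    """Per-kernel feature length of one scale branch (two conv-pool pairs)."""
    n = n_raw // scale                 # length after multiscale averaging
    for _ in range(n_pairs):
        n = conv_pool_length(n, kernel, pool)
    return n

# Hypothetical settings: 600-point input, kernel length 5, pooling length 2.
lengths = {s: branch_feature_length(600, s, kernel=5, pool=2) for s in (1, 2, 3, 4)}
total = sum(lengths.values())
print(lengths, total)   # {1: 147, 2: 72, 3: 47, 4: 34} 300
```

With several kernels per layer, the concatenated comprehensive feature is this total multiplied by the number of kernels in the last convolutional layer.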

The reconstruction error is calculated as follows:

Only normal data is used to train MCNN-AGRU, so that the model learns to predict the normal behaviour of the system along the time axis. When online data is input into the model, the model predicts the value of the data at the next moment, and the reconstruction error with respect to the actual value at the next moment is calculated. The reconstruction error differs with the state of the bearing. For example, when the system is normal, the reconstruction error is very small. When the system is abnormal, the reconstruction error increases noticeably. More importantly, the reconstruction errors of different types of early faults also differ. Therefore, the running state of the system can be judged from the reconstruction error. Vibration signals of the various fault types and normal vibration signals are input into MCNN-AGRU to obtain the reconstruction errors, and the abnormal and normal reconstruction errors are used to train SVDD to indicate the running state of the system.

4. Experimental Results and Analysis

This section verifies the accuracy and feasibility of the proposed MCNN-AGRU method through two sets of experiments: one on the self-built mechanical failure comprehensive simulation experiment platform and one on a full life cycle data set from the Center for Intelligent Maintenance Systems (IMS) of the University of Cincinnati [39, 40].

4.1. MCNN-AGRU Fault Classification Experiment

This part uses experiments to verify the classification accuracy of the model. The data set was acquired from the self-built mechanical failure comprehensive simulation experiment platform. The test stand consists of a motor, a rotor, a main shaft, a vibration sensor, and different kinds of rolling bearings (shown in Figure 9). The data set consists of four categories: normal state (N), inner ring failure (IRF), outer ring failure (ORF), and rolling element failure (REF). For every fault type, the fault degree is 0.2 mm, and the motor speed is 1800 RPM. Data was collected at 12,000 samples per second.

This data set is used to evaluate the fault diagnosis performance of the algorithm. It contains four operating states common to rolling bearings (N, IRF, ORF, REF). Each state has 120,000 points, of which 80,000 are used as training data, 20,000 as the validation set, and 20,000 as the test set. The test data of the four operating states is then input into the trained model, and the state of the system is judged by the reconstruction error. The detailed information is listed in Table 1. For the proposed method, all structural hyperparameters are shown in Table 2.
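The 80,000 / 20,000 / 20,000 split of each operating state can be written as a simple slicing helper. The chronological order of the three parts is an assumption, since the text gives only the sizes, and the function name and placeholder signal are illustrative.

```python
import numpy as np

def split_state(signal, n_train=80_000, n_val=20_000, n_test=20_000):
    """Split one operating-state recording into train, validation, and test parts."""
    signal = np.asarray(signal)
    if len(signal) < n_train + n_val + n_test:
        raise ValueError("recording is shorter than the requested split")
    train = signal[:n_train]
    val = signal[n_train:n_train + n_val]
    test = signal[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

recording = np.arange(120_000)          # placeholder for one state's 120,000 points
train, val, test = split_state(recording)
print(len(train), len(val), len(test))  # 80000 20000 20000
```

Splitting in time order rather than at random keeps test points from being interleaved with training points, which matters for a model that predicts the next value of a sequence.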

Figure 10 shows the test results of the model. The black dots indicate the normal state, the green triangles indicate rolling element failure, the blue squares indicate inner ring failure, and the pink crosses indicate outer ring failure. (1) The reconstruction error of the normal state stays below 2.5, and the reconstruction error of the abnormal states (rolling element failure, outer ring failure, and inner ring failure) lies between 2.5 and 20. There is a clear difference between the reconstruction error of the normal state and that of the abnormal states, which means that the model distinguishes the normal state from the abnormal states well and has good anomaly detection ability. (2) The reconstruction error of rolling element failure ranges from 2.5 to 5, that of inner ring failure fluctuates around 7.5, and that of outer ring failure ranges from 12 to 20. The model therefore distinguishes the three types of faults well, indicating good fault classification capability.

4.2. MCNN-AGRU Fault Prediction Experiment

This experiment verifies the fault prediction ability of the model. To verify how well the model extends to early state recognition of the rolling bearing, it is first necessary to analyse the operating characteristics of the rolling bearing throughout its life cycle. This paper uses the full life cycle bearing data from the Center for Intelligent Maintenance Systems (IMS) of the University of Cincinnati for analysis. As shown in Figure 11, the bearing test bench carries four bearings on a shaft, which is driven by an AC motor. The speed is maintained at 2000 r/min. A radial load of 6000 lbs is applied to the shaft and bearings through a spring mechanism to accelerate bearing aging. The oil circulation system measures the flow and temperature of the lubricating oil. In addition, an electromagnet installed in the oil return pipe collects debris from the oil as evidence of the performance degradation of the bearing system. When the debris accumulated on the electromagnet exceeds a certain level, the system stops running. A vibration acceleration sensor is installed on each bearing housing. The data sampling rate is 20 kHz, one sample is recorded every ten minutes, and each sample contains 20480 points.

This paper uses the data of experiment C in the IMS full life cycle experiment as the data set for this model. This experiment started on April 8th and ended on April 18th. After the accelerated aging test with applied load, an outer ring failure occurred on the 3# bearing. The data contains the vibration acceleration signals of the 3# bearing from normal operation to the occurrence of the outer ring failure, 1399 samples in total. The data sampling rate is 20 kHz, and each vibration signal snapshot contains 20480 points. The first 800 samples are the healthy running data of the 3# bearing: the first 500 are used as the training data of the model, and the remaining 300 as validation data. The last 599 samples, which contain the degradation process of the 3# bearing, are used to test the performance of the model. To ensure that the data input to the model has a physical meaning, the number of sampling points in one revolution is calculated from the sampling frequency and the motor speed, giving roughly 600 points per revolution. The data is therefore rearranged into cycles of this length and input into the model cycle by cycle for training and testing.
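The figure of roughly 600 points per revolution follows directly from the stated sampling rate and shaft speed, and the rearrangement into cycles is a reshape. In the sketch below, discarding the samples left over at the end of each snapshot is an assumption, since the text does not say how the remainder is treated; the function name is illustrative.

```python
import numpy as np

FS_HZ = 20_000          # sampling rate stated in the text
SPEED_RPM = 2_000       # shaft speed stated in the text
SNAPSHOT_LEN = 20_480   # points per snapshot stated in the text

# Points per revolution = samples per second / revolutions per second.
points_per_rev = FS_HZ * 60 // SPEED_RPM          # 600

def to_cycles(snapshot, cycle_len):
    """Rearrange one snapshot into rows of one revolution each."""
    snapshot = np.asarray(snapshot)
    n_cycles = len(snapshot) // cycle_len         # assumption: leftover points dropped
    return snapshot[: n_cycles * cycle_len].reshape(n_cycles, cycle_len)

cycles = to_cycles(np.zeros(SNAPSHOT_LEN), points_per_rev)
leftover = SNAPSHOT_LEN - cycles.size
print(points_per_rev, cycles.shape, leftover)     # 600 (34, 600) 80
```

Each 20480-point snapshot therefore yields 34 complete revolutions under this assumption, which is why cycle indices in the results run into the thousands even though the test set has only 599 snapshots.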

Figure 12 shows the results for the 3# bearing data in Experiment C. The model is trained with normal data so that it learns how the data of the rolling bearing changes in the healthy condition, and the reconstruction error between the actual value and the predicted value is used to measure the running state of the bearing. It can be seen from the partial enlargement in Figure 13 that the model first indicated an abnormal condition in the 8250th cycle. Then, the vibration signal returned to normal. This is consistent with the failure process of rolling bearings. When an early fault occurs in the outer ring while the rolling bearing is running, the weak defects in the outer ring are smoothed by the continuous movement of the rolling elements. The abnormality gradually diminishes, so there are short periods in which the data resembles normal conditions. The rolling bearing with an early fault continues to run, and these two states alternate, but the duration of each state becomes shorter and the amplitude of each abnormal signal gradually increases.

To verify the stability and advancement of the MCNN-AGRU method proposed in this paper, this method is compared with several other fault detection methods.

As shown in Figures 14 and 15, MCNN-AGRU clearly describes the development of the rolling bearing’s damage and is more sensitive to initial anomalies than the other methods. Kurtosis is not sensitive enough to abnormal changes in the signal and detects the fault about 6650 revolutions later than the method proposed in this paper. When MCNN is added to Kurtosis, its detection ability is enhanced, but its ability to predict the next running state of the bearing is reduced. RMS responds to the early fault to some degree, but the response is not obvious, and it cannot accurately predict the next running state of the bearing. When MCNN is added to RMS, both the detection ability and the ability to predict the next running state are reduced. Compared with RMS, the response of MCNN-AGRU is obviously larger in amplitude. This means that when both methods detect early faults, the response of MCNN-AGRU is more sensitive and obvious, while that of RMS is easily masked by noise. In conclusion, the data features extracted by MCNN-AGRU are more stable and more sensitive to early faults.
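For reference, the two baseline indicators used in the comparison can be computed per cycle as follows. The sketch assumes the standard definitions (root mean square of the samples, and the non-excess kurtosis, that is, the fourth central moment divided by the squared variance); the text does not state which kurtosis variant was used, and the toy signals are illustrative.

```python
import numpy as np

def rms(x):
    """Root mean square of one cycle."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))

def kurtosis(x):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    x = np.asarray(x, dtype=float)
    c = x - x.mean()
    return float(np.mean(c ** 4) / np.mean(c ** 2) ** 2)

t = np.arange(600)
smooth = np.sin(2 * np.pi * t / 60)        # toy healthy cycle: pure sine
impulsive = smooth.copy()
impulsive[::100] += 5.0                    # toy fault: a few sharp impacts

print(rms(smooth), kurtosis(smooth))
print(rms(impulsive), kurtosis(impulsive))
```

A pure sine has a kurtosis of 1.5, and adding a handful of impacts raises it sharply while changing the RMS far less, which is why kurtosis is commonly used as an impulsiveness indicator and RMS as an energy indicator.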

5. Conclusions

The MCNN-AGRU early fault diagnosis method for rolling bearings proposed in this paper integrates multiscale feature extraction of the signal with a GRU network that considers the timing characteristics of the data to achieve end-to-end bearing fault diagnosis.
(1) Compared with traditional diagnosis methods, the MCNN-AGRU method reduces the dependence on prior knowledge and experience, making bearing fault diagnosis more intelligent.
(2) MCNN extracts the comprehensive features of the data well and highlights the fault feature information. The GRU network with the attention mechanism then processes the sequence characteristics of the data and, unlike traditional shallow neural networks, retains the timing correlation of the input features, so that the diagnosis results are more accurate.
(3) Under different failure levels, comparative analysis with the Kurtosis, RMS, MCNN + Kurtosis, and MCNN + RMS methods shows that the MCNN-AGRU method is superior to the other methods in early fault detection and in predicting the next running state of the rolling bearing. This shows that the method in this paper has high accuracy and good robustness.
(4) Since the experimental data in this paper was collected in a laboratory environment, it differs to some extent from data in an actual production environment. In addition, the structure of the current network model is largely determined by experience, and different parameters have a considerable impact on the recognition performance of the network. In future work, we will continue to study strategies for setting the model structure for rolling bearing fault diagnosis.

Data Availability

This paper verifies the accuracy and feasibility of the proposed MCNN-AGRU method through two sets of experiments on the self-built mechanical failure comprehensive simulation experiment platform and a full life cycle data set from the intelligent maintenance system (IMS) of the University of Cincinnati [1-2]: [1] W. Gousseau, J. Antoni, F. Girardin, “Analysis of the Rolling Element Bearing data set of the Center for Intelligent Maintenance Systems of the University of Cincinnati.” CM2016 2016 and [2] H. Qiu, J. Lee, J. Lin, “Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics,” Journal of Sound and Vibration, vol. 289(4), pp. 1066-1090, 2006.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was funded by National Natural Science Foundation of China Grant no. 52005352, Key Laboratory of Vibration and Control of Aero-Propulsion System, Ministry of Education, Northeastern University (VCAME202007), “Seedling Cultivation” Project for Young Scientific and Technological Talents of Liaoning Education Department (lnqn201908), and National Natural Science Foundation of China Grant no. 51905357.