Abstract

Intelligent mechanical fault diagnosis has developed very fast in recent years due to the advancement and application of deep learning technologies. Thus, there are many deep learning network models that have been explored in fault classification and diagnosis. However, there are still limitations in research on the relationship between fault location, fault type, and fault severity. In this paper, a novel method for diagnosis of bearing fault using hierarchical multitask convolution neural networks (HMCNNs) is proposed, taking into account the mentioned relationships. The HMCNN model includes a main task and multiple subtasks. In the HMCNN model, a weighted probability is used to reduce the classification error propagation among multitasks to improve the fault diagnosis accuracy. The validity of the proposed method is verified on bearing datasets. Experimental results show that the proposed method is very effective and superior to the existing methods.

1. Introduction

Rolling bearings, as key parts of mechanical equipment, are widely used in rail transit equipment, construction machinery, precision machine tools, instrumentation, and other fields. According to statistics, about 40% of rotating machinery faults are caused by bearing faults. Once bearing faults occur, they seriously affect the normal operation of equipment and may even cause accidents and economic losses. Therefore, it is necessary to monitor bearings and diagnose faults before they lead to failure [1, 2]. At present, bearing fault diagnosis is usually based on data-driven methods. By collecting motor current signals or bearing vibration signals, fault diagnosis methods are applied to complete fault identification [3, 4].

Data-driven fault diagnosis generally includes two steps: fault feature extraction and fault classification. The common methods of feature extraction include Fast Fourier Transformation (FFT) [5], Wavelet Transform (WT) [6], Empirical Mode Decomposition (EMD) [7], Local Mean Decomposition (LMD) [8], and Variational Mode Decomposition (VMD) [9]. The common fault classification algorithms include support vector machine (SVM) [10], BP neural networks [11], Bayesian classifier [12], K-Nearest Neighbor (KNN) [13], Random Forest (RF) [14], and Classification and Regression Tree (CART) [15]. Seryasat et al. [16] presented a diagnosis method based on wavelet transform and FFT to extract energy and root mean square of different frequency bands, which could accurately and effectively identify bearing faults. Yan and Jia [17] proposed a multidomain feature classification algorithm based on optimized SVM, which included three stages: multidomain feature extraction, feature selection, and fault recognition. The algorithm has high diagnostic accuracy for rolling bearings under different working conditions. Zhang et al. [18] proposed a new method of rolling bearing fault diagnosis based on Variational Mode Decomposition and compared the performance of VMD and EMD in extracting bearing defect features from rolling bearing simulation signals. The VMD method can accurately extract the main mode of bearing fault signal and is superior to EMD in bearing defect feature extraction. Liu et al. [19] presented a fault diagnosis method for wind turbine bearings based on integral extension local mean decomposition (IELMD), which could effectively process nonstationary signals. Kankar et al. [20] extracted the statistical characteristics of wavelet coefficients and completed the classification of bearing faults combined with an artificial neural network. Jiang et al. [21] proposed a fault diagnosis method for rolling bearings based on high-order cumulant and BP neural network in view of the fact that the vibration signals of rolling bearings were susceptible to the influence of Gaussian noise.

The basic steps of these traditional fault diagnosis methods can be summarized as follows: acquiring fault signals, analyzing the characteristics of fault signals, extracting appropriate features, and selecting appropriate classifiers according to the specific diagnosis problems. This process requires high professional knowledge and experience of signal processing for fault diagnosis personnel. With the development of modern industry, fault monitoring equipment obtains a large amount of data, and the data types are diverse, which brings great challenges to traditional fault diagnosis methods.

Deep learning has been widely used in recent years. Its defining characteristic is that it can automatically complete the tasks of feature extraction and classification [22]. Deep learning has recently been introduced into the field of mechanical fault diagnosis to overcome the shortcomings of traditional methods. Zhang et al. [23–25] proposed a fault diagnosis method for rolling bearings based on deep convolution neural networks (CNN), which avoided manual feature extraction and realized automatic feature learning. Shao et al. [26] proposed an enhanced depth feature fusion method for fault diagnosis of rotating machinery. A new depth autoencoder method was constructed by combining Denoising Autoencoder (DAE) with Contractive Autoencoder (CAE) to improve the learning ability of features. Jia et al. [27] proposed deep normalized convolution neural networks (DNCNN), which could effectively deal with unbalanced classification problems. Liu et al. [28] proposed an unsupervised fault diagnosis method for rolling bearings based on generative adversarial networks. This method has higher generalization accuracy under noisy and varied workload situations. These deep learning methods have been successfully applied to bearing fault diagnosis from different perspectives and application scenarios. Compared with traditional methods, they have higher diagnostic accuracy. However, the problem of low generalization ability of deep neural network models remains unsolved.

The fault diagnosis of bearings includes identifying fault location, fault type, and fault degree. In the existing fault diagnosis methods, samples of all kinds are generally pooled as training samples to achieve fault diagnosis, and the relationship between these three aspects and its impact on the final fault diagnosis results are seldom considered. For the hierarchical classification of deep learning, Yan et al. [29] first proposed the hierarchical deep convolution neural network model to classify images. It first classifies the easily separated classes roughly and then classifies them at a fine level [28]. Based on this idea, Guo et al. [30] and Qu et al. [31] proposed a hierarchical intelligent fault diagnosis algorithm based on an adaptive deep convolution neural network model (ADCNN), which classified bearing fault location first and then classified fault severity. The design of this hierarchical classification model requires training multiple CNN recognition models, each with pretraining and fine-tuning. It can thus lead to a cumbersome training process, a need for more training samples, interlayer error propagation, and difficulty in extending the model to more levels.

Therefore, a hierarchical multitask bearing fault diagnosis method based on deep convolution neural networks is proposed in this paper. By adding multiple learning tasks to the convolution neural network, multitask learning of bearing fault diagnosis is realized, and the generalization performance of the proposed model is improved. The main contributions of this paper are as follows:

(1) Based on the CNN fault classification model, the HMCNN model is formed by adding several related classification tasks representing different dimension information for parallel auxiliary diagnosis. In the proposed model, final classification results are obtained by fusing the classification results of the main task and subtasks of different dimensions according to the weight obtained by training, which can reduce the interlayer error propagation of tasks.

(2) The proposed model can extract more valuable features from fewer training samples for fault classification and improve the classification accuracy of the model. In the proposed model, multiple tasks can share network parameters and information, and only one network structure needs to be trained, which reduces computational consumption and training complexity. The parallel structure of multiple tasks also has good scalability.

The remainder of the paper is structured as follows. The structure of the hierarchical multitask convolutional neural network (HMCNN) and the main techniques used in HMCNN are introduced in Section 2. In Section 3, experiments are carried out to show that HMCNN performs better than traditional intelligent methods and some typical deep learning models. After that, the structure of the HMCNN model is extended to demonstrate its scalability. Lastly, the diagnostic results of the HMCNN model are compared with those of the CNN model and analyzed visually to explore its mechanism. The conclusion of this paper is presented in Section 4.

2. Proposed Method

In this paper, a novel method called hierarchical multitask convolutional neural network (HMCNN) is proposed for the intelligent fault diagnosis of bearings. The proposed model includes four parts: CNN model, hierarchical classifiers, multitask learning, and hierarchical multitask convolutional neural network. The HMCNN model only needs to train a network model to realize multitask classification, in which the sharing layer can reduce the number of network parameters, thereby reducing the computational load. The multitask learning, hierarchical classification, and joint classification layer design in the HMCNN model can improve network generalization ability. More details are described in the following sections.

2.1. Introduction to CNN Model

Convolutional neural network (CNN) is a kind of feed-forward multistage neural network. It mainly contains three kinds of layers: convolutional layer, pooling layer, and fully connected layer. The convolution layer is designed to extract different features of input data. The pooling layer following the convolutional layer is to reduce the parameters of the network through extracting the local mean or maximum value of input data. A fully connected layer is usually built in the last part of the hidden layer of the convolutional neural network. Its main function is to connect all features and send the output value to the classifier. Convolutional neural network (CNN) is one of the common deep learning models. It is used to extract features and classify vibration signals in bearing fault diagnosis.
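The two feature-extraction operations described above can be illustrated on a one-dimensional signal. The following is a minimal NumPy sketch, not the authors' implementation; the function names `conv1d` and `max_pool1d`, the toy input, and the difference kernel are hypothetical and chosen only for illustration.

```python
import numpy as np

def conv1d(signal, kernel, stride=1):
    """Valid (unpadded) 1-D cross-correlation, the operation a convolutional layer applies."""
    k = len(kernel)
    n_out = (len(signal) - k) // stride + 1
    return np.array([np.dot(signal[i * stride:i * stride + k], kernel)
                     for i in range(n_out)])

def max_pool1d(signal, size, stride):
    """Keep the maximum of each window, shortening the feature sequence."""
    n_out = (len(signal) - size) // stride + 1
    return np.array([signal[i * stride:i * stride + size].max()
                     for i in range(n_out)])

x = np.arange(16, dtype=float)            # toy input: a linear ramp
kernel = np.array([1.0, 0.0, -1.0])       # hypothetical 3-tap difference kernel
features = conv1d(x, kernel, stride=2)    # 7 values, one per stride position
pooled = max_pool1d(features, size=2, stride=2)  # 3 values after pooling
```

In a trained network the kernel weights are learned rather than fixed, and many kernels run in parallel to produce multiple feature channels.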

2.2. Hierarchical Classification

The main idea of the hierarchical fault diagnosis based on a convolutional neural network proposed in this paper is shown in Figure 1. The main structure of hierarchical classification includes the sharing layer, coarse classification layer, fine classification layer, and joint classification layer, in which the sharing layer can reduce the number of network parameters and thus reduce the computation. The coarse classification layer is mainly used for coarse classification of bearing fault location, such as recognizing a bearing as healthy, as having an inner ring fault, or as having an outer ring fault. The fine classification is achieved through the fine classification layer. The joint classification layer, which receives the fine classification results as well as the coarse classification results, produces a weighted probability as the final classification result. In this weighted probability, $p_c(x_j)$ is the probability of the coarse classification made by the coarse classification layer, $p_F(x_j)$ is the fine classification made by the fine classification layer, and $N$ is the number of hierarchical tasks.
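To make the idea of a weighted joint probability concrete, the sketch below shows one plausible fusion rule. It is an assumption, not the formula used in the paper: each fine class is taken to have exactly one parent coarse class, the fine probability is multiplied by the probability of its parent, and the result is renormalized. The function `fuse`, the `parent` mapping, and all numbers are hypothetical.

```python
import numpy as np

def fuse(p_coarse, p_fine, parent):
    """Weight each fine-class probability by its parent coarse-class probability.

    Assumed fusion rule for illustration only; the paper's exact weighting
    is not reproduced here.
    """
    weighted = p_fine * p_coarse[parent]
    return weighted / weighted.sum()

# Hypothetical layout: coarse classes 0=healthy, 1=inner ring, 2=outer ring;
# five fine classes, each mapped to its coarse parent.
parent = np.array([0, 1, 1, 2, 2])
p_coarse = np.array([0.1, 0.8, 0.1])                 # coarse layer favors inner ring
p_fine = np.array([0.05, 0.30, 0.25, 0.35, 0.05])    # fine layer slightly favors class 3
p_final = fuse(p_coarse, p_fine, parent)
```

In this toy case the fine classifier alone would pick class 3 (an outer ring class), while the fused distribution picks class 1, because the coarse classifier is confident that the fault is on the inner ring. This is the kind of correction a joint classification layer is meant to provide.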

The coarse classification layer, fine classification layer, and joint classification layer are all based on the softmax classification function for the final classification tasks. In this way, the output of the network is transformed into a probability distribution. The softmax function is described as follows:

$$\mathrm{softmax}(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{n} e^{z_k}},$$

where $z_j$ is the logit of the $j$-th output and $n$ is the number of categories.
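The softmax function can be sketched in a few lines. The version below subtracts the largest logit before exponentiating, a standard numerical-stability step that leaves the result unchanged; the example logits are arbitrary.

```python
import numpy as np

def softmax(z):
    """Map a vector of logits to a probability distribution."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())   # shift by the max logit to avoid overflow
    return e / e.sum()

probs = softmax([2.0, 1.0, 0.1])       # largest logit gets the largest probability
stable = softmax([1000.0, 1000.0])     # would overflow without the shift
```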

2.3. Multitask Learning

In this paper, bearing fault diagnosis has multiple learning tasks, such as fault location, fault type, and fault severity. From the perspective of machine learning, multitask learning can be regarded as inductive transfer learning, which can improve the learning performance of the model by using multiple related tasks, including improving generalization accuracy, learning speed, and comprehensibility of the learning model. In this paper, the related learning tasks are fault type and fault location, which help the final classification of fault severity. In the training process of the model, the joint training method is adopted, which combines the loss functions of multiple tasks and optimizes them together. The loss function is described as

$$\mathrm{loss} = \sum_{i=1}^{N} k_i \, \mathrm{loss}_i, \qquad \mathrm{loss}_i = -\frac{1}{m} \sum_{j=1}^{m} y_j^{\top} \log \hat{y}_j,$$

where $k_i$ is the coefficient, $\mathrm{loss}_i$ is the loss function for each task, $N$ is the number of hierarchical tasks, $m$ is the size of the training minibatch, $\hat{y}_j$ is the predicted output value, and $y_j$ is the one-hot vector with the target distribution.
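The joint training objective can be sketched as a weighted sum of per-task losses. The example below assumes that each task uses cross-entropy between one-hot targets and softmax outputs; the coefficients, the two toy tasks, and the function names are hypothetical.

```python
import numpy as np

def cross_entropy(y_onehot, y_pred, eps=1e-12):
    """Mean cross-entropy over a minibatch (rows are samples)."""
    return -np.mean(np.sum(y_onehot * np.log(y_pred + eps), axis=1))

def joint_loss(targets, predictions, coefficients):
    """Weighted sum of per-task losses, optimized together during training."""
    return sum(k * cross_entropy(y, p)
               for k, y, p in zip(coefficients, targets, predictions))

# Minibatch of m = 2 samples and two hypothetical tasks.
y_loc = np.array([[1, 0], [0, 1]], dtype=float)            # location targets
p_loc = np.array([[0.9, 0.1], [0.2, 0.8]])                 # location predictions
y_sev = np.array([[1, 0, 0], [0, 0, 1]], dtype=float)      # severity targets
p_sev = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])       # severity predictions

total = joint_loss([y_loc, y_sev], [p_loc, p_sev], coefficients=[0.5, 1.0])
```

Because all tasks contribute to a single scalar, one backward pass updates the shared layers with gradients from every task.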

2.4. Hierarchical Multitask Convolutional Neural Network (HMCNN) Model

The HMCNN model is shown in Figure 1, which consists of the following four parts.

The first part is the sharing layer, which consists of two modules. Each module is composed of 2 convolution layers and 1 pooling layer. A 3 × 1 convolution kernel with a stride of 2 is used in all convolutional layers. For the pooling layers, 8 × 1 max-pooling with a stride of 8 is used.
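The effect of these kernel sizes and strides on the length of the feature sequence can be worked out with simple arithmetic. The sketch below assumes that each of the two modules contains two convolution layers followed by one pooling layer and that the input is a 2048-point sample; the padding mode is not stated in the text, so both common conventions ("same" and "valid") are computed. The helper names are hypothetical.

```python
import math

def out_len(n, kernel, stride, padding):
    """Output length of a 1-D convolution or pooling layer."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1   # "valid": no padding

def sharing_layer_len(n, padding):
    """Sequence length after two modules of (conv, conv, max-pool)."""
    for _ in range(2):                       # two modules
        for _ in range(2):                   # two convolution layers, kernel 3, stride 2
            n = out_len(n, 3, 2, padding)
        n = out_len(n, 8, 8, padding)        # max-pooling, size 8, stride 8
    return n

same_len = sharing_layer_len(2048, "same")    # 2048 -> 1024 -> 512 -> 64 -> 32 -> 16 -> 2
valid_len = sharing_layer_len(2048, "valid")  # 2048 -> 1023 -> 511 -> 63 -> 31 -> 15 -> 1
```

Under either assumption the time axis shrinks very quickly, so most of the information leaving the sharing layer is carried by the channel dimension rather than by sequence length.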

The second part is the coarse classification layer, which is connected to the shared layer. It consists of 2 full connection layers and 1 softmax layer, which are used to complete the classification of bearing fault location, fault type, and fault severity.

The third part is the fine classification layer, which is connected to the shared layer and consists of 3 convolution layers, 1 pooling layer, 2 full connection layers, and 1 softmax layer. It is used to complete the fine classification of bearing fault.

The fourth part is the joint classification layer, which receives fine classification results as well as coarse classification results and produces a weighted probability as the final classification results. The HMCNN model training parameters are shown in Table 1.

3. Experimental Verification

3.1. Data Description

Experimental data were collected from the bearing test rig of Paderborn University in Germany [32]. The experimental data were obtained by the test rig for condition monitoring of rolling bearings. The test rig consisted of several modules: an electric motor (1), a torque-measurement shaft (2), a rolling bearing test module (3), a flywheel (4), and a load motor (5), as shown in Figure 2.

The test bearings were ball bearings of type 6203. The bearings were run at a rotational speed of 900 rpm with a load torque of 0.7 Nm and a radial force on the bearing of 1000 N. The sampling frequency of the data acquisition system was 64 kHz. The bearing temperature was kept roughly at 45–50°C. Three kinds of bearing states are used in this experiment: inner ring damage, outer ring damage, and healthy. The details of the data are shown in Table 2, which lists the bearing fault location, damage method, and fault severity. For fault location, H is a bearing with no fault, IR is a bearing with an inner race fault, and OR is a bearing with an outer race fault. The damage methods are shown in Figure 3. The bearing damage used in this paper was caused by three different methods: electric discharge machining (trench of 0.25 mm length in rolling direction and depth of 1-2 mm), drilling (diameter: 0.9 mm, 2 mm, and 3 mm), and manual electric engraving (damage length from 1 to 4 mm).

As described in the dataset documentation, 20 original vibration time-series signals were acquired for each bearing, each of which contains about 256,000 data points. In this experiment, 2048 data points were used to construct a sample. For each health condition of bearings, 2000 samples were used in the training set and 500 samples were used in the test set. The vibration signals of each health state are shown in Figure 4. The experimental data are normalized by min-max normalization, and the normalization formula is as follows:

$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}},$$

where $x_i'$ is the normalized value, $x_i$ is the value of the $i$-th point of the sample data, $x_{\max}$ is the maximum value of the sample data, and $x_{\min}$ is the minimum value of the sample data.
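The sample construction and normalization steps can be sketched as follows. The sketch assumes non-overlapping windows, which the text does not state explicitly; it is, however, consistent with the sample counts, since 256,000 / 2048 = 125 windows per signal and 20 signals give 2500 samples, matching 2000 training plus 500 test samples. A random array stands in for a real vibration recording, and normalization is applied per sample.

```python
import numpy as np

def segment(signal, length=2048):
    """Cut a 1-D signal into non-overlapping windows (assumed windowing scheme)."""
    n = len(signal) // length
    return signal[: n * length].reshape(n, length)

def min_max_normalize(samples):
    """Scale each sample (row) to the range [0, 1]."""
    x_min = samples.min(axis=1, keepdims=True)
    x_max = samples.max(axis=1, keepdims=True)
    return (samples - x_min) / (x_max - x_min)

rng = np.random.default_rng(0)
raw = rng.normal(size=256_000)          # placeholder for one recorded vibration signal
samples = min_max_normalize(segment(raw))
```

A constant window would make the denominator zero; real vibration data rarely triggers this, but a production pipeline would guard against it.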

In this paper, the proposed model is implemented with the TensorFlow deep learning framework. The experiment was completed on a computer with an Intel Core i7-8700 CPU, 16 GB of memory, and an NVIDIA GTX 1070 GPU.

3.2. Diagnosis Results of HMCNN

The experiments are divided into three parts. The first part is a comparison among the proposed model, traditional method, and intelligent algorithm based on deep learning to demonstrate the superiority of the proposed model in terms of generalization performance. The second part is to extend and compare the network structure of the proposed model to verify the effectiveness of multitask learning. The third part is a comparison between the proposed model (HMCNN) and ADCNN model, studies the propagation of errors between model levels, and analyzes the reasons for the high recognition accuracy of the proposed model.

3.2.1. The Comparative Analysis of HMCNN Model with Other Models

In the first part, the proposed model (HMCNN) is compared with commonly used fault diagnosis models such as support vector machine (SVM), backpropagation neural networks (BPNN), convolutional neural networks (CNN), and long short-term memory (LSTM). The inputs of SVM and BPNN are multidimensional features extracted after ensemble empirical mode decomposition. The inputs of the CNN, LSTM, and HMCNN methods are normalized original signals. Experimental comparison results are shown in Figure 5. From Figure 5, it can be seen that the classification accuracy of traditional models such as SVM and BPNN is less than 80%, while the classification accuracy of deep learning models such as CNN and LSTM is about 90%. The bearing fault diagnosis accuracy of the HMCNN model reaches 99.7%. Compared with traditional bearing fault diagnosis models, HMCNN does not need manual feature extraction and has higher diagnosis accuracy; it also has better generalization ability than the current deep learning models.

The HMCNN model and CNN model are also compared and analyzed. The CNN and HMCNN models use the same optimization method and training parameters in this paper. The accuracy of the HMCNN and CNN models, recorded every 50 training steps, is shown in Figure 6. Figure 6 shows that the bearing fault diagnosis accuracy of the HMCNN model is 99.7% and that of the CNN model is 92.1%. Compared with the CNN model, HMCNN not only has higher bearing fault diagnosis accuracy but also needs fewer training steps to reach its highest diagnosis accuracy. The confusion matrices of the experimental results are compared in Figure 7. From Figure 7, compared with the CNN model, the HMCNN model mainly reduces the confusion between outer ring bearing faults, which improves the diagnosis accuracy of bearing faults.
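For readers unfamiliar with confusion matrices, the sketch below shows how one is built from true and predicted labels and how overall accuracy follows from it. The labels are made up for illustration and are not the experimental results.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns index the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical labels for three classes; classes 1 and 2 are sometimes confused.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2, 1]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()   # correct predictions lie on the diagonal
```

Off-diagonal entries show which pairs of classes are mixed up, which is how the confusion between outer ring faults is read from such a matrix.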

In the training processes, the learning speed of HMCNN model and CNN model is compared, as shown in Figure 8. The convergence rate (learning speed) of HMCNN model is about twice that of CNN model.

In order to further verify the generalization performance of the HMCNN model, the diagnosis accuracy of HMCNN and the other models under training sets of different sizes is also compared here. The comparison results are shown in Table 3 and Figure 9. As the number of training samples decreases, the recognition accuracy of all methods decreases to varying degrees. From Table 3, it can be seen that the HMCNN model has better recognition accuracy with smaller training sets, and the recognition accuracy can reach 96.7% in the case of only 500 training samples.

In addition, the performance differences between the HMCNN model and the SVM, BPNN, CNN, and LSTM models under noise are compared. Comparison results are shown in Table 4 and Figure 10. It can be seen that the diagnosis accuracy of HMCNN is 99.1% in a noisy environment (SNR = 10 dB), while the diagnosis accuracy of the other models is not more than 90%. At the same time, the diagnosis accuracy of the HMCNN model in a noisy environment with SNR = −2 dB is more than 90%. HMCNN therefore has good antinoise performance.
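A noisy test signal at a prescribed signal-to-noise ratio can be generated as sketched below. The text does not specify the noise model, so additive white Gaussian noise is assumed here; the sine wave stands in for a vibration sample, and the function name `add_noise` is hypothetical.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so that the result has the requested SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))   # SNR = 10*log10(Ps/Pn)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

rng = np.random.default_rng(0)
t = np.arange(2048) / 64_000                   # 2048 points at 64 kHz
clean = np.sin(2 * np.pi * 1000 * t)           # placeholder 1 kHz tone
noisy = add_noise(clean, snr_db=10, rng=rng)

# Empirical SNR of the generated sample; close to the target, not exactly equal.
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

At SNR = −2 dB the same function produces noise with more power than the signal itself, which illustrates how demanding that test condition is.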

3.2.2. The Comparative Analysis of HMCNN with Different Tasks Numbers

In the second part, we study the relationship between the number of hierarchical tasks in the HMCNN model and its diagnosis accuracy. According to the dataset, the HMCNN model is used to learn one, two, and three classification tasks (bearing fault location, fault type, and fault severity), and the resulting models are named HMCNN1, HMCNN2, and HMCNN3, respectively. The comparison results are shown in Figures 11 and 12. The results show that both the HMCNN3 and HMCNN2 models can achieve a high diagnosis accuracy. The accuracy of the HMCNN2 model is 0.8% higher than that of the HMCNN1 model. Compared with CNN, HMCNN contributes more to the accuracy improvement by adding the task of bearing fault location. In the HMCNN3 model, the diagnosis accuracy of fault type reaches 99.7%, which shows that jointly diagnosing bearing fault location, fault type, and fault severity is effective and feasible, and the final diagnosis accuracy is improved to a certain extent. The comparison between the HMCNN1, HMCNN2, and HMCNN3 models shows that the proposed model can be extended to diagnose multiple tasks. In practical application, the location, type, and severity of a bearing fault can all be output by the HMCNN model, which provides more detailed guidance for fault maintenance.

3.2.3. The Comparative Analysis of HMCNN Model with ADCNN Model

In the third part, the ADCNN model proposed by Guo et al. [30] is compared with the HMCNN model. The idea of the ADCNN model for bearing fault diagnosis is to identify the location of the bearing fault first and then, on this basis, identify the fault severity at each location. Accuracy results of the ADCNN and HMCNN models for diagnosing bearing fault locations and final fault severity are shown in Tables 5 and 6, respectively. We can see that, in the ADCNN model's hierarchical diagnosis, errors in bearing fault location diagnosis propagate to the result of bearing fault severity diagnosis, and the larger the number of tasks, the more serious the error propagation. The HMCNN model has a shared layer and a weighted joint classification layer, which mitigate the problem of error propagation and make the model more scalable.

3.3. Visualization Analysis

The principle of HMCNN model is further analyzed by t-SNE visualization technology. In this paper, the test set data are used as input, and the output data of the pooling layer in the HMCNN model are extracted as output. These output data are reduced to two-dimensional feature vectors by t-SNE, and then these outputs are plotted as scatter plots, representing their classes with different colors, as shown in Figure 13. The visualization results show that, with the increase of network layers of HMCNN model, the separation degree of features extracted from original signals becomes more and more obvious. At last, the output features of softmax classifier have seven distinct distributions (the final classification of bearing faults).

By comparing the outputs of the corresponding pooling layers of the HMCNN and CNN models, as shown in Figure 14, it can be seen that HMCNN has a better classification effect than the CNN model. Comparing the second pooling layers of the HMCNN and CNN models, the CNN model does not clearly separate the KI07 (Inner Race Fault 2), KA07 (Outer Race Fault 1), and KA08 (Outer Race Fault 2) bearings, whereas the HMCNN model already separates the KI07 bearings clearly. This shows that, in the second pooling layer of the HMCNN model, the fault location of the bearing is well classified. The visual output of the third pooling layers shows that the CNN model does not clearly separate the KA07 and KA08 bearings, while the HMCNN model does. This indicates that, in the third pooling layer of the HMCNN model, the bearing fault severity is clearly classified. In the HMCNN network, the second pooling layer is essentially the last layer of the shared layer. In the shared layer, the proposed model learns the features of bearing fault location, fault type, and fault severity.

The HMCNN model can share more fault information than the CNN model through multitask learning. In particular, the task of fault location focuses the network's attention on fault location information that might otherwise be neglected, which enhances the ability of the HMCNN model to classify bearing fault location, so that the KI07 bearing is clearly separated by the HMCNN model in the second pooling layer. After the shared layer, the HMCNN model only needs to distinguish the KA07 and KA08 bearings, whereas the CNN model still needs to distinguish the KI07, KA07, and KA08 bearings, which may explain why the final recognition accuracy of the CNN model is lower than that of the HMCNN model.

From the analysis of the learning process of HMCNN model, multitask learning in HMCNN model may improve the generalization accuracy of the model by using the information hidden in training signals of multiple tasks as an inductive bias. Multitask learning plays the same role as regularization and reduces the risk of model overfitting. At the same time, it reduces the ability to fit random noise and makes the model have better generalization performance.

4. Conclusions

A hierarchical multitask learning CNN model (HMCNN) that incorporates hierarchical classification is proposed. Only one model needs to be trained to achieve multitask classification. Compared with the experimental results of other models, the HMCNN model improves the accuracy of the final fault diagnosis, and the diagnosis accuracy reaches 99.7%. Compared with the CNN model, the HMCNN model has a faster learning speed. The diagnosis accuracy of HMCNN was also compared with that of other models for different numbers of training samples and in noisy environments, and the HMCNN model has better diagnosis accuracy than the other models in these settings. The HMCNN model can be extended to diagnose multiple tasks. The fault location, fault type, and fault severity of a bearing fault are all output, which can provide more detailed guidance for fault maintenance. Compared with the ADCNN model, the HMCNN model mitigates the problem of error propagation and makes the model scalable.

Through the visual analysis of the learning processes of the HMCNN and CNN models, the reason why HMCNN has higher generalization accuracy is further explored. The HMCNN model shares more fault information than the CNN model. In particular, the task of fault location focuses the network's attention on fault location information that might otherwise be neglected, which enhances the ability of the HMCNN model to classify bearing fault location.

Data Availability

The data that support the findings of this study are available at https://mb.uni-paderborn.de/en/kat/main-research/datacenter/bearing-datacenter/data-sets-and-download/?tdsourcetag=s_pcqq_aiomsg. At the same time, the data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by National Key R&D Program of China (2017YFB1201201).