Abstract

Deep belief networks (DBNs) can extract useful information from raw data without prior knowledge, so they are used here to extract features from roller bearing vibration signals. Unlike classification methods, clustering methods can separate different fault types without data labels. This paper therefore presents a method for roller bearing fault diagnosis without data labels that combines DBNs with the fuzzy C-means (FCM) clustering algorithm. First, features are extracted from the roller bearing vibration signals by a DBN, and principal component analysis (PCA) is used to reduce the dimension of the extracted features. Second, the first two principal components (PCs) are selected as the input of FCM for fault identification. Finally, the experimental results show that the method presented diagnoses faults better than other combination models, such as variational mode decomposition (VMD)-singular value decomposition (SVD)-FCM and ensemble empirical mode decomposition (EEMD)-fuzzy entropy (FE)-PCA-FCM.

1. Introduction

With the development of science and technology, mechanical and electrical equipment in aerospace, industry, and other fields has become increasingly complex, intelligent, and integrated, so that operating conditions and working environments are becoming more complex and changeable. Therefore, accurate and effective fault diagnosis in complex equipment systems is an effective way to improve the reliability and safety of the systems and to reduce maintenance costs [1]. Roller bearings are among the most common components in mechanical systems, and their operating condition directly affects the performance of the entire mechanical equipment [2–5]. Using vibration signals for roller bearing fault diagnosis has become one of the commonly used approaches in recent years. Analyzing roller bearing vibration signals and extracting their characteristics effectively are of practical significance because the vibration signals reflect the state of the roller bearings and the quality of the feature extraction determines the accuracy of the fault diagnosis.

Many traditional methods for vibration signal feature extraction have been presented, such as statistical analysis, wavelet transform (WT), and various mode decomposition models. In reference [6], different statistical indexes, such as mean value, kurtosis, and clearance factor, are calculated from the vibration signals and regarded as the eigenvectors for assessing the degradation of slurry pumps. Wang et al. proposed a method based on WT for gear fault diagnosis [7]. The vibration signals are decomposed into continuous statistical features on different scales by using WT. Because the dimension of the features is high, principal component analysis (PCA) is used to reduce the dimension of the eigenvectors. However, WT requires the wavelet function and the number of decomposition layers to be selected in advance. As the vibration signals are nonlinear and nonstationary, such a fixed choice is not self-adaptive. To overcome this drawback, empirical mode decomposition (EMD) [8] decomposes the vibration signals self-adaptively into a series of intrinsic mode functions (IMFs) and a residual. However, EMD has a mode-mixing problem. Ensemble empirical mode decomposition (EEMD) [9] solves the mode-mixing problem self-adaptively by introducing Gaussian white noise and decomposing a complicated signal into IMFs. Many scholars use combinations of EEMD and entropy to extract vibration signal features. Zhang and Zhou employed EEMD to decompose roller bearing vibration signals into IMFs, and then fuzzy entropy (FE) was used to calculate the IMF entropy values; the extracted features were selected as the input of a support vector machine (SVM) for roller bearing fault diagnosis [10]. In reference [11], a method based on EEMD, sample entropy (SE), and SVM for fault diagnosis is developed. Its main purpose is similar to that of reference [10]; the only difference is that SE replaces FE.

EEMD cannot correctly separate components of the vibration signals with closely located frequencies, whereas variational mode decomposition (VMD) [12] decomposes the signal into modes in a variational, nonrecursive way. Because it is in essence a group of adaptive Wiener filters, VMD can separate two pure harmonic signals with similar frequencies. In [13], the roller bearing vibration signals are decomposed into band-limited intrinsic mode functions (BLIMFs), and then singular value decomposition (SVD) is used to compute the singular values of each BLIMF.

However, for some complex systems, the traditional feature extraction methods, whether self-adaptive or not, are not sufficient to extract the sensitive features of all fault types because of the interaction between the external environment and the internal structure. Sometimes several feature extraction methods must be combined to achieve a satisfactory diagnosis.

The rest of this paper is organized as follows. In Section 2, the DBN, PCA, and FCM models are reviewed. Section 3 describes the experimental data sources, the evaluation of the clustering effect, and the fault diagnosis methodology. The validation of the experiments is given in Section 4. Finally, the conclusion is given in Section 5.

2. The Theoretical Framework of DBN, PCA, and FCM Models

2.1. Theoretical Framework of DBN

DBN was proposed by Hinton and Salakhutdinov [14]. It is widely applied in object and speech recognition and image classification. A DBN contains an input layer X, several hidden layers (stacked unsupervised restricted Boltzmann machines (RBMs)), and an output layer. The network structure of DBN is shown in Figure 1.

The RBM is a classic energy-based model, which includes a visible layer and a hidden layer. The structure of RBM is shown in Figure 2, where vectors $v$ and $h$ denote the visible and hidden layers, respectively, and $W$ denotes the connection weights between the visible and hidden layers. Every node in one layer is connected to every node in the other layer, and there are no connections between nodes within the same layer.

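The neuron values of the visible and hidden layers are binary variables, and the numbers of neurons in the visible and hidden layers are $I$ and $J$, respectively. $v_i$ and $h_j$ represent the states of the $i$th visible neuron and the $j$th hidden neuron. For a specific configuration $(v, h)$, the energy of the RBM as a system is

$$E(v, h \mid \theta) = -\sum_{i=1}^{I} a_i v_i - \sum_{j=1}^{J} b_j h_j - \sum_{i=1}^{I} \sum_{j=1}^{J} v_i W_{ij} h_j, \tag{1}$$

where $\theta = \{W_{ij}, a_i, b_j\}$ is the parameter set, $W_{ij}$ denotes the connection weight between visible neuron $i$ and hidden neuron $j$, and $a_i$ and $b_j$ denote the biases of visible neuron $i$ and hidden neuron $j$. The joint probability distribution based on the energy function is obtained by

$$P(v, h \mid \theta) = \frac{e^{-E(v, h \mid \theta)}}{Z(\theta)}, \qquad Z(\theta) = \sum_{v, h} e^{-E(v, h \mid \theta)}, \tag{2}$$

where $Z(\theta)$ is called the partition function, and the distribution function (the likelihood function) $P(v \mid \theta)$ is the marginal distribution of the joint probability $P(v, h \mid \theta)$. For a given visible layer, the neurons in the hidden layer are mutually independent. Therefore, the activation probability of the $j$th hidden neuron is

$$P(h_j = 1 \mid v, \theta) = \sigma\Big(b_j + \sum_{i=1}^{I} v_i W_{ij}\Big), \tag{3}$$

where $\sigma(x) = 1 / (1 + e^{-x})$ is the sigmoid activation function. For a given hidden layer, the activation probability of the $i$th neuron in the visible layer is

$$P(v_i = 1 \mid h, \theta) = \sigma\Big(a_i + \sum_{j=1}^{J} W_{ij} h_j\Big). \tag{4}$$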

The RBM is trained iteratively. The purpose of the training is to learn the value of the parameter $\theta$ that fits the training data. The parameter $\theta$ is obtained by maximizing the log-likelihood function over the training set (where $N$ is the number of samples):

$$\theta^{*} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \sum_{n=1}^{N} \ln P\big(v^{(n)} \mid \theta\big). \tag{5}$$

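The parameters $W_{ij}$, $a_i$, and $b_j$ are updated according to the following equations using the contrastive divergence (CD) algorithm:

$$\Delta W_{ij} = \epsilon \big( E_{\mathrm{data}}(v_i h_j) - E_{\mathrm{recon}}(v_i h_j) \big), \quad \Delta a_i = \epsilon \big( E_{\mathrm{data}}(v_i) - E_{\mathrm{recon}}(v_i) \big), \quad \Delta b_j = \epsilon \big( E_{\mathrm{data}}(h_j) - E_{\mathrm{recon}}(h_j) \big), \tag{6}$$

where $\epsilon$ is the learning rate in the pretraining phase, and $E_{\mathrm{data}}(\cdot)$ and $E_{\mathrm{recon}}(\cdot)$ denote the mathematical expectations over the training data and over the reconstruction, respectively.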
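The conditional activation probabilities and the CD parameter update described above can be sketched in a few lines of NumPy. This is a minimal illustration of one CD-1 step for a binary RBM, not the authors' implementation: the function names `sigmoid` and `cd1_step`, the toy layer sizes, the random initialization, and the use of probabilities instead of samples in the reconstruction are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr, rng):
    """One CD-1 update on a batch v0 of shape (batch, I); returns updated (W, a, b)."""
    # Positive phase: activation probability of each hidden neuron given the data
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample binary hidden states
    # Negative phase: reconstruct the visible layer, then recompute hidden probabilities
    pv1 = sigmoid(h0 @ W.T + a)
    ph1 = sigmoid(pv1 @ W + b)
    n = v0.shape[0]
    # Update = learning rate * (data expectation - reconstruction expectation)
    W = W + lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    a = a + lr * (v0 - pv1).mean(axis=0)
    b = b + lr * (ph0 - ph1).mean(axis=0)
    return W, a, b

# Toy example (hypothetical sizes): I = 6 visible neurons, J = 3 hidden neurons
rng = np.random.default_rng(0)
I, J = 6, 3
W = 0.01 * rng.standard_normal((I, J))
a = np.zeros(I)
b = np.zeros(J)
v = (rng.random((8, I)) < 0.5).astype(float)
W_new, a_new, b_new = cd1_step(v, W, a, b, lr=0.15, rng=rng)
```

In practice the step would be repeated over many mini-batches and epochs; a DBN is then built by training one RBM per hidden layer and feeding each layer's activations to the next.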

2.2. Theoretical Framework of PCA

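The essence of PCA is to retain the directions of the main components as the axes of a new data space to achieve dimension reduction. Assume that $X \in \mathbb{R}^{n \times m}$ is the $m$-dimensional data matrix with $n$ samples (with centered columns), $x_i$ is the $i$th sample, and $R$ is the covariance matrix. $R$ is defined by

$$R = \frac{1}{n-1} X^{T} X = P \Lambda P^{T}, \tag{7}$$

where $P = [p_1, p_2, \ldots, p_m]$, $p_i$ is the $i$th unit orthogonal eigenvector of $R$, and $\lambda_i$, the $i$th diagonal entry of $\Lambda$, is the corresponding eigenvalue. Therefore, the matrix $X$ is decomposed as follows:

$$X = \hat{X} + \tilde{X}, \qquad \hat{X} = X C, \qquad \tilde{X} = X - X C, \qquad C = P_k P_k^{T}, \tag{8}$$

where $P_k = [p_1, \ldots, p_k]$ denotes the first to $k$th PCs, $\tilde{X}$ lies in the residual space (RS), and hence, $\tilde{X}$ is the projection of $X$ in RS. $C$ is the projection matrix.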

The process of obtaining the projection matrix C by means of the covariance matrix R is called the modeling process. The number of PCs directly affects the merits of the model and the final fault detection and diagnostics. This paper uses the main component contribution rate method to select the number of PCs as follows:

$$\eta_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \times 100\%, \tag{9}$$

where $\eta_k$ is the percentage of the total variance explained by the first $k$ PCs.
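As an illustration of the contribution rate method, the following sketch computes the covariance eigenvalues, the cumulative contribution rate of the first k PCs, and the projected scores. It is a minimal example under stated assumptions: the function name `pca_contribution` and the synthetic data are hypothetical, and the covariance is normalized by n - 1, which the text does not specify.

```python
import numpy as np

def pca_contribution(X, k):
    """Return (scores on the first k PCs, cumulative contribution rate in percent)."""
    Xc = X - X.mean(axis=0)                   # center each feature
    R = Xc.T @ Xc / (X.shape[0] - 1)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    rate = 100.0 * eigvals[:k].sum() / eigvals.sum()
    return Xc @ eigvecs[:, :k], rate

# Synthetic data (assumption): 300 samples, 5 features with unequal variances
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5)) * np.array([5.0, 3.0, 0.5, 0.3, 0.1])
scores, rate = pca_contribution(X, k=2)
print(f"cumulative contribution of the first two PCs: {rate:.2f}%")
```

The two columns of `scores` play the role of PC1 and PC2, the two-dimensional input that is later passed to the clustering step.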

2.3. Theoretical Framework of FCM

FCM is one of the most common clustering algorithms. It is based on an objective function that minimizes the Euclidean distance between each sample and all clustering centers. The cluster centers and the classification matrix are updated iteratively until the termination criterion is met, and hence, the data samples with similar characteristics are clustered into one class.

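For a given dataset $X = \{x_1, x_2, \ldots, x_n\}$, the corresponding fuzzy classification matrix is $A = [u_{ij}]_{c \times n}$, where $c$ is the number of clusters. The clustering centers are $C = \{c_1, c_2, \ldots, c_c\}$, and FCM is described as follows:

$$\min J(A, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2}, \qquad \text{s.t. } \sum_{i=1}^{c} u_{ij} = 1, \quad u_{ij} \in [0, 1], \tag{10}$$

where $n$, $c$, and $m$ represent the number of samples, the number of clusters, and the weighted index, respectively. $u_{ij}$ is the degree of membership of sample $j$ in cluster $i$. The greater the value is, the more likely the sample belongs to that cluster. $d_{ij} = \lVert x_j - c_i \rVert$ is the Euclidean distance between sample $x_j$ and clustering center $c_i$.

FCM converts the constrained extreme value problem into an unconstrained one by introducing the Lagrange multipliers $\lambda_j$:

$$\bar{J}(A, C, \lambda) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} + \sum_{j=1}^{n} \lambda_j \Big( \sum_{i=1}^{c} u_{ij} - 1 \Big). \tag{11}$$

Equation (11) is the objective function, and the necessary conditions for it to reach its minimum value are as follows:

$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}}, \tag{12}$$

$$u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{d_{ij}}{d_{kj}} \right)^{2/(m-1)} \right]^{-1}. \tag{13}$$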

The purpose of FCM is to find the classification matrix and the clustering centers that minimize the objective function. The procedure of FCM is as follows:

Step 1. Initialize the number of clusters c, the weighted index m, the classification matrix $A^{(0)}$, and the iteration number l = 0.

Step 2. Calculate the cluster centers C according to Equation (12).

Step 3. Update the classification matrix A according to Equation (13).

Step 4. If $\lVert A^{(l+1)} - A^{(l)} \rVert < \varepsilon$, where $\varepsilon$ is the termination tolerance, stop the loop; otherwise set l = l + 1 and return to Step 2.
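The four steps above can be sketched as follows. This is a minimal NumPy illustration rather than the authors' code: the function name `fcm`, the default weighted index m = 2, the tolerance 1e-5, the iteration cap, and the random initialization of the classification matrix are assumptions chosen for the example.

```python
import numpy as np

def fcm(X, c, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Fuzzy C-means. X: (n, d). Returns (centers (c, d), memberships U (c, n))."""
    rng = np.random.default_rng(seed)
    # Step 1: random classification matrix whose columns sum to 1
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)
    for _ in range(max_iter):
        # Step 2: cluster centers as membership-weighted means of the samples
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Step 3: update memberships from the Euclidean distances to the centers
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                 # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)
        # Step 4: stop when the classification matrix no longer changes
        converged = np.linalg.norm(U_new - U) < tol
        U = U_new
        if converged:
            break
    return centers, U

# Two well-separated synthetic blobs (assumption) to exercise the routine
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(3.0, 0.1, (20, 2))])
centers, U = fcm(X, c=2)
labels = U.argmax(axis=0)                     # hard label = cluster with highest membership
```

Assigning each sample to the cluster with the largest membership, as in the last line, is one common way to turn the fuzzy partition into fault labels.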

3. Data Source and Clustering Effect Evaluation

3.1. Data Source

The experimental data came from the Case Western Reserve University Bearing Data Center [15]. Three fault types (inner race fault (IRF), outer race fault (ORF), and ball fault (BF)) with fault diameters of 0.18 mm (1 hp), 0.36 mm (2 hp), and 0.54 mm (3 hp) were employed in this paper. The sampling frequency was 12,000 Hz.

Table 1 shows the working conditions considered in this study. In Table 1, “NR” represents bearings with no faults. The fault diameters are 0.18 mm (1 hp), 0.36 mm (2 hp), and 0.54 mm (3 hp). A, B, and C represent the three datasets. Each dataset contains ten types of roller bearing signals under different conditions. Each type has 30 samples of 2048 points, and hence, each of the datasets A, B, and C has a total of 300 samples.

3.2. Clustering Effect Evaluation

Two indicators, the partition coefficient (PC) and the classification entropy (CE), are applied to evaluate the quality of the clustering results [16]. The partition coefficient is defined as

$$PC = \frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{2}, \tag{14}$$

where $u_{ij}$ denotes the membership value of point $j$ in cluster $i$. The disadvantage of PC is the lack of a direct connection to certain attributes of the data itself. The classification entropy measures only the fuzziness of the cluster partition:

$$CE = -\frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij} \ln u_{ij}. \tag{15}$$

The closer the PC value is to 1 and the closer the CE value is to 0, the better the clustering effect [16].
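The behavior of the two indicators at their extremes can be checked with a short sketch. The function names below are hypothetical, the membership matrix is taken as clusters by samples, and the natural logarithm is assumed for CE.

```python
import numpy as np

def partition_coefficient(U):
    """Mean of the squared memberships over samples; U has shape (clusters, samples)."""
    return float((U ** 2).sum() / U.shape[1])

def classification_entropy(U):
    """Mean membership entropy over samples (natural logarithm assumed)."""
    Uc = np.clip(U, 1e-12, 1.0)               # avoid log(0) for crisp memberships
    return float(-(Uc * np.log(Uc)).sum() / U.shape[1])

U_crisp = np.array([[1.0, 0.0], [0.0, 1.0]])  # every sample fully in one cluster
U_fuzzy = np.full((2, 2), 0.5)                # every sample split evenly

print(partition_coefficient(U_crisp), classification_entropy(U_crisp))
print(partition_coefficient(U_fuzzy), classification_entropy(U_fuzzy))
```

A crisp partition gives PC = 1 and CE close to 0, while an evenly split two-cluster partition gives PC = 0.5 and CE = ln 2, which matches the interpretation that PC near 1 and CE near 0 indicate good clustering.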

3.3. The Procedures of the Method Presented

The roller bearing vibration signal features were extracted by DBN, and then PCA was used to reduce the dimension of the eigenvectors. The first two PCs were regarded as the input of FCM for fault diagnosis. The procedures of the method presented are listed below:
(1) Because the frequency spectra of rotating machinery reflect how their important components are distributed over discrete frequencies, they can potentially provide useful information about the health and working conditions of the machine [17]. Therefore, fast Fourier transformation (FFT) is used to resolve the original vibration signal into a symmetrical coefficient matrix. Half of the symmetrical coefficient matrix is selected as the input vector for training DBN. Before DBN training, the input data is normalized to [0, 1].
(2) Several hidden layers are used to extract the features of the vibration signals.
(3) The dimensions of the features extracted in step 2 are reduced by using PCA. The first two PCs are regarded as the input of FCM for fault diagnosis.
(4) PC, CE, and classification accuracy are employed to evaluate the clustering performance of the different combination models, such as EEMD-FE-PCA-FCM, VMD-SVD-FCM, and DBN-PCA-FCM.

The detailed flow chart is shown in Figure 3.
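The FFT preprocessing in step (1) can be sketched as follows. This is an illustrative assumption of how the half spectrum might be formed: the function name `fft_half_spectrum`, the use of the magnitude spectrum, and per-sample min-max scaling to [0, 1] are not stated in the text, and the synthetic two-tone signal merely stands in for a 2048-point bearing sample.

```python
import numpy as np

def fft_half_spectrum(x):
    """Magnitude of the first half of the FFT of x, min-max scaled to [0, 1]."""
    spec = np.abs(np.fft.fft(x))[: len(x) // 2]   # the spectrum of a real signal is symmetric
    lo, hi = spec.min(), spec.max()
    return (spec - lo) / (hi - lo) if hi > lo else np.zeros_like(spec)

# Synthetic stand-in for one sample: 2048 points at 12 kHz with 58 Hz and 164 Hz tones
fs = 12000
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 58 * t) + 0.5 * np.sin(2 * np.pi * 164 * t)
features = fft_half_spectrum(x)                   # 1024 values, one DBN input vector
peak_hz = features.argmax() * fs / len(x)         # frequency of the strongest bin
```

Each 2048-point sample thus yields a 1024-dimensional input vector, which matches the number of input layer nodes used for the DBN.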

4. Feature Extraction and Fault Diagnosis

In this section, various vibration signals in Table 1 are first preprocessed by FFT, and then DBN is used to extract the useful feature information through several hidden layers. The time-domain figure of the various original vibration signals is shown in Figure 4.

The ten kinds of vibration signals are difficult to distinguish. There are no obvious vibration patterns in the NR and IRF signals. Unlike NR and IRF signals, the BF and ORF signals have obvious vibration patterns because the bearing and outer race components experience a certain impact when the roller bearings are working. Compared with NR and BF signals, IRF and ORF vibration signals have fixed vibration periods in some unique frequency bands, and the self-similarity is high. Especially when the inner ring is fixed, the outer ring rotates with the roller bearings; hence, the vibration regularity in ORF signals is more obvious. IRF and ORF vibration signals have strong periodic regularity, but it is difficult to distinguish these vibration signals under different conditions. To mine the signal features, the DBN, VMD, and EEMD models are used to decompose the vibration signals and PCA is used to reduce the dimension of the extracted features.

4.1. VMD Decomposition

To decompose vibration signals effectively, the number of modes m in VMD should first be predetermined. When the value of m is too small, the decomposed modes cannot fully reflect the time-frequency information of the original signal, and therefore, an effective VMD decomposition cannot be achieved. A larger m produces BLIMF components with similar frequencies, which may result in overdecomposition. Therefore, in order to select the appropriate m, we observe the center frequencies of the modes, following reference [13]. The center frequencies under different numbers of modes m are shown in Table 2.

Table 2 shows that, for the IRF2 signals in dataset A, the center frequencies range from 0.0507 kHz to 0.3469 kHz when m = 5; for example, they are 0.2279 kHz in BLIMF3, 0.2980 kHz in BLIMF4, and 0.3469 kHz in BLIMF5. The center frequencies of these three modes are very close to one another, which indicates that the vibration signals are not decomposed effectively. The same also happens when m = 3 (0.0535 kHz in BLIMF1, 0.2145 kHz in BLIMF2, and 0.2979 kHz in BLIMF3). In contrast, the decomposition results for m = 4 contain four frequency components that are well separated. Hence, the parameter m in VMD is selected as 4. The penalty factor is set at 2000, as is common [13]. The VMD decomposition results and the envelope spectrum of each BLIMF are shown in Figure 5.

As shown in Figure 5(a), the IRF2 vibration signals are decomposed into four BLIMF components. Both the amplitude range and the frequency increase gradually from one BLIMF to the next. To further verify the effect of VMD decomposition, Figure 5(b) shows the envelope spectrum of each BLIMF. It can be seen from Figure 5(b) that the decomposition results contain the IRF2 fault frequency (58 Hz) and the double fault frequency (164 Hz).

4.2. Feature Extraction Using DBN

First, the FFT is used to transform the time-domain vibration signal to the frequency domain; here, we take an IRF2 signal as an example. The result of the spectrum envelope analysis in the frequency domain is shown in Figure 6.

As shown in Figure 6(b), the frequency content of the IRF2 signal is concentrated mainly in 0–1000 Hz. Because the working frequency for the IRF2 signal is 58 Hz, the spectrum is concentrated mainly at 58 Hz and the double frequency (164 Hz). This indicates that the frequency-domain signal contains useful feature information. Therefore, we use the FFT to preprocess the different vibration signals in the first step. Then, we use the DBN to extract the features. The number of input layer nodes is set at 1024 because each sample contains 2048 points and only half of the coefficient matrix is used after the FFT. The numbers of neural nodes for the second to fourth layers are set at 512, 256, and 128, respectively. The learning rate is 0.15, the momentum value is 0.65, and the number of epochs is 1200. After the features of the vibration signals have been extracted by each layer in DBN, the dimensions of the feature vectors are reduced by using PCA. The results of the first two PCs for each hidden layer are shown in Figure 7.
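The layer sizes above determine the shape of the features produced by each hidden layer. The sketch below only illustrates this upward pass through a 1024-512-256-128 stack; the function name `dbn_features` is hypothetical, and the random weights are placeholders for the RBM weights that pretraining would provide, so the feature values themselves carry no meaning.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dbn_features(V, weights, biases):
    """Propagate inputs V (n, 1024) upward; return the activations of each hidden layer."""
    outputs, h = [], V
    for W, b in zip(weights, biases):
        h = sigmoid(h @ W + b)        # hidden activation probabilities of this layer
        outputs.append(h)
    return outputs

sizes = [1024, 512, 256, 128]         # layer sizes stated in the text
rng = np.random.default_rng(0)
# Placeholder parameters (assumption): trained RBM weights would be used instead
weights = [0.01 * rng.standard_normal((p, q)) for p, q in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(q) for q in sizes[1:]]

V = rng.random((4, 1024))             # four normalized half-spectrum inputs
layers = dbn_features(V, weights, biases)
print([h.shape for h in layers])
```

The activations of each hidden layer can then be passed to PCA separately, which is how a per-layer comparison of the first two PCs would be produced.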

In the first three layers, the data of the same fault type are scattered, and the data of different fault types may overlap in all the datasets. As the number of hidden layers increases, the scattered data points of each fault type become more concentrated and the data of different fault types become more separated from one another. As can be seen in Figure 7, in all datasets, the data points of the same shape are more concentrated, and there is a clearer separation between different fault types, in the final hidden layer than in the first hidden layer. For example, all NR signal data in dataset B, which are marked with triangles, are concentrated (overlapping with each other) in the final hidden layer, whereas they are scattered in the first hidden layer.

The results of PC1 and PC2 for the final hidden layer when PCA is used are shown in Table 3, where “total” denotes the sum of all eigenvalues λ in Equation (9) and the cumulative contribution rate is calculated from the first two PCs. The two largest eigenvalues (λ1 and λ2) in Equation (9) correspond to the first two PCs; the greater λ is, the more useful information the corresponding PC contains. The number of PCs is often selected as 2 when the cumulative contribution rate is more than 80% [16]. In Table 3, the cumulative contribution rate is more than 85% for all datasets, for example, 87.81% in dataset A. Moreover, the eigenvalues decrease as the PC index increases, and the first two PCs are selected as the input of FCM for roller bearing fault diagnosis (as space is limited, only the first six PCs are shown in Table 3).

4.3. Fault Diagnosis and a Comparison Analysis

Before roller bearing fault diagnosis, EEMD is also used to decompose the vibration signal into IMFs. Several parameters should be set before calculation: the Gaussian white noise amplitude mm and the number of white noise insertions nn in EEMD, and the embedded dimension m and similarity tolerance r in FE. If the standard deviation of the added noise is only a small fraction of the standard deviation of the input signal, the remaining noise results in less than 1% error. It is suggested that the amplitude of the added white noise mm be fixed at about 20% of the standard deviation of the input signal [18–24]. The parameter nn is set at 100.

In FE, a greater embedded dimension m allows a more detailed reconstruction of the dynamic process. However, too great an m value is unsuitable because it requires a very long data series, which is difficult to obtain in practice, and it brings about loss of information. m is often fixed at 2 [16]. The similarity tolerance r is often fixed at 0.1–0.25 × SD, where SD denotes the standard deviation of the original vibration signals [16].

For the FCM model, the number of clusters is set to c = 4, and a termination tolerance is set for the stopping criterion. FCM is used to identify the different roller bearing faults, and the two-dimensional clustering results for the different datasets are shown in Figure 8, where the symbol cc denotes each clustering center and PC1 and PC2 are the horizontal and vertical coordinates [25–30].

(1) In Table 4, the greatest PC value is 0.9959 with dataset C when DBN is used, and the smallest PC value is 0.6580 with dataset A in EEMD. The smallest CE value is 0.0117 in DBN, but the greatest CE value is 0.7735 in EEMD.
(2) As shown in Table 4, the results of PC in DBN are overall greater than those of VMD and EEMD, and the results of CE in DBN are smaller than those of VMD and EEMD. These results indicate that the clustering performance of the method presented is superior to that of VMD-SVD-FCM and EEMD-FE-PCA-FCM.

The BF3 and ORF3 samples in Figures 8(a)–8(e) using the EEMD-FE-PCA-FCM and VMD-SVD-FCM models are scattered randomly, but in Figures 8(f)–8(h), these scattered data points are more concentrated, and the data of different fault types are more separated. This demonstrates that the DBN has a good feature extraction ability.

To verify the clustering effect, the three indicators PC, CE, and classification accuracy are used to compare the method presented with the EEMD-FE-PCA-FCM and VMD-SVD-FCM models. The results of α (PC) and β (CE) are shown in Table 4. The values of PC and CE are calculated from the membership values by Equations (14) and (15). The closer the PC value is to 1 and the closer the CE value is to 0, the better the effect of FCM.

To demonstrate that DBN can extract the signal features effectively, classification accuracy is used to compare the DBN-PCA-FCM, VMD-SVD-FCM, and EEMD-FE-PCA-FCM models. The corresponding clustering accuracy is shown in Table 5:
(1) The greatest classification accuracy is up to 100% with DBN, and the lowest classification accuracy is 76.037%.
(2) The overall classification accuracy of DBN is about 10%–20% greater than that of the VMD and EEMD models.
(3) For different vibration signals, particularly in dataset C, the accuracy is up to 100% in DBN, but it is considerably lower for some signals in VMD, for example, 23.3% and 60%.

The method presented can extract features from the vibration signals and diagnose faults effectively, and its clustering is superior to that of the EEMD-FE-PCA-FCM and VMD-SVD-FCM models.

5. Conclusions

A method based on DBN and FCM for roller bearing fault diagnosis is presented in this paper. Unlike many traditional feature extraction methods, the features of the different roller bearing vibration signals are extracted by using DBN. To visualize the data, PCA is used to reduce the dimension of the eigenvectors. Then, the first two PCs are selected as the input of FCM for fault diagnosis, and the experimental results show that the feature extraction is better than that of the other combination models, namely VMD-SVD and EEMD-FE-PCA. The classification accuracy shows that the FCM clustering model can identify the roller bearing faults well under various conditions without data labels.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project no. T32-101/15-R).