Abstract
In many real-world fault diagnosis applications, frequent changes in working conditions cause the distribution of the labeled training data (source domain) to differ from that of the unlabeled test data (target domain), which leads to performance degradation. To solve this problem, this study proposes an end-to-end unsupervised domain adaptation bearing fault diagnosis model that combines Riemannian metric correlation alignment with a one-dimensional convolutional neural network (RMCA-1DCNN). The second-order statistic alignment of a specific activation layer in the source and target domains is treated as a regularization term and embedded in the deep convolutional neural network architecture to compensate for domain shift. Experimental results on the Case Western Reserve University motor bearing database demonstrate that the proposed method has strong fault-discriminative and domain-invariant capacity and achieves higher diagnosis accuracy than the other methods compared in the experiments.
1. Introduction
Rolling bearings are key components in heavy-duty machinery and manufacturing systems and are widely used in modern industries. However, unexpected bearing faults during long-term operation lead to large maintenance costs and safety losses [1]. In the past decades, machine learning and statistical inference techniques have been intensively studied and have become increasingly popular because they can process collected signals rapidly and efficiently and provide reliable fault diagnosis results without prior expertise [2–5]. Recently, with the development of deep learning, the performance of fault diagnosis has been remarkably improved, and excellent results have been reported for various fault diagnosis applications [6–13].
Data-driven techniques for fault diagnosis generally assume that training and testing data are derived from the same distribution. However, in real-world applications, the distributions of training and testing data are often different from each other due to changes in the environment, working conditions, and bearing quality. Consequently, fault diagnosis systems suffer from large performance degradation.
Rebuilding a diagnosis model for each new working condition requires collecting and labeling new data. To address this challenge and avoid such reconstruction efforts, a domain adaptation technique is necessary, that is, a technique that adapts a learning model built in a source domain to a different but related target domain. Studies in many areas, including image classification, natural language processing, object recognition, and feature learning, have reported that domain adaptation is beneficial and promising [14–16].
Domain adaptation has recently been introduced into the field of fault diagnosis, in which the fault diagnosis model parameters or input features are adjusted to compensate for the mismatch.
Zhang et al. [17] took raw vibration signals as inputs of a deep convolutional neural network with wide first-layer kernels (WDCNN) and used adaptive batch normalization (AdaBN) as the domain adaptation algorithm to realize fault diagnosis under different load conditions and noisy environments. Lu et al. [18] introduced a deep CNN model with domain adaptation for fault diagnosis; this model integrated the maximum mean discrepancy as a regularization term into the loss function to reduce the cross-domain distribution difference. Zhang et al. [19] developed an adversarial domain adaptation model for fault diagnosis, which comprises a source feature extractor, a target feature extractor, a domain discriminator, and a label classifier. Jian et al. [20] proposed a fusion CNN model that combines one-dimensional CNN (1DCNN) and Dempster–Shafer evidence theory to enhance cross-domain adaptive capability for fault diagnosis. Li et al. [21] presented a deep domain adaptation method for bearing fault diagnosis that minimizes multikernel maximum mean discrepancies between domains in multiple layers, so that representations learned from the source domain can be applied to the target domain.
The main contributions of this study are as follows:
(1) We propose an end-to-end approach that directly takes raw temporal signals as inputs and requires neither time-consuming denoising preprocessing nor a separate feature extraction algorithm
(2) We combine RMCA with 1DCNN for bearing fault diagnosis in a single domain adaptation pipeline, RMCA-1DCNN, which can learn fault-discriminative and domain-invariant features between domains
(3) Extensive experiments on Case Western Reserve University (CWRU) bearing datasets demonstrate that RMCA-1DCNN achieves superior performance to that of existing baseline methods
The rest of the paper is outlined as follows. In Section 2, we discuss some necessary theoretical background on unsupervised domain adaptation and CNN. In Section 3, we present our RMCA-1DCNN unsupervised domain adaptation fault diagnosis model. We report a broad experimental validation in Section 4. Finally, we provide conclusions in Section 5.
2. Theoretical Background
2.1. Unsupervised Domain Adaptation
Domain adaptation is a branch of machine learning and, more specifically, of transfer learning. When the data distributions of the source (training data) and target (testing data) domains are different but the two tasks are the same, this special case of transfer learning is called domain adaptation. It can be divided into two classes: supervised and unsupervised adaptation. If the source data have labels and the target data have no labels, then it is called unsupervised domain adaptation. Its formal definition is as follows:
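Definition 1. (domain). A domain $\mathcal{D}$ comprises two components: a feature space $\mathcal{X}$ and a marginal probability distribution $P(X)$, where $X$ is a dataset, that is, $X = \{x_1, x_2, \ldots, x_n\} \in \mathcal{X}$.
Definition 2. (task). A task $\mathcal{T}$ is the learning goal. A task comprises two components: a label space $\mathcal{Y}$ and a predictive function $f(\cdot)$ corresponding to the labels, i.e., $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$. $f(\cdot)$ can also be viewed as the conditional probability distribution, that is, $f(x) = P(y \mid x)$.
Definition 3. (unsupervised domain adaptation). Given a labeled source domain dataset $\mathcal{D}_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ and an unlabeled target dataset $\mathcal{D}_t = \{x_j^t\}_{j=1}^{n_t}$, $n_s$ and $n_t$ are the numbers of samples of the source and target domains, respectively. The feature spaces of $\mathcal{D}_s$ and $\mathcal{D}_t$ (that is, $\mathcal{X}_s = \mathcal{X}_t$), the label spaces (that is, $\mathcal{Y}_s = \mathcal{Y}_t$), and the conditional probability distributions (that is, $P(y_s \mid x_s) = P(y_t \mid x_t)$) are assumed to be the same. However, the marginal probability distributions of the two domains are different, that is, $P(x_s) \neq P(x_t)$. Domain adaptation aims to use the labeled $\mathcal{D}_s$ to learn a classifier $f: x_t \mapsto y_t$ for predicting the labels of $\mathcal{D}_t$, where $y_t \in \mathcal{Y}_t$.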
Learning strategies of domain adaptation can be roughly divided into two categories, namely, instance transfer and feature matching, to reduce the distribution divergence between domains. The former reweights the source domain data according to the shared information contained in the target data and then further analyzes the reweighted source data [22, 23]. Meanwhile, the latter either performs subspace learning by utilizing the subspace geometrical structure [24–27] or distribution alignment to reduce the marginal or conditional distribution divergence between domains [28–31]. With feature matching, some approaches based on deep neural and adversarial networks have demonstrated superior performance on domain adaptation benchmark datasets [32–36].
2.2. Deep Correlation Alignment
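Let $A_S \in \mathbb{R}^{n_s \times d}$ and $A_T \in \mathbb{R}^{n_t \times d}$ denote the $d$-dimensional representations of the source and target samples, that is, the activations computed at a given layer of a deep neural network. $C_S$ and $C_T$ are the covariance matrices of the source and target features, respectively. According to [27, 37], $C_S$ and $C_T$ are defined as follows:
$$C_S = \frac{1}{n_s - 1} A_S^{\top} P A_S, \qquad C_T = \frac{1}{n_t - 1} A_T^{\top} P A_T, \tag{1}$$
where $P$ is the centering matrix [38]. Taking the source domain as an example, $P$ is a matrix of size $n_s \times n_s$, and its value is as follows:
$$P = I_{n_s} - \frac{1}{n_s} \mathbf{1}_{n_s} \mathbf{1}_{n_s}^{\top}, \tag{2}$$
where $I_{n_s}$ is the identity matrix and $\mathbf{1}_{n_s}$ is the all-ones column vector of length $n_s$.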
To minimize the distribution discrepancy between the second-order statistics (covariances) of the source and target features, we define the CORAL loss [27, 36] as
$$\ell_{\mathrm{CORAL}} = \frac{1}{4d^{2}} \left\| C_S - C_T \right\|_F^{2}, \tag{3}$$
where $\|\cdot\|_F^{2}$ represents the squared matrix Frobenius norm.
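The covariance and CORAL loss definitions above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: it assumes that each row of a feature batch is one sample, that the covariance uses the unbiased 1/(n − 1) normalization, and that the loss is scaled by 1/(4d²); the arrays `src` and `tgt` are synthetic placeholders, not bearing data.

```python
import numpy as np


def covariance(features: np.ndarray) -> np.ndarray:
    """Covariance of an (n, d) feature batch, computed with the centering matrix P."""
    n = features.shape[0]
    # P = I - (1/n) * 1 1^T subtracts the batch mean from every feature column.
    centering = np.eye(n) - np.ones((n, n)) / n
    return features.T @ centering @ features / (n - 1)


def coral_loss(source: np.ndarray, target: np.ndarray) -> float:
    """Squared Frobenius distance between the two covariances, scaled by 1/(4 d^2)."""
    d = source.shape[1]
    diff = covariance(source) - covariance(target)
    return float(np.sum(diff ** 2) / (4 * d * d))


rng = np.random.default_rng(0)
src = rng.normal(size=(64, 8))        # synthetic "source" activations (assumption)
tgt = 2.0 * rng.normal(size=(64, 8))  # synthetic "target" activations with larger variance
print(coral_loss(src, src))  # identical batches: no covariance mismatch
print(coral_loss(src, tgt))  # mismatched covariances: positive loss
```

The explicit centering matrix is used here only to mirror the formula; for large batches, subtracting the column means directly is cheaper than forming an n × n matrix.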
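The covariance matrix is a symmetric positive definite matrix and therefore lies on a Riemannian manifold, on which the Euclidean distance is suboptimal. The log-Euclidean distance is an approximate Riemannian metric that can effectively capture the manifold structure. Using the log-Euclidean distance, the RMCA loss [39] is defined as
$$\ell_{\mathrm{RMCA}} = \frac{1}{4d^{2}} \left\| U \,\mathrm{diag}(\log \sigma_1, \ldots, \log \sigma_d)\, U^{\top} - V \,\mathrm{diag}(\log \mu_1, \ldots, \log \mu_d)\, V^{\top} \right\|_F^{2}, \tag{4}$$
where $U$ and $V$ are the matrices that diagonalize $C_S$ and $C_T$, respectively, and $\sigma_i$ and $\mu_i$, $i = 1, \ldots, d$, are the corresponding eigenvalues. The normalization term $1/d^{2}$ makes the loss independent of the size of the feature layer.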
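A log-Euclidean alignment loss of this kind can be sketched with an eigendecomposition of each covariance matrix. The sketch below is illustrative only and makes several assumptions that are not taken from the paper: samples are stored as rows, the covariance is the unbiased estimate returned by `np.cov`, the scale factor is 1/(4d²), and eigenvalues are clamped at a small `eps` so that the logarithm stays finite when a batch is rank deficient.

```python
import numpy as np


def spd_log(matrix: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Matrix logarithm of a symmetric matrix: U diag(log(eigenvalues)) U^T."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    # Clamp eigenvalues (an assumption of this sketch) to avoid log(0).
    eigvals = np.maximum(eigvals, eps)
    # Scaling column j of U by log(eigenvalue j) is the same as U @ diag(log eigenvalues).
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_loss(source: np.ndarray, target: np.ndarray) -> float:
    """Squared Frobenius distance between matrix logarithms of the covariances."""
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False)
    cov_t = np.cov(target, rowvar=False)
    diff = spd_log(cov_s) - spd_log(cov_t)
    return float(np.sum(diff ** 2) / (4 * d * d))


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))           # synthetic activations (assumption)
print(log_euclidean_loss(x, x))        # same batch on both sides
print(log_euclidean_loss(x, 2.0 * x))  # target covariance is 4x the source covariance
```

Scaling the features by 2 multiplies the covariance by 4, so the two matrix logarithms differ by log(4) times the identity and the loss equals (log 4)² / (4d). This gives a convenient sanity check for any implementation.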
2.3. Convolutional Neural Network (CNN)
CNNs are characterized by local receptive fields, shared weights, and spatial subsampling. A standard CNN comprises input, convolution, pooling (subsampling), fully connected, and output layers. We focus on 1DCNN because vibration signals are one-dimensional. A one-dimensional signal can be handled in the same way as two-dimensional data by regarding it as an image with a height of 1, which makes the representation simple and intuitive. In 1DCNN, the forward propagation from a previous convolution layer to the input of a neuron in the current layer can be expressed as follows:
$$x_j^{l} = b_j^{l} + \sum_{i} \mathrm{conv1D}\left(w_{ij}^{l-1}, s_i^{l-1}\right), \tag{5}$$
where $b_j^{l}$ is the bias of the $j$th neuron at layer $l$; $s_i^{l-1}$ is the output of the $i$th neuron at layer $l-1$; and $w_{ij}^{l-1}$ represents the weight of the kernel, which connects the $i$th neuron at layer $l-1$ to the $j$th neuron at layer $l$. The pooling layer usually follows the convolution layer and samples the features based on the following sampling rule:
$$s_j^{l} = \mathrm{down}\left(x_j^{l}\right), \tag{6}$$
where $\mathrm{down}(\cdot)$ denotes the max or average pooling function.
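The convolution and pooling steps described above can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions, not the network used in the paper: it uses "valid" cross-correlation (which deep learning libraries call convolution), stride 1, a ReLU activation before pooling, and non-overlapping max pooling; the function names and the toy signal are hypothetical.

```python
import numpy as np


def conv1d_layer(inputs: np.ndarray, kernels: np.ndarray, biases: np.ndarray) -> np.ndarray:
    """inputs: (in_channels, length); kernels: (out_channels, in_channels, width)."""
    out_channels, in_channels, width = kernels.shape
    out_len = inputs.shape[1] - width + 1
    outputs = np.zeros((out_channels, out_len))
    for j in range(out_channels):
        for i in range(in_channels):
            # Sum the response of every input channel i into output channel j.
            outputs[j] += np.correlate(inputs[i], kernels[j, i], mode="valid")
        outputs[j] += biases[j]
    return outputs


def max_pool1d(features: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling; trailing points that do not fill a window are dropped."""
    usable = features.shape[1] // size * size
    return features[:, :usable].reshape(features.shape[0], -1, size).max(axis=2)


signal = np.arange(8.0).reshape(1, 8)   # toy one-channel ramp signal
kernels = np.array([[[-1.0, 1.0]]])     # one kernel that computes first differences
biases = np.zeros(1)

activations = np.maximum(conv1d_layer(signal, kernels, biases), 0.0)  # ReLU
pooled = max_pool1d(activations, size=2)
print(activations.shape, pooled.shape)
```

The explicit loops keep the correspondence with the summation over input channels visible; a practical model would rely on a framework layer instead.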
After passing through multiple convolutional and pooling layers, the CNN can classify the extracted features through the fully connected and softmax layers and obtain the labels of the samples.
3. Fault Diagnosis Framework Based on RMCA-1DCNN
3.1. Riemannian Metric Correlation Alignment Loss
In unsupervised domain adaptation fault diagnosis, only the source domain data have labels, and the cross-entropy classification loss is defined as
$$\ell_{\mathrm{class}} = -\sum_{i=1}^{n_s} y_i^{\top} \log \hat{y}_i, \tag{7}$$
where, for each source sample $x_i$, $y_i$ is the one-hot ground truth label and $\hat{y}_i$ is the network prediction.
Considering bearing fault diagnosis as a multiclassification problem, the final deep features must be both sufficiently discriminative to train strong classifiers and invariant between domains. Minimizing the classification loss alone might lead to overfitting to the source domain and reduce the performance on the target domain. Meanwhile, minimizing the RMCA loss alone is likely to produce degenerate features.
Therefore, we jointly train with the cross-entropy loss in the source domain and the second-order statistical alignment of the given layer in the source and target domains, and define the final loss function as follows:
$$\ell = \ell_{\mathrm{class}} + \alpha\, \ell_{\mathrm{RMCA}}, \tag{8}$$
where $\ell_{\mathrm{class}}$ denotes the classification loss on the labeled source domain data, $\ell_{\mathrm{RMCA}}$ denotes the metric of the second-order statistics between the source and the target, and the hyperparameter $\alpha$ determines the strength of domain confusion.
Considering the two kinds of losses together, the network not only learns features that classify well but also reflects the statistical structure of both the source and the target, which prevents overfitting to the source. During model training, objective function (8) is minimized by gradient descent on the network parameters. The final learned features are expected to work well on the target domain.
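The joint objective can be sketched as a plain function that adds a weighted alignment term to the source classification loss. The code below is a hedged illustration, not the training code of the paper: it evaluates the loss on fixed arrays instead of running gradient descent, averages the cross-entropy over the batch (a sum differs only by a constant factor), uses a log-Euclidean covariance distance as the alignment term, and all arrays and the value of `alpha` are synthetic placeholders.

```python
import numpy as np


def cross_entropy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-probability assigned to the true (integer) labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def log_cov(features: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Matrix logarithm of the feature covariance (eigenvalues clamped at eps)."""
    vals, vecs = np.linalg.eigh(np.cov(features, rowvar=False))
    return (vecs * np.log(np.maximum(vals, eps))) @ vecs.T


def alignment_loss(src_feat: np.ndarray, tgt_feat: np.ndarray) -> float:
    d = src_feat.shape[1]
    diff = log_cov(src_feat) - log_cov(tgt_feat)
    return float(np.sum(diff ** 2) / (4 * d * d))


def total_loss(src_probs, src_labels, src_feat, tgt_feat, alpha: float) -> float:
    """Classification loss on labeled source data plus alpha times the alignment term."""
    return cross_entropy(src_probs, src_labels) + alpha * alignment_loss(src_feat, tgt_feat)


rng = np.random.default_rng(1)
src_feat = rng.normal(size=(40, 5))        # synthetic source features
tgt_feat = 3.0 * rng.normal(size=(40, 5))  # synthetic target features, shifted statistics
src_probs = np.full((40, 10), 0.1)         # an untrained 10-class classifier: uniform outputs
src_labels = rng.integers(0, 10, size=40)

print(total_loss(src_probs, src_labels, src_feat, tgt_feat, alpha=0.0))
print(total_loss(src_probs, src_labels, src_feat, tgt_feat, alpha=1.0))
```

Note that only the source batch needs labels; the target batch enters the objective solely through its feature covariance.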
3.2. RMCA-1DCNN Fault Diagnosis Model
RMCA-1DCNN is proposed to solve the cross-domain learning problem in the bearing fault diagnosis area. As shown in Figure 1, a 1DCNN is used as the main architecture, and the model places a domain adaptation layer with the Riemannian metric correlation alignment loss before the classifier. The labeled source and unlabeled target data are fed into the RMCA-1DCNN model in the training process. Then, domain-invariant features of the raw vibration signals are extracted through the multiple convolutional and pooling layers. The distribution discrepancy is minimized at the fully connected layers. Theoretically, correlation alignment can be performed at multiple layers in parallel. However, empirical evidence [36, 37] shows that solid performance is obtained even if this alignment is conducted only once. As a common practice, correlation alignment is performed after the last fully connected layer. Joint training with the classification loss and the second-order statistic loss between the two domains in the given layer adapts the representations learned in the source domain for application to the target domain. The domain-invariant features can thus be efficiently extracted to improve the cross-domain testing performance (Table 1).

3.3. Architectural Design of 1DCNN
Because the vibration signals of bearings collected by acceleration sensors are one-dimensional, a 1DCNN is adopted to process them for bearing fault diagnosis. The network structure comprises four convolutional and pooling layers, a fully connected layer, and a softmax layer at the end. The first convolutional layer uses wide kernels for feature extraction and high-frequency noise suppression. Small convolutional kernels in the subsequent layers are used to deepen the network for multilayer nonlinear mapping and to prevent overfitting [17]. The parameters of 1DCNN are detailed in Table 2. The pooling type is max pooling, and the activation function is ReLU. The Adam stochastic optimization algorithm is applied to train the model to minimize the loss function, and the learning rate is set to 1e-3. The experiments are conducted using Google's TensorFlow framework.
4. Experimental Analysis of the Proposed RMCA-1DCNN Model
4.1. Data Description
The bearing fault data used for experimental validation were obtained from the Bearing Data Center of the Case Western Reserve University (CWRU) [40]. The data were collected from a motor-driven mechanical system under four different loads (0, 1, 2, and 3 hp) at three sensor locations (the fan end, the drive end, and the base), with sampling frequencies of 12 and 48 kHz. The four health states of the bearing are normal condition (N), outer race fault (OF), inner race fault (IF), and roller fault (RF). Each fault type is available with fault diameters of 0.007, 0.014, and 0.021 inch. Together with the normal condition, the three fault types and three fault diameters therefore give 10 health conditions in total.
In this paper, the vibration signals of different fault locations and health states sampled at 12 kHz at the drive end of the rolling bearing are selected for experimental research. The detailed description of the datasets is shown in Table 2. Three datasets are acquired under three loads of 1, 2, and 3 hp. Each dataset contains training and testing samples, and each sample contains 2,048 data points. Following [17], the training samples are cut from the raw signal with overlap to augment the data, whereas no overlap occurs among the testing samples. Overall, each dataset comprises 6,600 training samples and 250 test samples of 10 health states.
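Overlap sampling can be sketched as a sliding window over the raw signal. The window length of 2,048 points comes from the text; the stride of 64 points and the function name are assumptions made for illustration only, since the shift actually used is not stated here.

```python
def overlapping_windows(signal, window=2048, stride=64):
    """Cut a 1-D sequence into fixed-length windows that start every `stride` points."""
    last_start = len(signal) - window
    return [signal[start:start + window] for start in range(0, last_start + 1, stride)]


raw = list(range(4096))  # stand-in for a raw vibration record

train_windows = overlapping_windows(raw, window=2048, stride=64)   # heavy overlap
test_windows = overlapping_windows(raw, window=2048, stride=2048)  # no overlap
print(len(train_windows), len(test_windows))
```

With a record of length N, the number of windows is floor((N − window) / stride) + 1, so a smaller stride yields many more training samples from the same recording, while setting the stride equal to the window length yields disjoint test samples.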
4.2. Experiment Result
Source domain samples have labels, whereas target domain samples have no labels. With three domains, the experiments were conducted in six domain transfer scenarios: A ⟶ B, A ⟶ C, B ⟶ C, C ⟶ B, C ⟶ A, and B ⟶ A. Taking A ⟶ B as an example, dataset A is the source domain, and dataset B is the target domain.
Comparison methods: the proposed method is compared with several successful machine learning methods to verify the effectiveness of the RMCA-1DCNN model.
(1) SVM
(2) Multilayer perceptron (MLP)
(3) Deep neural network (DNN) [10]
(4) WDCNN proposed in [17], without and with the domain adaptation capacity provided by AdaBN
(5) Adversarial adaptive model based on 1DCNN (A2CNN) [19]
(1)–(3) and (5) are methods that work with the data transformed by fast Fourier transform. (4) is a CNN-based method that works with the normalized raw signals.
For a fair comparison, we adopt accuracies reported by other authors with the same setting or conduct experiments using the source code provided by the authors.
A total of 10 experiments were conducted for each domain transfer scenario to reduce the influence of random factors. Experimental results of the six domain scenarios are shown in Figure 2. In the domain-shift scenarios of A ⟶ B, A ⟶ C, B ⟶ C, and C ⟶ B, both the training and test accuracies reach 100%. In the domain-shift scenarios of C ⟶ A and B ⟶ A, the training accuracy is 100%, and the test accuracy reaches 98%. These results show that the domain adaptation performance of the proposed method is high and stable.

[Figure 2, panels (a)–(f): experimental results of the six domain transfer scenarios.]
The comparison with other methods is shown in Figure 3. The average performance of RMCA-1DCNN is better than that of A2CNN and the other five baseline methods, that is, RMCA-1DCNN achieves the highest average domain adaptation accuracy over the six domain transfer scenarios.

As shown in Figure 3, the performance of SVM, MLP, and DNN in domain adaptation is poor, and their average accuracy in the six scenarios is 66.63%, 80.40%, and 78.05%, respectively. This result indicates that the sample distribution differs under varying conditions and that a model trained under one condition is unsuitable for fault diagnosis and prediction under another condition.
Compared with some recent methods, such as WDCNN (AdaBN) and A2CNN, our method achieves an average accuracy of 99.33%, which is evidently higher than that of all the baseline methods.
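As a quick consistency check, the reported average can be reproduced from the per-scenario test accuracies quoted earlier, assuming the rounded values of 100% for A ⟶ B, A ⟶ C, B ⟶ C, and C ⟶ B and 98% for C ⟶ A and B ⟶ A:

```latex
\bar{a} = \frac{4 \times 100\% + 2 \times 98\%}{6} = \frac{596\%}{6} \approx 99.33\%
```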
In five out of six shifts, that is, A ⟶ B, A ⟶ C, B ⟶ C, C ⟶ B, and C ⟶ A, the proposed method achieves the best fault diagnosis accuracy among the compared methods, and the first four domain shifts reach 100%. In the domain-shift scenario of B ⟶ A, the accuracy of the proposed method is 0.18% lower than that of the A2CNN method but far higher than that of the SVM, MLP, DNN, WDCNN, and WDCNN (AdaBN) methods. On this basis, RMCA-1DCNN can learn fault-discriminative and domain-invariant features and effectively solve the domain adaptation problem caused by different loads of bearing data.
4.3. Sensitivity Analysis of the Fault
For each type of fault detection, we introduce three evaluation indexes, namely, Precision, Recall, and F-Measure, to further analyze the sensitivity of the proposed RMCA-1DCNN method. In the multiclassification problem of fault diagnosis, Precision and Recall for each fault category $c$ are defined as follows:
$$\mathrm{Precision}_c = \frac{TP}{TP + FP}, \qquad \mathrm{Recall}_c = \frac{TP}{TP + FN}, \tag{9}$$
where True Positive (TP) represents the number of faults correctly identified as fault category $c$, False Positive (FP) represents the number of faults wrongly identified as fault category $c$, and False Negative (FN) represents the number of faults of category $c$ incorrectly labeled as not belonging to $c$.
F-Measure is defined as a reference for diagnosis analysis, and the calculation method of F-Measure is as follows:
$$F_{\alpha} = \frac{(1 + \alpha^{2}) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\alpha^{2} \cdot \mathrm{Precision} + \mathrm{Recall}}. \tag{10}$$
F-Measure is the weighted harmonic mean of Precision and Recall, with α as the weight. In this form, Recall is weighted more heavily when α > 1, and Precision is weighted more heavily when α < 1. In this study, α is set to 1, which indicates that Precision is as important as Recall. The highest possible F-Measure is 1, and an F-Measure close to 1 indicates better fault diagnosis performance. Precision, Recall, and F-Measure of each health state in the RMCA-1DCNN approach are shown in Table 3.
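The three indexes can be computed per class from lists of true and predicted labels, as in the sketch below. It assumes the standard weighted harmonic mean form of the F-Measure and uses made-up toy labels (25 samples in each of two classes, with five samples of class 2 mistaken for class 0); the function name and the labels are hypothetical.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, alpha=1.0):
    """Return {class: (precision, recall, f_measure)} for a multiclass problem."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class receives a false alarm
            fn[truth] += 1  # the true class misses one of its samples
    metrics = {}
    for c in sorted(set(y_true) | set(y_pred)):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = alpha ** 2 * precision + recall
        f_measure = (1 + alpha ** 2) * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f_measure)
    return metrics


# Toy labels: class 0 is always recognized; 5 of the 25 class-2 samples are predicted as class 0.
y_true = [0] * 25 + [2] * 25
y_pred = [0] * 25 + [2] * 20 + [0] * 5

for label, (p, r, f) in per_class_metrics(y_true, y_pred).items():
    print(label, round(p, 4), round(r, 4), round(f, 4))
```

In this toy case the misclassified samples lower the Recall of the class they come from and the Precision of the class they are assigned to, which is why the two indexes need to be read together.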
In Table 3, for the first type of fault, that is, the roller fault with a size of 0.007 inch, the RMCA-1DCNN method has low Precision in the domain-shift scenarios B ⟶ A and C ⟶ A, namely 86% and 83%, respectively. Thus, roughly 14% to 17% of the alerts for this kind of fault in the two domain shifts are unreliable. For the fourth type of fault, Precision of the proposed method in the domain shift of B ⟶ A is 93%, indicating that 7% of the samples assigned to this fault category actually belong to other categories.
In Table 3, for the third type of fault, that is, the roller fault with a size of 0.021 inch, Recall of the RMCA-1DCNN method in the domain-shift scenarios B ⟶ A and C ⟶ A is low at 80%. This finding indicates that 20% of these faults are not detected as this category.
Similarly, F-Measure in the domain-shift scenarios of C ⟶ A and B ⟶ A for the first type of fault is 0.9091 and 0.9434, respectively. For the third type of fault, F-Measure is 0.8889 in both C ⟶ A and B ⟶ A. F-Measure in the domain shift of B ⟶ A for the fourth type of fault is 0.9615. All the other F-Measure values of the fault classes are 1.
In short, Precision, Recall, and F-Measure of the RMCA-1DCNN method are all high. Except for the first, third, and fourth fault types, the RMCA-1DCNN method classifies the samples of all categories correctly. These results suggest that the proposed method with Riemannian metric correlation alignment achieves strong classification performance on each fault category.
4.4. Parameter Sensitivity
In this section, we study the hyperparameter α, the weight of the RMCA loss in the objective function, which is a critical coefficient usually selected by cross-validation. A high value of α may force the network to learn oversimplified low-rank feature representations. Although such a value may lead to perfectly aligned covariances, the resulting features may not be useful for classification. Meanwhile, a small α may be insufficient to bridge the domain shift. Three typical domain transfer scenarios, namely, B ⟶ A, C ⟶ A, and B ⟶ C, are selected. The results for different values of α are shown in Figure 4. Similar trends are observed in the other domain transfer scenarios. As shown in Figure 4, α can be selected from the range [10, 25] to obtain better results than those of the best baseline methods. When the value of α is larger than 25, the accuracy rapidly decreases in the three domain shifts. The effectiveness and robustness of the proposed method are thus further verified.

Furthermore, the second-order statistics of the specific activation layer in the source and target domains belong to a Riemannian manifold. When the classifier is in the optimal state, the entropy on the source domain is minimized, and the entropy on the target domain is also minimized because the two domains are indistinguishable after the alignment [37]. Given that the target domain data are unlabeled, the entropy $E$ on the target domain is defined as
$$E = -\sum_{j=1}^{n_t} \hat{y}_j^{\top} \log \hat{y}_j, \tag{11}$$
where $\hat{y}_j$ is the network prediction for the $j$th target sample; that is, $E$ depends only on the network predictions.
Figure 5 plots the target entropy and the diagnosis accuracy as functions of α in the domain transfer scenario of C ⟶ A. When α = 17, the target entropy is minimal and the diagnosis accuracy is the best; that is, the minimal target entropy corresponds to the maximum performance on the target. Note that the best-performing α lies in the range of [10, 25], which also suggests that target entropy minimization is necessary but not sufficient for domain adaptation. Therefore, in the unsupervised domain adaptation of bearing fault diagnosis, the selection of the hyperparameter α is shown to be compatible with the data-driven cross-validation strategy when using the Riemannian metric correlation alignment [37].
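The target entropy requires only the predicted class probabilities, so it can be evaluated without any target labels. The following sketch assumes the entropy is summed over target samples and uses the natural logarithm; the function name and the toy probability rows are illustrative.

```python
import math


def prediction_entropy(probabilities):
    """Sum over samples of -sum_c p_c * log(p_c); zero-probability terms contribute nothing."""
    total = 0.0
    for row in probabilities:
        total -= sum(p * math.log(p) for p in row if p > 0.0)
    return total


confident = [[1.0, 0.0], [0.0, 1.0]]  # the classifier is certain about every sample
uncertain = [[0.5, 0.5], [0.5, 0.5]]  # the classifier cannot decide between two classes

print(prediction_entropy(confident))
print(prediction_entropy(uncertain))
```

Low entropy means that the predictions on the target are confident, which is why it can serve as a label-free indicator when comparing values of α.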

4.5. Performance under Noise Environment
In a realistic industrial environment, vibration signals are easily polluted by noise. This section discusses the diagnosis accuracy of the proposed RMCA-1DCNN method in noisy environments. In our experiments, for the six domain-shift scenarios, the source domain data remain the same, and noise is added only to the target data to enlarge the distribution gap between the source and the target. The added noise is additive white Gaussian noise, and the composite signals are generated with different signal-to-noise ratios (SNRs). The SNR is defined as follows:
$$\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10}\left(\frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}\right), \tag{12}$$
where $P_{\mathrm{signal}}$ and $P_{\mathrm{noise}}$ are the power of the signal and the noise, respectively. By definition, the more noise the signal contains, the smaller the SNR value is.
Figure 6 shows the original signal of the inner race fault, the additive white Gaussian noise signal, and the composite of the two signals with an SNR of 0 dB. The composite signal is seriously polluted by noise, and it is difficult to visually distinguish the vibration features of the original signal.
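Adding white Gaussian noise at a prescribed SNR amounts to scaling the noise variance by the measured signal power. The sketch below is an illustration under assumptions: power is taken as the mean squared amplitude, the clean signal is a synthetic 100 Hz sine sampled at 12 kHz that merely stands in for a vibration record, and the function names are hypothetical.

```python
import numpy as np


def add_awgn(signal: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Return signal plus white Gaussian noise whose power matches the requested SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def measure_snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """SNR in dB computed from the mean power of the signal and of the noise."""
    return float(10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2)))


rng = np.random.default_rng(0)
t = np.arange(20000) / 12000.0         # 12 kHz sampling, as in the text
clean = np.sin(2 * np.pi * 100.0 * t)  # synthetic stand-in for a vibration signal

noisy = add_awgn(clean, snr_db=0.0, rng=rng)
print(round(measure_snr_db(clean, noisy - clean), 2))
```

At 0 dB the noise carries as much power as the signal, so the measured value printed above should be close to zero, up to sampling fluctuation of the random noise.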

To verify the antinoise performance of the proposed method, we test the RMCA-1DCNN method with noisy signals whose SNR ranges from −2 dB to 10 dB. The results are shown in Table 4. The diagnosis accuracy increases with the SNR value. When the SNR is more than 4 dB, the accuracy exceeds 97%. The reason is that the larger the SNR value is, the less noise there is in the composite signal, the less the fault features are affected by noise, and the better the model performs. Conversely, the smaller the SNR value is, the stronger the noise in the composite signal is; the noise then masks most of the vibration signal, which results in a loss of fault characteristic information and worse model performance. Furthermore, these results suggest that the RMCA-1DCNN method works well when the gap between the source and the target is small, whereas its performance is only moderate when the gap is large.
4.6. Network Visualizations
To further understand the influence of RMCA-1DCNN on network training, the features of the source and target domain test data in the last hidden layer are reduced to two dimensions and visualized using the t-SNE dimension reduction technique. Taking the domain-shift scenarios of B ⟶ A, C ⟶ A, and B ⟶ C as examples, the features in the last hidden layer of the source and the target are shown in Figure 7. As presented in Figure 7(c), the domain-shift scenario of B ⟶ C has no overlap between classes, and the distance between different classes is large; that is, the features are highly separable. Therefore, the experiment achieves a test accuracy of 100%.

In the domain-shift scenarios of B ⟶ A and C ⟶ A, Figures 7(a) and 7(b) show that individual samples of the roller fault with a size of 0.021 inch lie close to the cluster of the roller fault with a size of 0.007 inch, and overlaps are observed in the signal features between the two classes. In other words, the model has insufficient discrimination between the two kinds of signals. Hence, individual samples may be misclassified. This result is consistent with the diagnosis accuracy of 98%. The t-SNE visualization thus supports the strong performance of the RMCA-1DCNN method in unsupervised domain adaptation of bearing fault diagnosis.
5. Conclusions
We designed an RMCA-1DCNN model in this study to solve cross-domain learning problems in the bearing fault diagnosis field. RMCA-1DCNN aims to extract domain-invariant features that bridge the cross-domain discrepancy while strengthening the fault-discriminative capacity between the two domains. The experimental results on CWRU bearing datasets confirm the superiority of the proposed method. In future work, we will attempt to apply correlation alignment at multiple layers of 1DCNN in parallel, possibly further improving the domain adaptation performance of the proposed model.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Authors’ Contributions
A. J. conceived the study, participated in the research design, conducted the experiments, and wrote the paper. A. P. designed the methodology and reviewed the manuscript critically for important intellectual content. All authors read and approved the final manuscript.
Acknowledgments
This research was supported by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant no. 18KJB520050), the Natural Science Foundation of China (Grant no. 51805466), and the Natural Science Foundation of Jiangsu Province (Grant no. BK20181055).