Abstract
This study proposes a symmetrized dot pattern (SDP) characteristic information fusion-based convolutional neural network (CNN) fault diagnosis method to resolve issues of high complexity, nonlinearity, and instability in original rotor vibration signals. The method fuses the information of the real modal components of vibration signals into SDP images and identifies these images using a CNN in order to diagnose vibration faults. Compared with other graphic processing methods, the proposed method more fully expressed the characteristics of different vibration signals and thus presented variations between different vibration states in a simpler and more intuitive way. The proposed method was investigated using simulation signals and rotor test-rig signals, and its validity and advantages were demonstrated experimentally. By using a CNN to adaptively extract SDP characteristic information through deep learning, vibration fault identification was ultimately realized.
1. Introduction
Vibration faults are a common problem for most types of power station equipment, rotary parts in particular, as the faults can significantly reduce the stability and safety of equipment operation. To reduce fault-induced loss, equipment state monitoring is necessary. One common method of monitoring equipment is vibration signal-based fault diagnosis [1].
The characteristics of a vibration signal can reflect the state of and faults in equipment [2]. Therefore, the timely and accurate analysis of vibration signals and the extraction of effective characteristic information can play an important role in fault diagnosis. In general, vibration appears nonlinear and unstable in complex rotary equipment setups, which can make vibration state monitoring difficult. Therefore, the analysis of nonlinear and unstable signals is critical for current research.
In a previous study, vibration signal demodulation was conducted using the Hilbert–Huang transformation [3]. Fault characteristic information was then obtained from the time-frequency distribution of the signal by calculating its instantaneous frequency using the Hilbert transformation. Similarly, vibration signals have been decomposed using time-frequency methods such as ensemble empirical mode decomposition (EEMD) [4], Hilbert vibration decomposition (HVD) [5], and variational mode decomposition (VMD) [6] in order to obtain each unimodal component of the vibration contained in the signals and then use the unimodal information to diagnose vibration faults.
In a study by Liu et al. [7], a resonant demodulation method was developed in order to extract the impact components in gearbox vibration signals and ultimately locate the gearbox faults. The key to this process was to accurately extract fault information from the original signals. However, the disadvantages of signal processing algorithms, namely, the illusive component and the end effect in methods such as empirical mode decomposition (EMD) and HVD, as well as noise disturbance on the spot, can distort characteristic extraction and negatively impact fault identification.
In addition, some other methods that visualize vibration signals of rotary machinery, such as the shaft center orbit [8] and symmetrized dot pattern (SDP) analysis [9], have been widely used for nonlinear unstable signal characteristic extraction due to their unique ability to present the characteristics of a vibration signal. For example, Jeong et al. [8] developed a rotary machinery fault diagnosis method based on deep learning of the shaft center orbit, in which the shaft center orbit of the rotary machinery served as the input of the deep learning model. This method served to enhance the diagnostic accuracy of traditional rotary machinery faults, but the need for pretreatment, such as centering and location, for the image identification of the shaft center orbit made the diagnosis model more complex. Similarly, Xu et al. [9] proposed an SDP and image matching-based real-time centrifugal fan stall detection method. Although the SDP method represents signal characteristics in a simple and intuitive way and effectively adapts to noise signals, it fails to fully describe the characteristics of complicated vibration signals with high nonlinearity and instability.
With the development of data mining and artificial intelligence, shallow machine learning-based fault diagnosis models, such as ANN, SVM, and fuzzy recognition [10–12], have been widely applied to the fault diagnosis of rotary machinery. However, their diagnostic precision depends largely on the accuracy of characteristic extraction. The limited learning depth of such models can also adversely affect diagnostic precision.
In recent years, adaptive learning of characteristics has been achieved through the emergence of deep learning [13]. In particular, the application of convolutional neural network (CNN) to vibration fault diagnosis has become the focus of research. Ince et al. [14] and Abdeljaber et al. [15] established fault diagnosis models based on 1D-CNN. Using the time series of vibration signals as input, they obtained the vibration state through signal characteristic learning.
CNN and image identification-based fault diagnosis techniques have also been developed. Zeng and Jie [16] developed a CNN-based signal time-frequency image identification method and an automobile gearbox vibration fault diagnosis model in which the shaft center orbit was demonstrated to be important in the visualization of the vibration signals of rotary machinery. In a study by Jeong et al. [8], rotary machinery fault diagnosis was studied via shaft center orbit image identification using CNN.
As mentioned earlier, CNN has exhibited strong performance in image identification and characteristic learning. Research findings have shown that the diagnostic precision of the CNN model can be further enhanced by taking into account the degree of difference between input images. To solve problems in traditional characteristic extraction methods and machine learning methods, an SDP characteristic information fusion-based CNN fault diagnosis model was developed for this study. Using the method, SDP images of the modal components of vibration signals were generated using the feature of SDP characteristic fusion in order to obtain images that demonstrated the characteristics of the vibration signal. Then, the characteristic images were used as the input for the CNN diagnosis model. The method can be used to improve the capability of vibration signal identification and identify the state of rotor operation faster and more accurately.
2. SDP Characteristic Information Fusion-Based CNN Fault Diagnosis
Characteristic differences can be demonstrated by the SDP images of vibration signals; however, vibration signals are complicated, often featuring high nonlinearity and instability. In addition, the artificial identification of image characteristics requires considerable expertise, and it is difficult to recognize small differences between images. To address these issues, an SDP characteristic information fusion-based fault diagnosis method was developed for this study. Using the method, SDP images with full expression of signal characteristics were obtained by fusing modal component signals. These images were then taken as the input of CNN in order to lay a solid foundation for CNN learning.
2.1. SDP Characteristic Information Fusion-Based CNN Fault Diagnosis Model
Modal component signals were fused based on information fusion features of the SDP method in order to fully demonstrate signal characteristics. Using the following steps, an SDP characteristic information fusion-based CNN fault diagnosis model was developed based on the superior performance of CNN in image identification:
Step 1: signal components were obtained by decomposing the original signals using the HVD approach. To eliminate the interference of illusive components, illusive components were removed based on methods used in previous studies [17], such that only real modes of signals were retained.
Step 2: information fusion of modal components was conducted using the SDP analysis method in order to obtain SDP images.
Step 3: deep characteristic learning was implemented using SDP images as the input of the CNN model to identify the vibration state.
Figure 1 shows the structural diagram.

2.2. HVD-Based SDP Characteristic Information Fusion
2.2.1. SDP Analysis
As a new signal processing method, the symmetrized dot pattern (SDP) converts the time domain signal into a polar coordinate system, and the mapping is represented by an SDP pattern, which can intuitively reflect the variation of signals’ amplitude and frequency in different fault states [9]. Time-domain vibration signals were transformed into points in polar coordinate space, as shown in Figure 2.

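In the figure, $r(i)$ is the radius in the polar coordinates, $\Theta(i)$ is the counterclockwise deflection angle from the mirror symmetry plane in the polar coordinates, and $\Phi(i)$ is the clockwise deflection angle from the mirror symmetry plane in the polar coordinates, expressed as follows:

$$r(i) = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}, \qquad \Theta(i) = \theta + \frac{x_{i+l} - x_{\min}}{x_{\max} - x_{\min}}\,\xi, \qquad \Phi(i) = \theta - \frac{x_{i+l} - x_{\min}}{x_{\max} - x_{\min}}\,\xi, \tag{1}$$

where $x_{\max}$ is the max amplitude of signal $x$, $x_{\min}$ is the min amplitude of signal $x$, $l$ is the time-interval parameter, $\theta$ is the deflection angle of the mirror symmetry plane, and $\xi$ is the amplification factor ($\xi \le \theta$).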
Time-domain waveforms were transformed into images in the polar coordinates using the SDP analysis method in order to present vibration signal information through images. Compared with other image analysis methods, SDP handles noisy signals more effectively and displays the characteristics of different vibration forms more clearly. The characteristics of different vibration forms are mainly reflected in the following aspects of SDP images: (1) the curvature of the pattern arms; (2) the thickness and shape of the pattern arms; (3) the geometric center of the pattern arms; and (4) the area in which the points of the pattern arms are concentrated. Signals were normalized before SDP analysis.
According to the analysis above, every mirror plane of the SDP image can represent the characteristics of one set of data. Therefore, we inferred that multiple sets of vibration signals could be displayed in the same polar coordinates using the SDP method in order to carry out information fusion. Similarly, the nth-order modal component signals obtained by modal decomposition of the original vibration signal can express the signal characteristics through SDP images in the same polar coordinates. As an example, periodic sinusoidal signals with a sampling frequency of 10 kHz and 1000 sampling points were used, and the values θ = 60°, ξ = 30°, and l = 1 were selected. As shown in Figure 3, the characteristics of six sets of vibration signals were fused and represented in one SDP image in the polar coordinates.
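The mapping and the fusion idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `sdp_points` and `fuse_components` are hypothetical, and placing the k-th signal on the mirror plane at angle k·θ is an assumption about how the fusion is laid out. Plotting is omitted; the returned radii and angles could be passed to any polar scatter plot.

```python
import numpy as np

def sdp_points(x, mirror_deg, xi_deg=30.0, lag=1):
    """Map a 1-D signal to SDP polar points around one mirror plane.

    Returns (r, ccw, cw): the radius and the counterclockwise and
    clockwise angles in degrees.
    """
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        raise ValueError("signal must not be constant")
    norm = (x - x.min()) / span        # min-max normalized, in [0, 1]
    r = norm[:-lag]                    # radius uses x_i
    offset = norm[lag:] * xi_deg       # angle term uses x_{i+l}
    return r, mirror_deg + offset, mirror_deg - offset

def fuse_components(components, theta_deg=60.0, xi_deg=30.0, lag=1):
    """Assumed layout: component k is drawn on the mirror plane at k*theta."""
    n_planes = int(round(360.0 / theta_deg))
    if len(components) > n_planes:
        raise ValueError("more components than mirror planes")
    return [sdp_points(c, k * theta_deg, xi_deg, lag)
            for k, c in enumerate(components)]

# Example: six sinusoids, 1000 points sampled at 10 kHz (frequencies are illustrative).
t = np.arange(1000) / 10_000.0
signals = [np.sin(2 * np.pi * f * t) for f in (50, 100, 150, 200, 250, 300)]
fused = fuse_components(signals)
```

With θ = 60° there are six mirror planes, so up to six signals fit into one image under this layout; each arm pair stays within ±ξ of its plane.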

Modal decomposition of original vibration signals was needed before information fusion of modal component signals could be performed using the SDP method. Therefore, selecting an appropriate signal decomposition method was crucial for representing SDP image characteristics.
To avoid issues of mode mixing and amplitude distortion that are commonly seen in traditional signal decomposition methods and to eliminate the interference of illusive components, the original vibration signals were decomposed using the HVD method in order to obtain the modal information of signals and identify illusive components.
2.2.2. Signal Decomposition Method
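In this study, the HVD signal decomposition method was used for original vibration signal decomposition. Specifically, an unstable continuous signal was decomposed into multiple components with different amplitudes [18] using the following steps:

Step 1: the instantaneous frequency of the max amplitude component was estimated. The following two-component unstable signal x(t) was used as an example:

$$x(t) = a_1\cos(\omega_1 t) + a_2\cos(\omega_2 t). \tag{2}$$

Assuming $a_1 > a_2$, the instantaneous frequency $\omega(t)$ was obtained using the Hilbert transformation, represented by

$$\omega(t) = \omega_1 + \frac{(\omega_2 - \omega_1)\left[a_2^2 + a_1 a_2 \cos\big((\omega_2 - \omega_1)t\big)\right]}{a_1^2 + a_2^2 + 2 a_1 a_2 \cos\big((\omega_2 - \omega_1)t\big)}. \tag{3}$$

Equation (3) indicates that $\omega(t)$ consisted of two parts: the instantaneous frequency $\omega_1$ of the max amplitude component and the high-frequency oscillation component that varied with time. The high-frequency oscillation component of $\omega(t)$ could be removed in practice through the integral or the low-pass filter in order to estimate $\omega_1$ as the instantaneous frequency of the max amplitude component. In general, the actual signal contained more components than the example signal $x(t)$, and the expression of instantaneous frequency was more complex; however, the instantaneous frequency of the max amplitude component could still be extracted using the low-pass filter.

Step 2: synchronous detection was conducted to obtain the instantaneous amplitude. Using the estimated instantaneous frequency as the reference frequency $\omega_r(t)$, the signal was multiplied by two orthogonal reference signals. For the component $a_r(t)\cos\big(\int\omega_r(t)\,dt + \varphi_r(t)\big)$ of the signal at the reference frequency, the following equations were obtained:

$$x_c(t) = \tfrac{1}{2}a_r(t)\left[\cos\varphi_r(t) + \cos\Big(2\int\omega_r(t)\,dt + \varphi_r(t)\Big)\right], \tag{4}$$

$$x_s(t) = \tfrac{1}{2}a_r(t)\left[-\sin\varphi_r(t) + \sin\Big(2\int\omega_r(t)\,dt + \varphi_r(t)\Big)\right]. \tag{5}$$

Filtering out the second terms of equations (4) and (5) using the low-pass filter, the following equations were obtained:

$$\langle x_c(t)\rangle = \tfrac{1}{2}a_r(t)\cos\varphi_r(t), \qquad \langle x_s(t)\rangle = -\tfrac{1}{2}a_r(t)\sin\varphi_r(t). \tag{6}$$

Thus, instantaneous amplitude and phase were calculated as follows:

$$a_r(t) = 2\sqrt{\langle x_c(t)\rangle^2 + \langle x_s(t)\rangle^2}, \qquad \varphi_r(t) = \arctan\left(-\frac{\langle x_s(t)\rangle}{\langle x_c(t)\rangle}\right). \tag{7}$$

Step 3: the max amplitude component x1(t) was extracted following the above steps. The difference between x(t) and x1(t) was used as the new initial signal as follows:

$$x_{\mathrm{new}}(t) = x(t) - x_1(t). \tag{8}$$

Components of different amplitudes were obtained by repeating steps 1 and 2. The normalized standard deviation of equation (8) was taken as the iteration termination condition, σ < 0.001 [19].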
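The three steps above can be sketched with NumPy alone. This is a hedged illustration under several assumptions, not the implementation used in the study: the analytic signal is computed with an FFT-based discrete Hilbert transform, the low-pass filter is an idealized brick-wall filter in the frequency domain, a fixed number of components replaces the σ < 0.001 stopping rule, and all function names are hypothetical. The cutoff must lie below the frequency spacing between components.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (discrete Hilbert transform)."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spec * h)

def lowpass(x, fs, cutoff_hz):
    """Zero-phase brick-wall low-pass filter (an idealized stand-in)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spec, n=len(x))

def hvd_largest_component(x, fs, cutoff_hz):
    """One HVD iteration: extract the largest-amplitude component of x."""
    # Step 1: instantaneous frequency in rad/s, low-pass filtered.
    phase = np.unwrap(np.angle(analytic_signal(x)))
    omega_r = lowpass(np.gradient(phase) * fs, fs, cutoff_hz)
    # Step 2: synchronous detection with two orthogonal references.
    ref_phase = np.cumsum(omega_r) / fs
    x_cos = lowpass(x * np.cos(ref_phase), fs, cutoff_hz)
    x_sin = lowpass(x * np.sin(ref_phase), fs, cutoff_hz)
    amp = 2.0 * np.sqrt(x_cos ** 2 + x_sin ** 2)
    phi = np.arctan2(-x_sin, x_cos)
    return amp * np.cos(ref_phase + phi)

def hvd(x, fs, n_components, cutoff_hz):
    """Step 3: subtract each extracted component and repeat."""
    residual = np.asarray(x, dtype=float).copy()
    components = []
    for _ in range(n_components):
        comp = hvd_largest_component(residual, fs, cutoff_hz)
        components.append(comp)
        residual = residual - comp
    return components, residual

# Example: two stationary cosines with a1 > a2 (values are illustrative).
fs = 1000.0
t = np.arange(1000) / fs
x = np.cos(2 * np.pi * 50 * t) + 0.4 * np.cos(2 * np.pi * 120 * t)
components, residual = hvd(x, fs, n_components=2, cutoff_hz=20.0)
```

In this example the first extracted component should follow the 50 Hz term and the second the 120 Hz term, because the larger amplitude dominates the averaged instantaneous frequency.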
2.3. HVD-Based SDP Characteristic Information Fusion Method
Experimental research was then conducted using simulation signals. Specifically, different vibration states were simulated using three types of simulation signals [20]: multiperiod superimposed signals (25 Hz + 50 Hz and 50 Hz + 100 Hz + 150 Hz); a nonlinear modulation signal (50 Hz periodic component + modulation signal); and an impact signal (50 Hz + 100 Hz + 150 Hz + impact signal). In addition, noise with a signal-to-noise ratio of 50 was added to each signal. The signals are expressed in equations (9)–(11), and the parameters in the equations are shown in Table 1. The waveform diagram of each simulation signal is shown in Figure 4, and the results of the KL-HVD method for each simulation signal are shown in Figure 5.
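As a rough illustration of how such test signals can be generated, the sketch below builds one periodic, one modulated, and one impact-type signal and adds white Gaussian noise. Everything numeric here is hypothetical: the amplitudes, modulation and burst parameters, and sampling settings are placeholders rather than the values of Table 1, and the signal-to-noise ratio of 50 is assumed to be in decibels.

```python
import numpy as np

def add_noise(x, snr_db, rng):
    """Add white Gaussian noise at the requested SNR (assumed to be in dB)."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

# Hypothetical sampling settings; not taken from the paper.
fs, n = 1000.0, 1000
t = np.arange(n) / fs
rng = np.random.default_rng(0)

# 50 Hz + 100 Hz + 150 Hz multiperiod signal (amplitudes are placeholders).
periodic = (np.sin(2 * np.pi * 50 * t)
            + 0.5 * np.sin(2 * np.pi * 100 * t)
            + 0.25 * np.sin(2 * np.pi * 150 * t))

# 50 Hz periodic component plus an amplitude-modulated term (placeholder form).
modulated = (np.sin(2 * np.pi * 50 * t)
             + 0.5 * (1.0 + np.cos(2 * np.pi * 10 * t)) * np.sin(2 * np.pi * 200 * t))

# Periodic signal plus a decaying burst repeated every 0.1 s (placeholder form).
tau = t % 0.1
impact = periodic + np.exp(-100.0 * tau) * np.sin(2 * np.pi * 300 * tau)

clean = {"periodic": periodic, "modulated": modulated, "impact": impact}
signals = {name: add_noise(sig, 50.0, rng) for name, sig in clean.items()}
```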

The simulation signal was as follows:
The real modal components were observed in the HVD images. For example, the first two modal components of simulation signal x1 were real signals; the 25 Hz, 50 Hz, and 150 Hz modal components were observed in the simulation signal x2; the simulation signal x3 was decomposed into a periodic and amplitude modulation signal; and the impact signal x4 was decomposed into a periodic and impact signal. The real component signals obtained through the HVD method were plotted into SDP images with the selection of θ = 60°, ξ = 30°, and l = 10, as shown in Figure 6.

The figure shows that modal component information in the signals was successfully fused using the SDP method, which indicated that the original signal characteristics were expressed more fully using this method. This result laid a good foundation for characteristic learning of the CNN model. It can be seen from Figures 6(a) and 6(b) that, for the periodic signals, the SDP pattern arms have similar shape features and point concentration areas. However, the pattern arms corresponding to signals of different frequencies have different curvatures. Figure 6(c) shows that the amplitude modulation signal differs from the periodic signals in the curvature of the pattern arms, the dot concentration area, the geometric center, and the shape features. It can be seen from Figure 6(d) that, compared with the SDP images corresponding to periodic signals, the dot traces of the SDP image corresponding to the impulse signal are sparse and mainly concentrated in the region r = 0.4–0.6, and the curvature of the pattern arms is small. In summary, the SDP images obtained by mapping different signals clearly reflect the characteristics of those signals. Information contained in the mode components of a signal can be successfully fused using the SDP analysis method, thus fully expressing the characteristics of the original signal.
2.4. CNN-Based Fault Diagnosis Model
CNN was the first real deep learning algorithm to emerge with a multilayered structure. Essentially, CNN reduces the number of parameters based on relative spatial relations and learns multiple characteristic filters capable of extracting input data characteristics. Using these characteristic filters, along with input data, layer-by-layer convolution and pooling are performed to gradually extract the hidden characteristics in the data. As a deep-learning model structure that analyzes two-dimensional images (pixel matrices) [21], CNN offers the advantages of aggregating pixels by convolution in the convolutional layer to extract local image characteristics and conducting characteristic combination and dimension reduction in the pooling layer. These properties allow the CNN algorithm to accelerate characteristic extraction. The CNN model structure is shown in Figure 7.

2.4.1. Convolutional Layer
In the convolutional layer (layer C), the input of every neuron is connected to the neurons in a local adjacent area (local receptive field) of the previous layer in order to extract the data characteristics of this local area. In addition, the neurons of one feature map in the convolutional layer share the same convolution kernel (shared weights), which reduces the complexity of the network model and significantly enhances the efficiency of network learning.
A vibration image can be viewed as a matrix of pixels, each taking a grayscale value from 0 to 255. Assuming that the lth layer is the convolutional layer, the output feature map of this layer can be represented as follows:

$$x_j^l = f\left(u_j^l\right), \qquad u_j^l = \sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l,$$

where $M_j$ is the set of input feature maps; $x_j^l$ is the activation of the jth feature map in the lth layer; $k_{ij}^l$ is the convolution kernel connecting the ith feature map in the (l−1)th layer and the jth feature map in the lth layer; "∗" is the convolution symbol; $b_j^l$ is the bias of the jth feature map in the lth layer; $u_j^l$ is the weighted sum of the jth feature map in the lth layer; and f(·) is the activation function of layer C.
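A small NumPy sketch may help make the convolutional-layer computation concrete. It assumes a "valid" sliding window with stride 1, uses ReLU as the activation f, and, as most CNN code does, computes a cross-correlation rather than a flipped-kernel convolution; the function names and array layout are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D sliding-window product (cross-correlation, stride 1)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def conv_layer(inputs, kernels, biases):
    """Forward pass of one convolutional layer.

    inputs:  (n_in, H, W) feature maps of the previous layer
    kernels: (n_in, n_out, kh, kw), one kernel per input/output pair
    biases:  (n_out,)
    """
    n_in, n_out = kernels.shape[:2]
    outputs = []
    for j in range(n_out):
        # Weighted sum u_j over all input maps, plus the bias of map j.
        u_j = sum(conv2d_valid(inputs[i], kernels[i, j]) for i in range(n_in))
        outputs.append(np.maximum(u_j + biases[j], 0.0))  # f = ReLU (assumed)
    return np.stack(outputs)

# Example: one 6x6 input map, two 3x3 kernels.
inputs = np.ones((1, 6, 6))
kernels = np.ones((1, 2, 3, 3))
biases = np.array([0.0, -10.0])
out = conv_layer(inputs, kernels, biases)
```

Each output map is 4 × 4 here because a 3 × 3 window fits into a 6 × 6 image at 4 positions per axis.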
2.4.2. Downsampling Layer
Since one image contains a large number of pixels, downsampling of different locations, represented through the following equation, was required to maintain scaling invariance of characteristics while ensuring data dimension reduction:

$$x_j^l = f\left(\beta_j^l\,\mathrm{down}\left(x_j^{l-1}\right) + b_j^l\right),$$

where $\beta_j^l$ is the link weight, $\mathrm{down}(\cdot)$ is the downsampling function, $b_j^l$ is the bias of this layer, and $f(\cdot)$ is the activation function.
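The downsampling step can be illustrated with non-overlapping mean pooling, which is the variant used later in the experiments. This is a minimal sketch; the function names are hypothetical, and the default weight, bias, and ReLU activation are assumptions chosen for illustration.

```python
import numpy as np

def mean_pool(feature_map, size=2):
    """Non-overlapping mean pooling; trailing rows/columns that do not fit are dropped."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.mean(axis=(1, 3))

def downsample_layer(feature_map, beta=1.0, bias=0.0, size=2):
    """Scaled and shifted pooling followed by ReLU (activation assumed)."""
    return np.maximum(beta * mean_pool(feature_map, size) + bias, 0.0)

# Example: a 4x4 map is reduced to 2x2.
fmap = np.arange(16, dtype=float).reshape(4, 4)
pooled = downsample_layer(fmap)
```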
2.4.3. Model Training
Similar to traditional artificial neural network training, CNN can be trained using the backpropagation algorithm, which is commonly used in the supervised learning of neural networks to estimate the network parameters based on the output of a training sample [22]. The main parameters to be optimized include the convolution kernel k in the convolutional layer, the weight coefficient β in the downsampling layer, the weight coefficient ω in the fully connected layer, and the bias b of each layer. A rule for network learning was thus derived by calculating the difference Ep between actual output Op and ideal output Yp so that the actual output would be closer to the ideal output in the network as follows:

$$E_p = \frac{1}{2}\sum_j \left(Y_{pj} - O_{pj}\right)^2.$$
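To show how an error between the actual and ideal outputs drives learning, the sketch below applies one gradient-descent step to a linear output layer. It assumes the common squared-error form Ep = ½ Σ (Yp − Op)², which is an assumption rather than a confirmed detail of the study, and it covers only the output layer; full backpropagation through the convolutional and downsampling layers is not shown. The names and the learning rate are hypothetical.

```python
import numpy as np

def squared_error(output, target):
    """E_p = 1/2 * sum((Y_p - O_p)^2), an assumed error definition."""
    return 0.5 * np.sum((target - output) ** 2)

def sgd_step(weights, bias, hidden, target, lr=0.1):
    """One gradient step for a linear output layer O = W h + b."""
    output = weights @ hidden + bias
    delta = output - target                      # dE/dO
    new_weights = weights - lr * np.outer(delta, hidden)
    new_bias = bias - lr * delta
    return new_weights, new_bias

# Example: two hidden features, two output classes.
hidden = np.array([1.0, 2.0])
target = np.array([1.0, 0.0])
w0, b0 = np.zeros((2, 2)), np.zeros(2)
err_before = squared_error(w0 @ hidden + b0, target)
w1, b1 = sgd_step(w0, b0, hidden, target)
err_after = squared_error(w1 @ hidden + b1, target)
```

After the step the output moves toward the target, so the error decreases from 0.5 to 0.08 in this example.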
3. Experimental Study
3.1. Bently-RK4 Rotor Dataset (BRD)
The vibration data of four types of rotor faults (imbalance, oil whirl, collision between rotary and static parts, and misalignment) were obtained in an experiment conducted on a Bently-RK4 rotor testing rig, as shown in Figure 8.

Each type of rotor fault contained 400 datasets (one set = 1024 data samples), of which 300 sets made up the training samples and the other 100 sets served as testing samples. In other words, the sample set contained 1600 typical sample data. The sampling frequency, sampling number, and rotary speed of the test platform were 1280 Hz, 1024, and 3000 r/min, respectively. All data were normalized.
For each type of fault, 100 datasets were selected to compose the testing samples (400 testing samples in total). Oscillograms of some samples are shown in Figure 9.

Second-order HVD of original signals was conducted in this section, and K-L divergence was calculated, as shown in Table 2.
To further identify the illusive components, the K-L divergence values obtained above were used to build a Gaussian mixture model (GMM), with the clustering result shown in Table 3.
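One way to estimate a K-L divergence between a decomposed component and the original signal is to compare histograms of their amplitudes, as sketched below. This estimator, the bin count, and the function name are assumptions made for illustration; the procedure of reference [17] may differ, and the subsequent GMM clustering step is not shown.

```python
import numpy as np

def kl_divergence(p_samples, q_samples, bins=50, eps=1e-12):
    """Histogram-based estimate of D_KL(P || Q) for two sampled signals."""
    p_samples = np.asarray(p_samples, dtype=float)
    q_samples = np.asarray(q_samples, dtype=float)
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
    # Smooth empty bins, then renormalize to proper probability vectors.
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Example: a component similar to the original versus a much weaker one.
t = np.arange(1024) / 1280.0
original = np.sin(2 * np.pi * 50 * t)
similar = 0.95 * np.sin(2 * np.pi * 50 * t)
weak = 0.05 * np.sin(2 * np.pi * 7 * t)
kl_similar = kl_divergence(similar, original)
kl_weak = kl_divergence(weak, original)
```

Under this estimator a component whose amplitude distribution resembles the original yields a smaller divergence than a weak, unrelated one, which is the kind of contrast a clustering step could then separate.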
The results showed that the illusive components of the model were accurately identified using this method. For comparison, SDP images with and without illusive components were drawn, as shown in Figures 10 and 11.

The comparison indicated that SDP analysis of the real components in the signal lowered SDP image complexity, accelerated the learning speed of CNN, and highlighted the signal characteristics. Therefore, SDP images fused with real component information were used as the input of the CNN model in this study.
The other data samples were processed using the same method to generate SDP images, which were taken as the input of the CNN model. The CNN used in this study contained two convolutional layers (C1 and C3), two downsampling layers (S2 and S4), one fully connected (MLP) layer, and one output layer. To balance diagnostic precision and computing speed, the sizes of the convolutional kernels of the two convolutional layers were 5 × 5 and 3 × 3, respectively, and the numbers of convolutional kernels were n1 = 6 and n2 = 12. Mean pooling with a window size of 2 × 2 was used in S2 and S4. The number of iterations and the batch size were 10 and 5, respectively. The ReLU function was used as the activation function. The experimental results and diagnostic precision are shown in Table 4.
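The layer sizes described above determine how the feature maps shrink through the network. The sketch below traces these sizes under assumptions that are not stated in the text: "valid" convolutions with stride 1, non-overlapping pooling, and a hypothetical 32 × 32 input image. The function names are illustrative only.

```python
def conv_out(size, kernel):
    """Output width of a 'valid', stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, window):
    """Output width of non-overlapping pooling."""
    return size // window

def feature_map_sizes(input_size):
    """Trace (layer name, number of maps, map width) through C1-S2-C3-S4."""
    layers = [("C1", "conv", 5, 6), ("S2", "pool", 2, 6),
              ("C3", "conv", 3, 12), ("S4", "pool", 2, 12)]
    size, trace = input_size, []
    for name, kind, k, n_maps in layers:
        size = conv_out(size, k) if kind == "conv" else pool_out(size, k)
        trace.append((name, n_maps, size))
    return trace

trace = feature_map_sizes(32)            # 32 x 32 input is a hypothetical choice
_, n_maps, width = trace[-1]
flattened = n_maps * width * width       # inputs to the fully connected layer
```

For a 32 × 32 input this gives 28, 14, 12, and 6 pixels per side after C1, S2, C3, and S4, so 12 maps of 6 × 6 feed 432 values into the fully connected layer.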
According to the experimental results, the diagnostic precision of the model reached 97.7%, indicating that the proposed fault diagnosis method was highly reliable. In addition, the results demonstrated that the characteristics of the original signals were more sufficiently and significantly expressed using the method based on the SDP analysis of signals.
3.2. Case Western Reserve University (CWRU) Bearing Dataset
In order to test the diagnostic performance of the proposed method on more complex signals, the bearing data sampled at 12 kHz from the bearing data center website of Case Western Reserve University (CWRU) were used [23]. The 400 sets of samples of rolling element faults, inner and outer race faults, and vibration signals under normal conditions were selected as test datasets. The classification experiments were carried out using the proposed SDP characteristic information fusion-based CNN vibration fault diagnosis method.
It was found that when the number of sample points is 4096, the obtained image has fewer discrete points and the pattern arms are full. Thus, the sample length is determined to be 4096. The SDP images are drawn by the components obtained from the KL-HVD sixth-order decomposition of the original signal with the value of θ = 60°, ξ = 30°, and l = 2. The partial sample SDP images are shown in Figure 12.

The CNN-based model established in this paper was used to learn the features of the SDP images obtained from these components. For each condition, 100 samples were randomly selected as the test set and classified by the trained CNN-based model, which output the recognition results. Four conditions were considered: the healthy state, labeled C1, and three fault conditions, namely, inner race fault (C2), ball fault (C3), and outer race fault (C4). The classification results are shown in Table 5.
The experimental results show that the diagnostic precision of the model reached 93.7%, indicating that the fault diagnosis method proposed in this paper is also reliable for the fault diagnosis of complex bearing signals.
3.3. Comparison with Other Methods
Table 6 compares the classification accuracies of different fault diagnosis methods on the same benchmark datasets. The Bently-RK4 rotor dataset (BRD) established in this paper and the CWRU dataset are selected as the test datasets.
Table 6 shows that conventional intelligent fault diagnosis methods usually consist of a feature extraction part and a feature selection part. For example, EMD [25] and wavelet packet (WP) decomposition [24] are used to extract features of signals, and then principal component analysis (PCA) is used to extract feature vectors from high-dimensional matrices. For the selected feature vectors, classifiers such as artificial neural networks or the support vector machine (SVM) are usually used for classification.
In this paper, the SDP image is used as the feature extraction method, and CNN is used to classify the extracted features. The SDP method fuses the information of each mode component signal (IMF), and the superior performance of CNN in image recognition is exploited to obtain higher recognition accuracy. For validation, 100 groups of data from the CWRU dataset are processed with different feature extraction methods, such as using wavelet analysis to obtain 2D time-frequency images of signals [27] or directly sampling the original time-domain waveform and converting it into a two-dimensional gray image [28] as the feature sample. All of these are input into the CNN model established in this paper for identification, with the results shown in Table 6.
It can be concluded that, for traditional intelligent fault diagnosis methods, diagnostic performance is largely determined by the feature extraction method and the classifier algorithm, both of which need fine tuning for different applications. Consequently, the general applicability of these methods is limited. When SVM is used to process the feature samples extracted by the WP or EMD methods, the limitations of the SVM classification algorithm mean that feature vectors must first be extracted from the signals processed by WP and EMD as the learning objects, and the results have been limited to relatively small train/test datasets.
However, the CNN- or deep belief network- (DBN-) [26] based model can be used to learn more comprehensive classification features from extracted features and complete the classification task adaptively with a larger sample size which makes it more practical in general application.
In this study, the SDP image is used as the feature extraction method which can clearly show the fault features in the image. Meanwhile, the image form and features are easier to be learned and classified by the CNN-based model compared with the waveform and spectrum of the signal.
Because the process by which a CNN model learns sample features resembles a "black box," the t-SNE tool from manifold learning is introduced to further understand the differences that arise when the original signal waveform images [28], time-frequency spectra, and SDP images are taken as the signal feature samples. t-SNE is a dimensionality reduction algorithm based on manifold learning, which can effectively realize the visualized dimensionality reduction of high-dimensional data by modeling the similarity between data points. In the field of deep learning, it is often used to visualize how well a CNN has learned fault features. A total of 120 groups of sample data covering four bearing states (inner race fault, ball fault, outer race fault, and the healthy state) from the CWRU data center are selected as the dataset. The time-domain waveform images and the time-frequency images of the samples are obtained by the methods in references [27, 28], while the SDP images are obtained by the proposed method. Then, the high-dimensional data produced for these different image forms during the training of the CNN model are reduced and visualized using the manifold learning method. Results are shown in Figure 13. The sample data of different bearing states are represented by point clusters in different colors. The separability of the different image types used as signal feature samples is reflected by the degree of mixing of the point clusters: the lower the mixing, the more distinct the classification features. Thus, it can be concluded that the CNN model can classify vibration signals more accurately when the feature images produce less mixed clusters.

It can be concluded from Table 5 and Figure 13 that the features of the time-domain waveform images of the original signal are too mixed, resulting in a chaotic distribution in the visualization that is not conducive to recognition by the CNN model. Thus, a lower recognition rate is obtained. The time-frequency images have more distinct classification characteristics, so the recognition rate of the CNN model improves. The SDP image can fully fuse the signal feature information and visually express the characteristics of different signals. Its more distinct classification features are therefore easier for the CNN to learn, which allows higher recognition accuracy in learning from large sample data.
4. Conclusions
An SDP characteristic information fusion-based fault diagnosis method was developed for this study to solve issues of high complexity, nonlinearity, and instability in original rotor vibration signals and bearing vibration signals. Specifically, original signals were decomposed using HVD, and illusive components were identified and removed. Then, the characteristic information of real modal components was fused using the SDP analysis method to obtain the SDP images. Automatic vibration state identification was then realized using CNN-based deep characteristic learning.
The results showed that the characteristics of original vibration signals were more fully and intuitively represented using the proposed SDP characteristic information fusion-based fault diagnosis method, which laid a good foundation for CNN deep learning. Moreover, the CNN method was able to adaptively extract image characteristics and identify images accurately by means of deep characteristic learning. The combination of these traits was shown to improve the efficiency and accuracy of vibration fault diagnosis.
Comparative experimental analysis shows that the proposed method achieves higher diagnostic precision than the fault diagnosis method combining the traditional signal processing technique EMD with machine learning, the fault diagnosis method combining wavelet packet decomposition with machine learning, and the SDP characteristic information fusion-based DBN fault diagnosis method. In addition, compared with fault diagnosis methods in which the signal waveform or the time-frequency spectrum is input as the feature sample to the CNN model for feature learning, the proposed method has a better learning effect and higher recognition accuracy, further demonstrating its effectiveness.
Moreover, the results of the experimental analysis evidenced the validity and the high diagnostic precision of the proposed method.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This paper was supported by the Fundamental Research Funds for the Central Universities (grant no. 2018MS111).