Abstract
Rotating machinery refers to machinery that performs its function mainly through rotation, and it is widely used in engineering applications. Bearings and gearboxes play a key role in rotating machinery, and their condition directly affects the operating state of the whole machine. Accurate fault detection and diagnosis of bearings, gearboxes, and other key parts is therefore of great significance to normal operation. This paper proposes a new fault feature extraction algorithm for rotating machinery, called Improved Multivariate Multiscale Amplitude-Aware Permutation Entropy (ImvMAAPE), which applies an improved coarse-graining method to the fault feature extraction of multichannel signals. The algorithm is combined with Uniform Phase Empirical Mode Decomposition (UPEMD) and t-distributed Stochastic Neighbor Embedding (t-SNE) to form a new time-frequency multiscale feature extraction method. First, the multichannel vibration signals are adaptively decomposed into sets of Intrinsic Mode Functions (IMFs) using UPEMD; the IMF components containing the main fault information are then screened by correlation analysis to obtain the reconstructed signals. The ImvMAAPE values of the reconstructed signals are calculated to generate the initial high-dimensional fault features, and t-SNE, a nonlinear dimensionality reduction method, is used to reduce the dimensionality of these feature vectors. Finally, the low-dimensional feature vectors are input to a random forest (RF) classifier to identify the fault types. Experiments were conducted to verify whether this method has higher accuracy and robustness than other methods.
1. Introduction
Rotating machinery often works under high speed and heavy load, which makes it prone to failure, and the consequences of a failure can be very serious. Therefore, real-time monitoring and fault diagnosis of the important parts of rotating machinery are of great significance [1]. When a rotating machine fails, the defective parts produce corresponding vibration signals during operation. These vibration signals contain a lot of fault information. If this information can be extracted effectively, the faults can be diagnosed in time.
Time-frequency analysis methods such as the fast Fourier transform (FFT), empirical mode decomposition (EMD) [2], and the wavelet packet transform (WPT) [3] are widely used in feature extraction of vibration signals. However, the vibration signals of rotating machinery are nonlinear and nonstationary [2]. The FFT is only suitable for analyzing stationary signals. WPT is more flexible than the wavelet transform (WT) and allows the frequency resolution to be chosen, but it is still not self-adaptive, because the basic wavelet functions and parameters must be set in advance. Although EMD has good adaptability, it suffers from defects such as mode mixing and end effects. In addition, time-frequency analysis methods place strict requirements on the operator's knowledge and ability, which limits their application and improvement in vibration signal feature extraction.
Entropy-based methods have been widely used in fault diagnosis in recent years because of their excellent performance on nonlinear and nonstationary time series. Among these methods, approximate entropy (AE) [4], sample entropy (SE) [5], fuzzy entropy (FE) [6], and permutation entropy (PE) [7] are commonly used. AE depends heavily on the length of the data and easily produces undefined entropy values. SE and FE are computationally inefficient and time-consuming, so they are not suitable for processing large amounts of data. PE, in contrast, has shown great advantages in analyzing complex time series and measuring the complexity of nonlinear and nonstationary signals. It is a dynamic mutation detection method that can conveniently and accurately locate the moment at which the system changes abruptly and can amplify small changes in the signal. However, PE is insensitive to the signal amplitude, which may lead to the loss of key information. Azami et al. proposed amplitude-aware permutation entropy (AAPE) [8] to deal with this problem. The multiscale amplitude-aware permutation entropy (MAAPE) algorithm proposed by Chen et al. can extract information from the time series at different scales. However, this method still has defects: it can only process single-channel signals, and the traditional coarse-graining method it uses cannot extract the information completely. In view of these deficiencies, this paper proposes the Improved Multivariate Multiscale Amplitude-Aware Permutation Entropy (ImvMAAPE) algorithm to extract fault features; it can process multichannel signals and uses a more complete coarse-graining method.
The vibration signals of rotating machinery are usually mixed with noise and other components, so applying the ImvMAAPE method directly to the untreated vibration signals is of limited use. This paper therefore combines the ImvMAAPE method with a time-frequency analysis method based on empirical mode decomposition to remove the interference in the vibration signals and extract the fault characteristics more completely. Empirical mode decomposition (EMD) has shown excellent performance in processing nonlinear, nonstationary signals. It is strongly adaptive and overcomes many shortcomings of traditional time-frequency analysis methods, but it also has problems such as mode aliasing and residual noise. To solve these problems, researchers have put forward several modified methods, including noise-assisted EMD [9], masking signal EMD (MS-EMD) [10], noise-assisted multivariate EMD [9], and uniform phase EMD (UPEMD) [11]. Among them, UPEMD effectively makes up for the shortcomings of EMD and has an advantage in computational complexity. For these reasons, this paper uses UPEMD to preprocess the original signal. The original signal is decomposed into a set of intrinsic mode function (IMF) components. Then, the correlation coefficient between each component and the original signal is calculated in order to select the IMFs with larger correlation coefficients. These IMFs contain more useful fault information and less irrelevant information, so they represent the original fault features more clearly. The selected IMF components are then combined into the reconstructed signals, from which the fault features are extracted with ImvMAAPE.
The feature vectors formed by ImvMAAPE are high-dimensional, and not all features effectively represent the fault information; redundant features can even reduce the accuracy of fault diagnosis. Therefore, it is necessary to reduce the dimensionality of these high-dimensional fault features. In this paper, the t-distributed stochastic neighbor embedding (t-SNE) method is used to reduce the dimensions of the initial feature vectors. t-SNE is a typical nonlinear manifold dimensionality reduction algorithm. It represents the similarity between data points in the form of a probability distribution, so that points that are close in the high-dimensional space are placed closer together in the low-dimensional space and points that are distant are placed farther apart, which reduces the dimension and makes the low-dimensional feature vectors more separable. After dimensionality reduction, the low-dimensional feature vectors are fed into a classifier for recognition and classification. In this paper, the random forest (RF) classifier [12] is selected. This classifier was proposed by Breiman in 2001; it has the advantages of strong generalization ability, simple parameter setting, simple implementation, fast training, and high recognition accuracy, and it has been widely used in fault detection.
In summary, this paper proposes a new method of rotating machinery health detection. First, the ImvMAAPE values of the vibration signals preprocessed by UPEMD are calculated. Then, the t-SNE method is used to convert the initial high-dimensional feature vectors into low-dimensional feature vectors, which are fed into the random forest classifier for recognition and classification. The effectiveness of the proposed method is demonstrated on gearbox fault detection and bearing fault detection, and its superiority is verified by comparison with other methods.
Section 2 introduces the theory of the abovementioned methods. Section 3 verifies the proposed method on gearbox and bearing data and shows that it performs well in fault detection and classification. Section 4 uses comparative analysis to demonstrate the superiority and necessity of extracting features with ImvMAAPE, preprocessing the raw data with UPEMD, and reducing the data dimensions with t-SNE, giving a more complete demonstration of the advantages of the proposed method.
2. Theoretical Review
2.1. Basic Principle of UPEMD Algorithm
The EMD algorithm decomposes a signal according to the time-scale characteristics of the data itself. The input signal $x(t)$ is decomposed into a set of intrinsic mode functions $c_i(t)$, $i = 1, \dots, n$, which represent the local scale characteristics of the time series, and a residual $r_n(t)$:
$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t) \tag{1}$$
This method does not need any basis function to be set in advance, which gives it obvious advantages in analyzing nonlinear and nonstationary signal sequences.
In order to eliminate the residual noise generated by the EMD process, Deering proposed masking signal empirical mode decomposition (MS-EMD) in 2005 [10], which uses a sinusoidal time series as an auxiliary disturbance to remove the residual noise, but the elimination effect is not obvious. Wang et al. [11] found that the residual noise could be minimized by trying all possible phases of the disturbance signal and, on this basis, proposed the 2-level uniform phase empirical mode decomposition (2L-UPEMD) method. By decomposing the original signal into two intermodulated components, this method effectively reduces the mode aliasing and the residual noise of the EMD method. The multilevel UPEMD method takes the IMF1 obtained by 2L-UPEMD as the first component, treats IMF2 as a new signal, and obtains the lower-frequency IMFs by recursive decomposition in the same way [11, 13]. The multilevel UPEMD method is used in this paper, and it is implemented as follows:
(1) Set the number of phases $n_p$, the number of IMFs to be obtained by decomposition $n_{\mathrm{imf}}$, and the initial residual $r_0(t) = x(t)$. The $n_p$ phases are assumed to be uniformly distributed in the interval $[0, 2\pi)$, and the kth phase is calculated as $\theta_k = 2\pi (k-1)/n_p$.
(2) Calculate the disturbance amplitude $\varepsilon_m = \varepsilon_0 \cdot \mathrm{std}(r_{m-1}(t))$ and the period $T_m$ of the disturbance signal corresponding to level m. Here std stands for the standard deviation and $\varepsilon_0$ stands for the disturbance amplitude.
(3) Execute the 2L-UPEMD algorithm on $r_{m-1}(t)$ to obtain the mth IMF $c_m(t)$. Steps (a) to (e) are the detailed steps of this algorithm:
(a) Set the parameters of the disturbance signal (number of phases, amplitude, and period).
(b) Calculate the sinusoidal disturbance signal for the kth phase $\theta_k$.
(c) Obtain two IMFs by using MS-UPEMD.
(d) Repeat steps (b) and (c) as k increases from 1 to $n_p$.
(e) Obtain the required IMF1 and IMF2 by averaging the results over the $n_p$ phases.
(4) Calculate the difference between the residual and the IMF and treat it as the new residual: $r_m(t) = r_{m-1}(t) - c_m(t)$.
(5) Repeat steps (2) to (4) for m = 1 to $n_{\mathrm{imf}}$ until all IMFs are obtained.
In the multilevel UPEMD algorithm, the disturbance amplitude $\varepsilon_0$ is usually selected within the range [0.1, 1.0], and $n_p$ is the maximum number of phases allowed for each IMF. Following the suggestions in [11], $\varepsilon_0$ and $n_p$ were set to 0.2 and 8, respectively.
The flowchart of the above steps is given in Figure 1.
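The role of the uniformly distributed phases can be illustrated with a short numerical sketch. It assumes a cosine-shaped disturbance $w_k(t) = \varepsilon \cos(2\pi t/T + \theta_k)$ with $\theta_k = 2\pi(k-1)/n_p$; the function name, the period of 16 samples, and the signal length are illustrative choices rather than values from [11]. The sketch only shows that the auxiliary signals themselves average out over the phases; it is not an implementation of UPEMD.

```python
import numpy as np

def uniform_phase_perturbations(n_samples, amplitude, period, n_phases):
    """Return an (n_phases, n_samples) array of phase-shifted cosine disturbances."""
    t = np.arange(n_samples)
    # theta_k = 2*pi*(k-1)/n_phases for k = 1..n_phases (uniform on [0, 2*pi))
    phases = 2.0 * np.pi * np.arange(n_phases) / n_phases
    return amplitude * np.cos(2.0 * np.pi * t[None, :] / period + phases[:, None])

# Amplitude 0.2 and 8 phases mirror the parameter choice quoted in the text;
# the period and length are arbitrary (hypothetical) demonstration values.
perturbations = uniform_phase_perturbations(
    n_samples=256, amplitude=0.2, period=16.0, n_phases=8
)
phase_average = perturbations.mean(axis=0)
print("max |phase-averaged disturbance|:", np.max(np.abs(phase_average)))
```

Because the disturbances cancel when averaged over a full set of uniform phases, averaging the per-phase decomposition results is a natural way to suppress what the auxiliary signal would otherwise leave behind.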

2.2. Basic Principle of Correlation Coefficient
The correlation coefficient in probability statistics is a statistical index that reflects the degree of correlation between variables. In signal processing, it can effectively reflect the correlation between a processed signal and the original signal. The correlation between the original signal and a false component or a noise component is small, and on this basis the effect of the signal processing can be judged. Here, the correlation coefficient is used to screen the IMF components, which improves data reliability and the computational efficiency of the subsequent steps. Let $\rho_i$ be the correlation coefficient between the ith IMF component and the original signal; it is calculated as
$$\rho_i = \frac{\sum_{k=1}^{N} \left(x(k) - \bar{x}\right)\left(c_i(k) - \bar{c}_i\right)}{\sqrt{\sum_{k=1}^{N} \left(x(k) - \bar{x}\right)^2 \sum_{k=1}^{N} \left(c_i(k) - \bar{c}_i\right)^2}} \tag{2}$$
where N is the total number of data points of the signal, $x(k)$ is the kth data point of the signal, $\bar{x}$ is the mean value of the signal, $c_i(k)$ is the kth data point of the ith IMF component, and $\bar{c}_i$ is the average of all data points of the ith IMF component.
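The screening step can be sketched as follows. This is a minimal illustration that assumes the IMFs are already available as the rows of an array; the function name `select_imfs_by_correlation`, the parameter `n_keep`, and the synthetic components standing in for real IMFs are all hypothetical.

```python
import numpy as np

def select_imfs_by_correlation(signal, imfs, n_keep=4):
    """Rank IMFs by their Pearson correlation with the signal and sum the top ones."""
    signal = np.asarray(signal, dtype=float)
    imfs = np.asarray(imfs, dtype=float)              # shape (n_imfs, N)
    x_centered = signal - signal.mean()
    c_centered = imfs - imfs.mean(axis=1, keepdims=True)
    rho = (c_centered @ x_centered) / np.sqrt(
        (x_centered @ x_centered) * np.sum(c_centered * c_centered, axis=1)
    )
    # Indices of the n_keep largest coefficients, reported in ascending index order.
    kept = np.sort(np.argsort(rho)[::-1][:n_keep])
    reconstructed = imfs[kept].sum(axis=0)
    return rho, kept, reconstructed

# Synthetic stand-ins for IMFs: two sinusoids plus a weak noise component.
t = np.arange(1000) / 1000.0
rng = np.random.default_rng(0)
components = np.vstack([
    np.sin(2 * np.pi * 50 * t),
    0.5 * np.sin(2 * np.pi * 120 * t),
    0.01 * rng.standard_normal(t.size),
])
signal = components.sum(axis=0)
rho, kept, reconstructed = select_imfs_by_correlation(signal, components, n_keep=2)
print("correlation coefficients:", np.round(rho, 3))
print("kept components:", kept)
```

In this toy case the weak noise component receives the smallest coefficient and is left out of the reconstruction, which is the behavior the screening relies on.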
2.3. Basic Principle of ImvMAAPE
2.3.1. Permutation Entropy (PE)
Permutation entropy is widely used in the analysis of the complex time series to measure the complexity of nonlinear and nonstationary signals. It is a dynamic mutation detection method, which can conveniently and accurately locate the moment of the system mutation, and can amplify the small changes of signals. The calculation process of the permutation entropy algorithm is as follows:
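First, the m-dimensional reconstruction vectors of the original time series $\{x(i), i = 1, 2, \dots, N\}$ are obtained through
$$X(i) = \{x(i), x(i+d), \dots, x(i+(m-1)d)\}, \quad i = 1, 2, \dots, N-(m-1)d$$
where m is the embedding dimension and d is the time delay.
Arrange the elements of each reconstructed vector in order of their size to obtain a set of permutations $\pi_j$ and the number of occurrences $f(\pi_j)$ of each, then calculate the probabilities of all permutations:
$$p(\pi_j) = \frac{f(\pi_j)}{N-(m-1)d}$$
According to the principle of Shannon entropy [7], the permutation entropy can be expressed as
$$H_{PE}(m, d) = -\sum_{j} p(\pi_j) \ln p(\pi_j)$$
where the sum runs over the permutations that actually occur (at most $m!$).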
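The calculation above can be sketched in a few lines. The function name and the optional normalization by $\ln(m!)$ are choices made for this illustration; the text does not state whether the entropy is normalized.

```python
import numpy as np
from math import factorial

def permutation_entropy(x, m=3, d=1, normalize=True):
    """Permutation entropy of a 1-D series for embedding dimension m and delay d."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * d
    if n_vectors <= 0:
        raise ValueError("time series too short for the chosen m and d")
    # Row i holds the indices i, i+d, ..., i+(m-1)d of one reconstruction vector.
    idx = np.arange(n_vectors)[:, None] + d * np.arange(m)[None, :]
    # The ordinal pattern of each vector is the argsort of its elements.
    patterns = np.argsort(x[idx], axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / n_vectors
    h = -np.sum(p * np.log(p))
    return h / np.log(factorial(m)) if normalize else h

example = [4, 7, 9, 10, 6, 11, 3]
print(permutation_entropy(example, m=3, d=1, normalize=False))
```

For the seven-point example, five vectors of length three are formed and three distinct ordinal patterns occur, with counts 2, 2, and 1, so the unnormalized entropy is $-(2 \cdot 0.4 \ln 0.4 + 0.2 \ln 0.2)$.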
2.3.2. Amplitude-Aware Permutation Entropy (AAPE)
Permutation entropy considers only the order of the amplitudes in the time series; it ignores the amplitude values themselves and the effect of elements with equal amplitudes. AAPE introduces a relative normalized probability to improve the way $p(\pi_j)$ is counted, taking the mean and the deviation of the signal amplitude into consideration, which makes AAPE more robust and flexible as a complexity measure.
Assume that the initial value of $p(\pi_j)$ is 0. As the index t of the time series increases from 1 to N-m+1 step by step, $p(\pi_j)$ is updated every time the order $\pi_j$ appears:
$$p(\pi_j) \leftarrow p(\pi_j) + \frac{A}{m} \sum_{k=1}^{m} \left|x(t+(k-1)d)\right| + \frac{1-A}{m-1} \sum_{k=2}^{m} \left|x(t+(k-1)d) - x(t+(k-2)d)\right|$$
where $A \in [0, 1]$ is the adjustment coefficient, which adjusts the weight of the mean and the deviation of the signal amplitudes; it is usually set to 0.5. For the whole time series, $p(\pi_j)$ is then normalized as
$$p(\pi_j) = \frac{p(\pi_j)}{\sum_{t=1}^{N-m+1} \left[ \frac{A}{m} \sum_{k=1}^{m} \left|x(t+(k-1)d)\right| + \frac{1-A}{m-1} \sum_{k=2}^{m} \left|x(t+(k-1)d) - x(t+(k-2)d)\right| \right]}$$
The corresponding AAPE is
$$\mathrm{AAPE}(m, d) = -\sum_{j} p(\pi_j) \ln p(\pi_j)$$
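A minimal sketch of the amplitude-aware weighting is given below. It assumes that each reconstruction vector contributes a weight made of the mean absolute amplitude and the mean absolute successive difference, balanced by the adjustment coefficient (called `A` here), and that the weights are normalized over all vectors. The special treatment of equal amplitudes is omitted (ties are simply broken by order of appearance), and the normalization by $\ln(m!)$ is an optional choice of this sketch.

```python
import numpy as np
from math import factorial

def amplitude_aware_permutation_entropy(x, m=3, d=1, A=0.5, normalize=True):
    """AAPE sketch: ordinal patterns weighted by amplitude mean and deviation."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * d
    if n_vectors <= 0 or m < 2:
        raise ValueError("need m >= 2 and a series longer than (m - 1) * d")
    idx = np.arange(n_vectors)[:, None] + d * np.arange(m)[None, :]
    vectors = x[idx]
    patterns = np.argsort(vectors, axis=1, kind="stable")
    # Contribution of each vector: A-weighted mean amplitude plus
    # (1 - A)-weighted mean absolute difference of successive elements.
    weights = (A / m) * np.abs(vectors).sum(axis=1) \
        + ((1.0 - A) / (m - 1)) * np.abs(np.diff(vectors, axis=1)).sum(axis=1)
    _, inverse = np.unique(patterns, axis=0, return_inverse=True)
    mass = np.bincount(inverse.ravel(), weights=weights)
    total = mass.sum()
    if total == 0:            # all-zero signal: no amplitude information at all
        return 0.0
    p = mass / total
    p = p[p > 0]
    h = -np.sum(p * np.log(p))
    return h / np.log(factorial(m)) if normalize else h

rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
print(amplitude_aware_permutation_entropy(noise, m=3))
```

Because the weights are normalized by their total, multiplying the whole signal by a positive constant leaves the value unchanged, while a local change in amplitude that preserves the ordinal patterns does change it, which is the property that plain permutation entropy lacks.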
2.3.3. Improved Multivariate Multiscale Amplitude-Aware Permutation Entropy (ImvMAAPE)
AAPE algorithm is a univariate analysis method, which cannot be used for multichannel vibration signals. By introducing the theory of multi-dimensional embedding reconstruction into AAPE, the fault information of multichannel sampling can be utilized relatively fully. The p-channel time series is reconstructed as , where c is the channel number, and the reconstructed vectors are arranged in ascending the order as follows:
Then, according to AAPE algorithm, we can getwhere ,
We can get the expression of mvMAAPE through
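Since the algorithms above consider only a single time scale and cannot represent the complexity of signals at different time scales, the concept of multiscale entropy is introduced here to coarse-grain the time series, so that the algorithm can analyze the time series at different time resolutions.
For the scale factor $s$ and a given p-channel time series $\{x_{k,i}\}$ ($k = 1, \dots, p$; $i = 1, \dots, L$), we can get the coarse-grained multivariate time series $y^{(s)}_{k,j}$:
$$y^{(s)}_{k,j} = \frac{1}{s} \sum_{i=(j-1)s+1}^{js} x_{k,i}, \quad 1 \le j \le \left\lfloor \frac{L}{s} \right\rfloor, \ 1 \le k \le p$$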
It can be seen that
The coarse-graining method adopted by mvMAAPE still fails to capture all the information contained in the time series. For example, when the time scale $s = 2$, this method only considers the coarse-grained sequence starting from $x_{k,1}$, while the coarse-grained sequence starting from $x_{k,2}$ is ignored in the calculation of mvMAAPE, resulting in the loss of some information, as shown in Figure 2. More generally, for any time scale $s$, the traditional coarse-graining method ignores coarse-grained sequences that may contain key information, which leads to insufficient analysis. To remedy this defect, an improved coarse-graining method is adopted.

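For the p-channel time series $\{x_{k,i}\}$ of length L ($k = 1, \dots, p$; $i = 1, \dots, L$), with the scale factor $s$, the ath coarse-grained time series $z^{(s)}_{a} = \{z^{(s)}_{k,a,j}\}$ can be calculated by
$$z^{(s)}_{k,a,j} = \frac{1}{s} \sum_{i=(j-1)s+a}^{js+a-1} x_{k,i}, \quad 1 \le j \le \left\lfloor \frac{L-a+1}{s} \right\rfloor, \ 1 \le k \le p$$
where $1 \le a \le s$; that is, for the scale factor $s$ there are $s$ different coarse-grained multivariate time series. The improved coarse-graining of the kth channel for scale factors of 2 and 3 is shown in Figure 2. The mvMAAPE value of each of these coarse-grained time series is calculated for the given scale factor and embedding dimension, and the mean of these values is defined as ImvMAAPE:
$$\mathrm{ImvMAAPE}(x, s, m, d) = \frac{1}{s} \sum_{a=1}^{s} \mathrm{mvMAAPE}\left(z^{(s)}_{a}, m, d\right)$$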
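The difference between the two coarse-graining schemes can be made concrete with a short sketch. It assumes non-overlapping averaging windows of length $s$ and, for the improved scheme, one sequence per starting offset $a = 1, \dots, s$, keeping only complete windows. The function names are illustrative, and `entropy_fn` is a hypothetical placeholder for an mvMAAPE implementation, which is not reproduced here.

```python
import numpy as np

def coarse_grain_traditional(x, s):
    """Average non-overlapping windows of length s, starting at the first sample."""
    x = np.atleast_2d(np.asarray(x, dtype=float))     # shape (channels, L)
    p, length = x.shape
    n = length // s
    return x[:, :n * s].reshape(p, n, s).mean(axis=2)

def coarse_grain_improved(x, s):
    """Return s coarse-grained series, one for each starting offset a = 1..s."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    p, length = x.shape
    series = []
    for offset in range(s):                           # offset = a - 1
        n = (length - offset) // s                    # complete windows only
        window = x[:, offset:offset + n * s]
        series.append(window.reshape(p, n, s).mean(axis=2))
    return series

def improved_multiscale_value(x, s, entropy_fn):
    """Mean of entropy_fn over all s coarse-grained series (entropy_fn is a placeholder)."""
    return float(np.mean([entropy_fn(z) for z in coarse_grain_improved(x, s)]))

x = np.arange(1.0, 9.0)                               # single channel: 1, 2, ..., 8
print("traditional, s = 2:", coarse_grain_traditional(x, 2))
for a, z in enumerate(coarse_grain_improved(x, 2), start=1):
    print(f"improved, s = 2, a = {a}:", z)
```

For the series 1 to 8 with $s = 2$, the traditional scheme yields only the sequence starting at the first sample, while the improved scheme also yields the sequence starting at the second sample, so neither starting position is discarded.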
ImvMAAPE has three key parameters: the embedding dimension m, the time delay d, and the adjustment coefficient $A$. If the embedding dimension m is too small, the reconstructed vectors contain too few states; if m is too large, the time series is homogenized, so that the algorithm loses its ability to reflect tiny changes in the time series and the computational cost increases. After comprehensive consideration based on the literature [14, 15], the three parameters are set to m = 5, d = 1, and $A$ = 0.5.
2.4. Basic Principle of t-Distributed Stochastic Neighbor Embedding (t-SNE)
T-SNE is an improved algorithm on the basis of stochastic neighbor embedding (SNE) algorithm. Both of them are typical nonlinear manifold dimensionality reduction algorithms, and both express the similarity between data points in the form of probability distribution.
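The steps of the t-SNE algorithm are as follows:
(1) For the high-dimensional data sequence $X = \{x_1, x_2, \dots, x_N\}$, calculate the conditional probability between the high-dimensional data points $x_i$ and $x_j$:
$$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \ne i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)}$$
where $\sigma_i^2$ is the variance of the Gaussian distribution centered on $x_i$, which is determined from the given perplexity by a binary search.
(2) Calculate the joint probability density $p_{ij}$ from $p_{j|i}$ and $p_{i|j}$:
$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}$$
(3) Let the low-dimensional sample data sequence be $Y = \{y_1, y_2, \dots, y_N\}$ and set its initial value $Y^{(0)}$.
(4) Calculate the gradient of the sample data in the low-dimensional space:
$$\frac{\partial C}{\partial y_i} = 4 \sum_{j} \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \|y_i - y_j\|^2\right)^{-1}$$
where $q_{ij}$ is the joint probability density in the low-dimensional space and C is the cost function defined by the KL distance:
$$C = KL(P \| Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}}, \quad q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \ne l} \left(1 + \|y_k - y_l\|^2\right)^{-1}}$$
The calculation of $q_{ij}$ is based on the t-distribution with one degree of freedom.
(5) Update the output:
$$Y^{(n)} = Y^{(n-1)} - \alpha \frac{\partial C}{\partial Y} + m(n)\left(Y^{(n-1)} - Y^{(n-2)}\right)$$
where n is the iteration index, α is the learning rate, and m(n) is the momentum factor.
(6) Repeat steps (4) and (5) until the specified number of iterations is reached.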
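The quantities in these steps can be written out in a compact NumPy sketch. To stay short, it makes one loudly stated simplification: a single fixed Gaussian width `sigma` is used for all points instead of the perplexity-driven binary search, so it is not a full t-SNE implementation. The function names, the synthetic data, and the values of the learning rate and momentum are illustrative assumptions.

```python
import numpy as np

def squared_distances(Z):
    """Pairwise squared Euclidean distances between the rows of Z."""
    sq = np.sum(Z * Z, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    np.fill_diagonal(D, 0.0)
    return np.maximum(D, 0.0)

def joint_p(X, sigma):
    """Symmetrized joint probabilities p_ij (fixed sigma: a simplifying assumption)."""
    P = np.exp(-squared_distances(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)                 # conditional p_{j|i}
    return (P + P.T) / (2.0 * X.shape[0])

def joint_q(Y):
    """Low-dimensional joint probabilities q_ij from a Student t kernel (1 d.o.f.)."""
    W = 1.0 / (1.0 + squared_distances(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(), W

def kl_cost(P, Y):
    """KL divergence between P and the Q induced by the embedding Y."""
    Q, _ = joint_q(Y)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def kl_gradient(P, Y):
    """Gradient 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + |y_i - y_j|^2)."""
    Q, W = joint_q(Y)
    G = (P - Q) * W
    return 4.0 * (np.diag(G.sum(axis=1)) - G) @ Y

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))                      # synthetic high-dimensional data
P = joint_p(X, sigma=1.5)
Y = 1e-2 * rng.standard_normal((30, 2))               # small random initialization
velocity = np.zeros_like(Y)
alpha, momentum = 1.0, 0.5                            # illustrative values only
print("cost before:", kl_cost(P, Y))
for n in range(100):
    # Equivalent to Y(n) = Y(n-1) - alpha * grad + momentum * (Y(n-1) - Y(n-2)).
    velocity = momentum * velocity - alpha * kl_gradient(P, Y)
    Y = Y + velocity
print("cost after:", kl_cost(P, Y))
```

In practice a library implementation with perplexity calibration would normally be used; the sketch is only meant to make the roles of $p_{ij}$, $q_{ij}$, the cost, and the update rule explicit.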
2.5. Basic Principle of Random Forest Classifier
The random forest (RF) classifier was proposed by Breiman in 2001. It has the advantages of strong generalization ability, simple parameter setting, and fast training speed. Its implementation is simple and its recognition accuracy is high. It has a wide range of applications in the field of fault diagnosis. The basic principle of RF is as follows:
RF integrates multiple weak classifiers, namely a set of decision trees $\{h(x, \Theta_k), k = 1, 2, \dots, n\}$, where the $\Theta_k$ are independent and identically distributed random vectors. After the prediction of each decision tree is obtained, the final output is determined by voting. Let $x_i$ denote the ith training sample, p the number of features of each training sample, and $y_i$ the label of $x_i$; the training sample set of the random forest classifier is then expressed as $T = \{(x_i, y_i)\}$. Bootstrap sampling of T is performed n times to obtain n bootstrap subsamples $T_k$. A decision tree model $h_k(x)$ is built on each subsample $T_k$ (CART decision trees are generally used), which yields a classifier composed of a group of decision trees. Finally, the category of a new test sample x is obtained by the vote of the n decision trees according to the principle of the largest number of votes. The classification decision is
$$H(x) = \arg\max_{y} \sum_{k=1}^{n} I\left(h_k(x) = y\right)$$
where $I(\cdot)$ is the indicator function, which takes the value 1 when the condition in brackets is met and 0 otherwise, and y is the target variable ranging over the category labels.
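The voting rule can be illustrated in isolation. The sketch below assumes that the per-tree predictions are already available as an array with one row per tree; it does not train any trees, and the function name and the toy predictions are hypothetical. Ties are resolved in favor of the class listed first, which is a choice of this sketch rather than something specified in the text.

```python
import numpy as np

def majority_vote(tree_predictions, classes):
    """Return, for each sample, the class that receives the most tree votes."""
    tree_predictions = np.asarray(tree_predictions)   # shape (n_trees, n_samples)
    classes = np.asarray(classes)
    # votes[c, s] counts the trees k with h_k(x_s) == classes[c] (the indicator sum).
    votes = (tree_predictions[None, :, :] == classes[:, None, None]).sum(axis=1)
    return classes[np.argmax(votes, axis=0)]

# Toy predictions of 3 trees for 4 test samples.
tree_predictions = np.array([
    [0, 1, 2, 2],
    [0, 1, 1, 2],
    [1, 1, 2, 0],
])
labels = majority_vote(tree_predictions, classes=[0, 1, 2])
print(labels)
```

A complete random forest additionally performs the bootstrap sampling and tree construction described above; library implementations such as scikit-learn's `RandomForestClassifier` cover those steps.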
2.6. The Method Proposed in This Paper
In this paper, an improved mvMAAPE (ImvMAAPE) method is proposed, which has good feature extraction ability for nonlinear, nonstationary, and multichannel vibration signals generated by rotating machinery. By combining with UPEMD, t-SNE, and RF classifier, a new comprehensive fault detection method for rotating machinery is proposed.
The testing procedure is as follows:
(1) Obtain the original vibration data of the rotating machinery under various working conditions through experiments.
(2) Process the original vibration signals with the UPEMD method and decompose them into multiple IMF components.
(3) Calculate the correlation coefficients between each IMF component and the original signal, and reconstruct the signal from the IMF components with the largest correlation coefficients.
(4) Calculate the ImvMAAPE values of the reconstructed signal samples under different working conditions to obtain high-dimensional feature vectors.
(5) Use the t-SNE method to reduce the dimension of the high-dimensional feature vectors.
(6) Feed the low-dimensional feature vectors into the RF classifier to obtain the final fault classification and recognition results.
The flow chart of this method is shown in Figure 3.

The original vibration signals of the faulty machinery are processed by UPEMD to obtain a set of IMF components, and the correlation coefficients between these components and the original vibration data are calculated. The ImvMAAPE values are then calculated for the signal reconstructed from the selected IMF components, which carry more fault information. Next, the t-SNE method is used to reduce the dimension of the high-dimensional feature vectors, remove interfering and redundant features, and obtain sensitive low-dimensional features. Finally, the low-dimensional feature vectors are input into the random forest classifier to obtain the fault recognition and classification results.
3. Experimental Analysis and Results
To validate the performance of the proposed fault diagnosis method and its applicability to different kinds of rotating machinery, gearbox and bearing data sets are selected, which are two typical cases in rotating machinery fault diagnosis research. This paper uses the gearbox data set provided by the PHM Society for an international competition held in 2009 [16] and the rolling bearing data set provided by Case Western Reserve University [17].
3.1. Gearbox Fault Diagnosis and Identification
First, the PHM gearbox data set, which contains data for several compound faults of gears, bearings, and shafts, is used for experimental verification. A photograph of the data collection platform and a schematic of its internal structure are shown in Figure 4.

Acceleration sensors are installed at both the input and output ends of the gearbox to collect dual-channel vibration data. The sampling frequency is 66.67 kHz, the sampling time is 4 s, and each group of data contains 266656 points. The waveforms of the 8 working conditions are shown in Figure 5. The data appear chaotic, and it is almost impossible to identify the characteristics of the vibration signals under the various working conditions by visual inspection.

In this paper, the dual-channel data of 8 working conditions are selected for the helical gear meshing mode at a rotation rate of 30 Hz under high load; the details of each working condition are listed in Table 1. Because each of the 8 working conditions has 266656 data points, which is too many for the algorithm to process efficiently, the original data are downsampled to 1/3 of the original rate, on the premise that no distortion is introduced and the reliability of the data is essentially unchanged. The length of a single sample is 2048 points, and 43 nonoverlapping samples are taken for each working condition, giving a total of 344 samples; 200 samples and 144 samples were randomly selected as the training set and the test set, respectively.
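The sample counts above can be checked with a short sketch of the preparation step. It assumes that downsampling means keeping every third point; whether an anti-aliasing filter was applied first is not stated in the text, so none is used here. The function name and the synthetic input are hypothetical.

```python
import numpy as np

def downsample_and_segment(x, factor=3, sample_len=2048, n_samples=43):
    """Keep every `factor`-th point, then cut non-overlapping samples.

    x has shape (n_channels, n_points); the result has shape
    (n_samples, n_channels, sample_len). No anti-aliasing filter is applied
    (an assumption of this sketch).
    """
    x = np.asarray(x)
    decimated = x[:, ::factor]
    needed = sample_len * n_samples
    if decimated.shape[1] < needed:
        raise ValueError("not enough points for the requested samples")
    segments = decimated[:, :needed].reshape(x.shape[0], n_samples, sample_len)
    return segments.transpose(1, 0, 2)

# Synthetic two-channel record with the same length as one gearbox recording.
n_points = 266656
record = np.vstack([np.arange(n_points), -np.arange(n_points)])
samples = downsample_and_segment(record)
print(samples.shape)

# 8 working conditions x 43 samples = 344 samples, split 200 / 144 at random.
n_total = 8 * samples.shape[0]
order = np.random.default_rng(0).permutation(n_total)
train_idx, test_idx = order[:200], order[200:]
print(n_total, train_idx.size, test_idx.size)
```

After downsampling, about 88885 points remain per channel, which is enough for 43 samples of 2048 points (88064 points).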
In order to reduce the noise and interference in the original vibration signal, extract the signal components that contain valuable fault information, and highlight the inherent characteristics of the fault vibration signal, the original signal is decomposed by the UPEMD method. Following [11], the relevant parameters are set as follows: the first IMF to be extracted is startmode = 1, the number of IMF components to be extracted is numImf = 11, the number of sifting iterations is numSift = 10, the maximum number of phases allowed for each IMF is maxPhase0 = 8, and the nonnormalized amplitude of the auxiliary sine wave is ampSin0 = 0.2. The decomposition yields a series of IMFs. By calculating the correlation coefficient between each IMF component and the original signal, the noise components, false components, and components of low reference value generated by the decomposition can be eliminated. Due to space constraints, only the decomposition results for the second working condition, that is, the first fault state, are shown here (Figure 6). The correlation coefficients between each IMF component and the original signal were calculated, and for each working condition and signal channel the four IMF components with the largest correlation coefficients were selected for signal reconstruction. These four components contain most of the fault information and have the most reference value. The remaining components have little correlation with the original data and are very likely irrelevant noise, so they are discarded. The ImvMAAPE method is then used to extract features from the reconstructed signal.

The vibration signal arises from the interaction and coupling between the vibration of each gearbox component and the ambient noise while the gearbox is working. When there is a fault, a periodic pulse component tends to appear, and the vibration signals of different fault states have different characteristics that are embedded in the signal. Whether the information contained in the vibration signal is fully used determines whether the fault feature extraction is sufficient and, in turn, the accuracy of the final fault identification and classification. Compared with some existing multiscale entropy feature extraction methods, the proposed ImvMAAPE method can process multichannel data. Importantly, it also obtains more complete coarse-grained sequences and overcomes the tendency of traditional multiscale entropy to lose key coarse-grained sequences. Error bar diagrams of the ImvMAAPE values at 20 scales under the 8 working conditions are shown in Figure 7. The error range of the ImvMAAPE values is small. At many scales, the error ranges of different working conditions overlap very little or not at all. Even if the features overlap at one scale, they can be distinguished at another scale, which reflects the good feature extraction performance of ImvMAAPE.

ImvMAAPE produces 20-dimensional feature vectors. Feeding them directly into the classifier would greatly reduce the efficiency of the algorithm, and not all features are useful: some are redundant, and some even contain confusing information that can mislead the classifier. Therefore, it is necessary to reduce the dimensionality of the initial high-dimensional feature vectors. Comparing the ImvMAAPE values under the various working conditions and scale factors in Figure 7 shows that the entropy values of the working conditions differ to different degrees across the scale factors. At some scales, the entropy values of the working conditions overlap greatly, which is unsuitable for identification and classification and may even cause misjudgment by the classifier. At other scales, the entropy values of the working conditions differ clearly and their fluctuation range is small; these entropy values contain a lot of valuable information. In addition, some working conditions are difficult to distinguish at one scale but can be well distinguished at another, so a process of evaluation and screening is required. In this paper, the t-SNE method is used for dimensionality reduction; it retains useful features, eliminates useless ones, forms low-dimensional feature vectors, and improves the separability of the features.
The three-dimensional feature space formed after dimensionality reduction is visualized in Figure 8(a). The features reduced by the t-SNE algorithm cluster well, and the cluster centers are relatively obvious; except for a few outliers and confounded points, the samples of each working condition gather together, and the clusters of different working conditions are widely separated and easy to discriminate. When the t-SNE algorithm is not adopted, three features have to be selected randomly for plotting, as shown in Figure 8(b). In that case, the working conditions are poorly discriminated, the overlap is serious, and there are no cluster centers. If these three features were used for classification, they would not only reduce the classification efficiency but also introduce interference and affect the classification results.

[Figure 8, panels (a) and (b)]
In this paper, the t-SNE method was adopted to reduce the original feature vectors to 8-dimensional feature vectors. The new feature vectors were input into the RF classifier; one fault classification result, with a classification accuracy of 99.3%, is shown in Figure 9. This shows that the method can effectively identify and classify the faults of rotating machinery.

To avoid the interference of accidental factors, the experiment was repeated 20 times and the classification accuracy of each trial was recorded. The results are shown in Table 2. Across the 20 trials, the worst classification accuracy still reaches 98%, and the best reaches 100%. Together with the standard deviation of the accuracy, this indicates that the fault identification and classification method proposed in this paper has high accuracy and stability.
3.2. Bearing Fault Diagnosis and Identification
To verify the universality of the proposed method, bearing data are also used. The bearing data set was provided by Case Western Reserve University. The appearance and a structural schematic of the test bench are shown in Figure 10. The two-channel bearing data are collected by acceleration sensors installed at the 12 o'clock position of the drive end and the fan end of the motor housing. In this experiment, the motor has no load, the speed is 1797 RPM, and the sampling frequency is 12 kHz. The data set includes four working conditions: normal, inner ring fault, outer ring fault, and ball fault. Each of the three fault types has three fault degrees, with grooves of 0.1778 mm, 0.3556 mm, and 0.5334 mm machined by electric spark. There are 10 data types in total, labeled NM, IRF1, IRF2, IRF3, ORF1, ORF2, ORF3, BF1, BF2, and BF3. Table 3 lists brief information on the data of the ten bearing working conditions.

For each fault type, 102400 data points were intercepted and divided into samples of 2048 points, giving 50 samples per fault type and 500 samples in total. Among them, 300 samples and 200 samples were randomly selected as the training set and the test set, respectively. The waveforms of the collected two-channel bearing vibration signals are shown in Figure 11. The waveforms of the original vibration data differ between working conditions. These differences can be captured by feature extraction and then used for fault state identification and classification.
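The sample construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the windows are taken as consecutive and non-overlapping, the random split is drawn from the pooled 500 samples without stratification, and the helper names `segment_signal` and `random_split` are hypothetical. The record is a synthetic stand-in, not real bearing data.

```python
import random

def segment_signal(signal, sample_length):
    """Cut a 1-D record into consecutive, non-overlapping samples."""
    n_samples = len(signal) // sample_length
    return [signal[i * sample_length:(i + 1) * sample_length]
            for i in range(n_samples)]

def random_split(items, n_train, seed=0):
    """Shuffle a copy of items and split it into training and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Synthetic stand-in for one channel of one working condition.
record = [0.0] * 102400
samples_per_condition = segment_signal(record, 2048)  # 50 samples of 2048 points

# Ten working conditions give 500 labelled samples in total.
labels = ["NM", "IRF1", "IRF2", "IRF3", "ORF1", "ORF2", "ORF3", "BF1", "BF2", "BF3"]
dataset = [(label, idx) for label in labels
           for idx in range(len(samples_per_condition))]

# Assumption: the 300/200 split is drawn from the pooled samples.
train_set, test_set = random_split(dataset, n_train=300)
```

Each dataset entry here is only a (label, index) pair; in practice it would carry the two-channel sample itself.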

Similar to the gearbox fault detection process described above, the proposed method is used to extract the bearing fault features and then to identify and classify the fault types. The correlation between each IMF component and its original fault signal is calculated by correlation analysis. The ten IMFs of the fan-end channel signal generated by UPEMD for the bearing's first working condition are shown in Figure 12, and the calculated correlation coefficients are shown in Figure 13. The first four IMF components, which have large correlation coefficients, were selected for signal reconstruction, while the remaining IMF components, which have small correlation coefficients and may contain noise and redundant information, were discarded. To save space, only the representative decomposition and correlation analysis results of the fan-end channel data for the inner ring fault with the first fault degree are displayed.
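The correlation-based screening of IMF components can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the IMFs have already been produced by a decomposition such as UPEMD (which is not implemented here), uses the Pearson correlation coefficient, and ranks components by the absolute value of the coefficient, which is an assumption. The names `pearson` and `reconstruct_from_imfs` are hypothetical, and the demo uses synthetic sinusoids in place of real IMFs.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / (sx * sy)

def reconstruct_from_imfs(signal, imfs, n_keep=4):
    """Keep the n_keep IMFs most correlated with the signal and sum them."""
    coeffs = [pearson(imf, signal) for imf in imfs]
    ranked = sorted(range(len(imfs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = sorted(ranked[:n_keep])
    reconstructed = [sum(imfs[k][t] for k in kept) for t in range(len(signal))]
    return reconstructed, kept, coeffs

# Synthetic stand-ins for IMFs: two strong components and one weak one.
n_points = 200
amplitudes_and_cycles = [(1.0, 5), (0.5, 12), (0.01, 40)]
imfs_demo = [[amp * math.sin(2 * math.pi * cycles * i / n_points)
              for i in range(n_points)]
             for amp, cycles in amplitudes_and_cycles]
signal_demo = [sum(imf[i] for imf in imfs_demo) for i in range(n_points)]

reconstructed, kept, coeffs = reconstruct_from_imfs(signal_demo, imfs_demo, n_keep=2)
print(kept)
```

In the demo the weak third component is dropped, so the reconstruction differs from the original only by that low-amplitude term.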

The ImvMAAPE values of the reconstructed signals are calculated and plotted in Figure 14. The entropy features extracted by this method have a small spread. The entropy values discriminate well between working conditions and are clearly separated at many scales; even if the entropy values of two conditions overlap at one scale, they can be distinguished well at another. This is consistent with the analysis of the gearbox data.

After the fault features were extracted by ImvMAAPE, the t-SNE method was used to reduce the dimensionality of the high-dimensional fault feature vectors. The visualization in Figure 15(a) shows that the sample clusters of the various working conditions are well differentiated, very few sample points deviate from their clusters, and most sample points gather closely around clear cluster centers. This shows that the separability of the samples is improved. To verify the necessity of dimensionality reduction by t-SNE, three features whose dimensions were not reduced were selected at random and plotted in Figure 15(b). Sample points of the same condition diverge severely, there is no obvious cluster center, and the separability is poor. Except for NM and ORF1, which are clearly differentiated, the sample points of the other conditions are mixed together. This heavy aliasing of the sample clusters would greatly affect the subsequent recognition and classification accuracy of the classifier, which is verified experimentally later in this paper.

The 8-dimensional feature vectors produced by the t-SNE method were input into the RF classifier for recognition and classification. The classification results are shown in Figure 16, and the classification accuracy reaches 100%. To rule out chance, 20 experiments were carried out with this method and the results were recorded. The detection method was also run 20 times with each of the four other feature extraction methods (mvMAAPE, MAAPE, mvMSE, and MSE), and the results were recorded. The results are shown in Table 4. The comparison shows that ImvMAAPE extracts useful information from the vibration signals more completely, so that the classification results are more accurate and stable. The proposed rotating machinery fault detection method therefore also performs very well in bearing fault diagnosis: it not only classifies the bearing fault types accurately but also identifies and classifies the different fault degrees of the same fault type.

4. Comparison and Analysis
4.1. The Superiority Analysis of the ImvMAAPE
4.1.1. Analysis of Feature Extraction Ability
To verify the superiority of the proposed ImvMAAPE method, the ImvMAAPE values of the 8 gearbox working conditions were calculated at 20 scales, together with the corresponding values of Multivariate Multiscale Amplitude-Aware Permutation Entropy (mvMAAPE), Multiscale Amplitude-Aware Permutation Entropy (MAAPE), Multivariate Multiscale Sample Entropy (mvMSE), and Multiscale Sample Entropy (MSE). The results are plotted in Figure 17 for comparative analysis. Comparing the results of ImvMAAPE and mvMAAPE shows that, although the average entropy values obtained by the two methods are roughly the same, the mvMAAPE values fluctuate significantly across scales and the entropy values of the various working conditions overlap heavily. This indicates that the coarse-grained sequence acquisition algorithm adopted by ImvMAAPE extracts the characteristic information of the vibration signals under the various working conditions more completely. With more representative features, smaller entropy deviation, and more stable performance, ImvMAAPE makes it easier to distinguish different working conditions. Next, the multichannel algorithms are compared with the single-channel algorithms; more specifically, ImvMAAPE and mvMAAPE are compared with MAAPE, and mvMSE is compared with MSE. The entropy deviation of the multichannel algorithms is clearly smaller and more stable, and their entropy ranges overlap less. This is because a single-channel algorithm uses the vibration signal of only one channel: its characterization of the fault is incomplete, it is vulnerable to external interference and accidental factors, and it easily misses fault information that the sensor at that position cannot readily capture.

By contrast, the multichannel algorithms consider the fault information of multiple channels together, which reduces the influence of chance and makes the extracted fault information more comprehensive, leading to higher stability and robustness.
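For readers unfamiliar with the building blocks compared above, the following sketch shows the conventional coarse-graining step of multiscale entropy and a single-channel amplitude-aware permutation entropy. It is illustrative only and is not ImvMAAPE: the improved coarse-graining and the multivariate embedding are defined elsewhere in the paper and are not reproduced here. The parameter names `m`, `delay`, and `A`, their default values, and the simple tie handling (ties broken by order of appearance) are assumptions, and the input is synthetic noise.

```python
import math
import random

def coarse_grain(x, scale):
    """Conventional coarse-graining: means of non-overlapping windows."""
    n = len(x) // scale
    return [sum(x[j * scale:(j + 1) * scale]) / scale for j in range(n)]

def aape(x, m=3, delay=1, A=0.5):
    """Normalized amplitude-aware permutation entropy of a 1-D sequence."""
    if m < 2:
        raise ValueError("embedding dimension m must be at least 2")
    weights = {}
    total = 0.0
    for t in range(len(x) - (m - 1) * delay):
        v = [x[t + k * delay] for k in range(m)]
        # Ordinal pattern; equal values are ordered by position (assumption).
        pattern = tuple(sorted(range(m), key=lambda k: v[k]))
        # Weight mixes mean absolute amplitude and mean absolute difference.
        w = (A / m) * sum(abs(a) for a in v) \
            + ((1 - A) / (m - 1)) * sum(abs(v[k] - v[k - 1]) for k in range(1, m))
        weights[pattern] = weights.get(pattern, 0.0) + w
        total += w
    if total == 0.0:
        return 0.0
    h = -sum((w / total) * math.log(w / total) for w in weights.values() if w > 0)
    return h / math.log(math.factorial(m))

# Synthetic white noise as a stand-in for one vibration channel.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(600)]
multiscale_values = [aape(coarse_grain(noise, s)) for s in range(1, 6)]
print(multiscale_values)
```

Because the conventional coarse-grained sequence shrinks by the scale factor, the entropy estimate at large scales rests on few patterns, which is one commonly cited motivation for improved coarse-graining schemes.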
In addition, the t-SNE method is used to reduce the dimensionality of the feature vectors of the gearbox data samples extracted by ImvMAAPE, mvMAAPE, and MAAPE. The feature vectors are reduced to 2 dimensions and plotted in Figure 18. The samples of the same working condition represented by the ImvMAAPE features are clearly gathered together. The mvMAAPE features also gather samples of the same working condition, but the clusters are relatively scattered and the intervals between different working conditions are small. The samples of the various working conditions characterized by the MAAPE method are almost completely aliased and cannot be distinguished effectively. This intuitively shows that the features extracted by the ImvMAAPE method greatly improve classification performance compared with those extracted by the MAAPE method; in addition, the ImvMAAPE features adapt well to the subsequent dimensionality reduction.

The ImvMAAPE, mvMAAPE, MAAPE, MSE, and mvMSE values of the bearing data under the 10 working conditions are calculated and plotted in Figure 19. It should be noted that, owing to an inherent disadvantage of sample entropy [18], many undefined SE values are generated, whereas the PE algorithm does not generate undefined values, which also reflects the reliability of the PE algorithm. Comparing the panels of Figure 19 shows that the improved coarse-grained extraction method and the multichannel data processing method still perform well on the bearing data and extract better features. Moreover, the features extracted by ImvMAAPE show the best stability and discriminability.
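The undefined sample entropy values mentioned above can be illustrated with a small sketch. Sample entropy is the negative logarithm of a ratio of template-match counts, so it has no value when either count is zero, which tends to happen for short sequences such as those left after coarse-graining at large scales. The code below is a minimal single-channel version written for illustration; the absolute tolerance `r`, the default `m`, and the choice to return `None` for the undefined case are assumptions, and the input sequences are synthetic.

```python
import math

def sample_entropy(x, m=2, r=0.2):
    """Return SampEn(m, r), or None when the value is undefined."""
    n = len(x)

    def count_matches(length):
        # Templates of the given length; self-matches are excluded.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between the two templates.
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return None           # -ln(a / b) is undefined
    return -math.log(a / b)

# A short sequence with no repeated structure: no template matches exist.
short_sequence = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
# A perfectly periodic sequence: every match of length m extends to m + 1.
periodic_sequence = [0.0, 1.0] * 10

undefined_value = sample_entropy(short_sequence)
periodic_value = sample_entropy(periodic_sequence)
print(undefined_value, periodic_value)
```

Permutation-based entropies avoid this case because every embedded vector contributes to some ordinal pattern, so the pattern distribution is always defined.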

The feature vectors of the bearing data samples extracted by ImvMAAPE, mvMAAPE, and MAAPE are also reduced in dimensionality by the t-SNE method; Figure 20 shows how well the features extracted by the different methods distinguish the working conditions. The ImvMAAPE method again performs best, which supports the conclusion of the gearbox experiment above.

4.1.2. Fault Identification and Classification Effect Analysis
The ImvMAAPE feature extraction method has shown excellent performance in fault detection. To verify this further, the four other methods, mvMAAPE, MAAPE, mvMSE, and MSE, were each used in place of ImvMAAPE within the fault detection process proposed in this paper to analyze the gearbox and bearing data samples. After 20 experiments, the maximum, minimum, average, and standard deviation of the classification accuracy were recorded; the results are shown in Tables 5 and 6, respectively. The comparison shows that the fault detection method using ImvMAAPE has the highest accuracy, and the standard deviation of its experimental results is only 0.71, which is very low compared with the other methods and indicates that the classification accuracy is very stable. Figures 21 and 22 show the accuracy and fluctuation of each classification result for the five methods: the classification accuracy of ImvMAAPE is the highest and the most stable, and the multichannel entropies achieve higher classification accuracy than the single-channel entropies, which also validates the previous analysis. In summary, the ImvMAAPE method proposed in this paper extracts useful fault features from signals more effectively and completely. The comprehensive fault detection method that combines it with UPEMD, t-SNE, and the RF classifier can achieve very accurate fault diagnosis of rotating machinery.


4.2. The Necessity of Using UPEMD to Process Raw Data and Using t-SNE to Reduce Dimensionality
To verify the necessity of using UPEMD to process the original data and of using t-SNE to reduce the dimensionality of the initial high-dimensional feature vectors, three fault detection methods were designed following the control-variable approach: the complete method proposed in this paper, the method without UPEMD, and the method without t-SNE. Each method was run 20 times on both the gearbox and the bearing data sets, and the maximum, minimum, average, and standard deviation of the classification accuracy were calculated. The results are summarized as a bar graph in Figure 23. When either the UPEMD processing of the original signal or the t-SNE dimensionality reduction is omitted, the final classification accuracy and stability decrease on both data sets. The effect is strongest for the gearbox data set, which was collected in a noisier environment and has more complex working conditions: omitting these steps can reduce the accuracy of fault classification and identification by about 10%. These results confirm the analysis in Section 3 of the influence of the UPEMD and t-SNE methods on the overall fault classification and recognition performance. The reconstructed signals obtained by UPEMD contain less noise interference and redundant information than the original signals, which makes the extracted features more valuable. The t-SNE dimensionality reduction reduces feature redundancy and feature confusion, because fault features of the same kind gather closely while fault features of different kinds lie far apart.

5. Conclusions
In this paper, a new nonlinear analysis method has been proposed: the Improved Multivariate Multiscale Amplitude-Aware Permutation Entropy (ImvMAAPE). It applies the improved coarse-grained method to fault feature extraction from multichannel signals. Based on this, a new fault diagnosis method for rotating machinery is proposed. Firstly, the UPEMD method is used to process the vibration signals of rotating machinery to obtain a series of IMF components. Then, the correlation coefficients between each component and the original signal are calculated, and the components with higher correlation coefficients are selected to obtain the reconstructed signals. The ImvMAAPE values of the reconstructed signals are calculated to obtain high-dimensional feature vectors, and the t-SNE method is used to reduce their dimensionality. Finally, the low-dimensional feature vectors obtained by dimensionality reduction are input into the RF classifier for recognition and classification. The fault detection results on the gearbox and bearing data sets show that the identification accuracy of the proposed method can reach 100%. In Section 4, mvMAAPE, MAAPE, mvMSE, and MSE were compared with the ImvMAAPE method, which showed that the ImvMAAPE method makes full use of the information contained in the multichannel data, fully extracts the information contained in the coarse-grained sequences, and achieves a better feature extraction effect. After dimensionality reduction, the extracted features are input into the RF classifier; compared with the other methods, the proposed method has the highest accuracy and the most stable classification results.
In conclusion, compared with the existing fault detection methods, the proposed fault detection method for rotating machinery has higher accuracy and better robustness, and it can adapt to a variety of mechanical fault states in practical engineering applications. It has high engineering application value [19–22].
Data Availability
The data are available in (1) PHM Data Challenge, (2009), available from https://www.phmsociety.org/competition/PHM/09; (2) Case Western Reserve University Bearing Data Center Website, available online http://csegroups.case.edu/bearingdatacenter/home (accessed on 15 October 2018).
Conflicts of Interest
The authors declare no conflicts of interest.
Authors’ Contributions
J.G. and X.Y. were in charge of the algorithm and software; F.Z. and F.P. were responsible for experimental validation; J.G. and W.L. prepared the original draft; X.Y. reviewed and edited the manuscript. All authors have read and approved the final manuscript.