Abstract

Symptom parameters are widely used for bearing fault diagnosis and play a crucial role in building a diagnosis model. Many symptom parameters have been proposed to extract signal fault features in the time and frequency domains, and an improper selection of parameters can significantly affect the diagnosis result. To deal with this problem, this paper proposes a novel dominant symptom parameter selection scheme for bearing fault diagnosis based on canonical discriminant analysis and false nearest neighbor, applied to GA-filtered signals. The original signal is first filtered by a genetic algorithm (GA) and then mapped to a new characteristic subspace through the canonical discriminant analysis (CDA) algorithm. The map distance in the new characteristic subspace is calculated by the false nearest neighbor (FNN) method to interpret the dominance of the symptom parameters. Introducing the dominant symptom parameters into the bearing diagnosis system improves the diagnosis result. The effectiveness of the proposed method is demonstrated with a diagnosis model and by comparison with other methods.

1. Introduction

Bearings are common components in rotating machinery. Bearing failures can cause severe accidents, resulting in property loss and even personal injury [1, 2]. For this reason, it is important to monitor and diagnose bearing faults. Previous studies indicate that bearing diagnosis research mainly focuses on data-driven methods [3, 4], which use fault features to build a diagnosis model. The main procedure of a model-based diagnosis method is as follows: collect vibration signals under fault states, apply signal filtering and feature extraction, and build a diagnosis model for fault diagnosis. This approach has been widely applied in bearing fault diagnosis [5]. In this process, symptom parameter selection is an essential step that significantly influences the diagnosis result. Many symptom parameters are available to extract fault features in the time and frequency domains. Among them, some are sensitive and dominant, whereas others are insensitive. Using all parameters in the model directly may increase the processing time and decrease the diagnosis accuracy. It is therefore essential to choose the most representative symptom parameters before diagnosis.

Selecting dominant symptom parameters is a nontrivial task. It aims to select the optimal feature subset to reduce both the time and the complexity of the diagnosis process. Generally, according to how the cost function of the feature selection approach is evaluated, selection algorithms can broadly be categorized into three groups [6, 7]: filter, wrapper, and hybrid methods. Filter-based symptom parameter selection evaluates parameters using the general characteristics of a dataset and is independent of any classification or clustering model [8, 9]. Filter-based methods are less time-consuming but generally less effective. Wrapper-based methods work with a specific model and select the parameter subset that optimizes the model performance according to evaluation criteria [10, 11]. They are more time-consuming but more effective than filter methods. Hybrid methods try to combine the computational cheapness of filter methods with the high accuracy of wrapper methods [12, 13].

In the field of bearing fault diagnosis, many filter-based, wrapper-based, and hybrid techniques have been applied to symptom parameter selection. Yu et al. [14] proposed feature selection based on the adjusted Rand index and standard deviation ratio (FSASR), using the K-means method and the standard deviation (STD) to select sensitive statistical characteristics of bearing fault signals. Meng et al. [15] applied the binary gravitational search algorithm (BGSA) to find the essential features in a feature set. Sugumaran et al. [16] presented a decision-tree-based method to select bearing fault features. Li et al. [17] used a multiscale morphological filtering method to select fault features of train axle bearings. Because these methods have limitations under different working conditions, further feature selection methods are needed to cope with various harsh working conditions. This work proposes a novel method for selecting dominant symptom parameters. Unlike the previous methods, the proposed method uses the FNN distance in the CDA characteristic subspace to represent the performance of each symptom parameter, which makes it possible to pick out the most representative time- and frequency-domain symptom parameters for fault diagnosis. To reduce the influence of signal noise, the measured signals are automatically filtered by the GA method before the symptom parameters are calculated. Based on the theory of variance analysis, canonical discriminant analysis (CDA) projects the data into a new characteristic subspace with the maximum between-group difference and the minimum within-group difference, and the canonical discriminant functions yield canonical scores that retain the maximum discriminative information.
False nearest neighbor (FNN) analysis operates in this new characteristic subspace: as the embedding dimension increases, the system gradually recovers its high-dimensional state, and false neighbor points with high similarity are successively eliminated. An evaluation criterion based on this similarity (the FNN-based space distance) is designed to select the dominant symptom parameters. In this paper, FNN is used to choose the symptom parameters that best represent the fault information of roller bearings. The effectiveness of the proposed method is demonstrated with a support vector machine (SVM) diagnosis model and by comparison with other symptom parameter selection methods.

The rest of the paper is organized as follows. Section 2 introduces GA filtering, the symptom parameters for fault diagnosis, CDA- and FNN-based dominant symptom parameter selection, and the evaluation criterion. Section 3 presents the experimental validation of the proposed method. Section 4 concludes the paper.

2. Dominant Symptom Parameters Selection Scheme of Bearing Fault Diagnosis

2.1. GA Filtering

Before the symptom parameters are calculated, the signal should be filtered to reduce the influence of noise. The GA filtering method was presented in our previous work [1]. As shown in Figure 1(a), the fault signal and the noise coexist across the frequency band, so conventional high-pass, low-pass, and bandpass filters do not work in this situation. Suppose a binary string is constructed in which each frequency line is represented by 0 or 1. If the optimal binary string is found, in which all noise-related frequency positions are set to 0, multiplying the signal spectrum by this string retains only the fault components. Transforming the masked spectrum back to the time domain yields the filtered signal.

As an example, Figure 1 illustrates the principle of GA filtering. Figure 1(a) shows the frequency spectrum of the original signal containing noise and the fault signal, where N denotes noise and S denotes the fault signal. Figure 1(b) shows the original time-domain signal. Applying GA filtering to the signal yields the optimal binary string shown in Figure 1(c), in which the noise-related bits are set to 0. Multiplying the optimal binary string by the original signal spectrum and transforming the result back to the time domain gives the filtered signal shown in Figure 1(d), which is used in the subsequent steps.
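The masking step of Figure 1 can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: one bit per line of the one-sided (`rfft`) spectrum, a toy two-tone signal instead of a bearing signal, and a hand-set mask in place of the GA-optimized string. The function name `apply_binary_mask` is hypothetical.

```python
import numpy as np

def apply_binary_mask(signal: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only spectral lines whose mask bit is 1, then return to the time domain."""
    spectrum = np.fft.rfft(signal)              # one-sided spectrum: len(signal)//2 + 1 lines
    if mask.shape != spectrum.shape:
        raise ValueError("mask must have one bit per rfft spectral line")
    filtered_spectrum = spectrum * mask         # zero out the noise-related lines
    return np.fft.irfft(filtered_spectrum, n=len(signal))

# Toy example: a 50 Hz "fault" tone plus a 400 Hz "noise" tone.
fs = 1000                                       # sampling rate in Hz (toy value)
t = np.arange(fs) / fs                          # 1 s of data, so spectral lines are 1 Hz apart
fault = np.sin(2 * np.pi * 50 * t)
noisy = fault + 0.8 * np.sin(2 * np.pi * 400 * t)

mask = np.zeros(len(t) // 2 + 1)
mask[50] = 1                                    # keep only the 50 Hz line
recovered = apply_binary_mask(noisy, mask)      # equals `fault` up to round-off
```

Because both tones fall exactly on spectral lines here, the mask removes the noise completely; for real signals the GA has to search for the string that best separates fault and noise components.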

2.2. Fault Diagnosis Symptom Parameters

Many symptom parameters are used for bearing fault diagnosis, and they carry information about faults. Their sensitivity and dominance differ under different working conditions, and they cover both the time domain and the frequency domain. Equations (1)∼(18) define the symptom parameters used in this work [18].

Time domain is as follows:

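Here, $x_i$ is a signal series for $i = 1, 2, \ldots, N$, where $N$ is the length of the signal, and $P(f_j)$ is the spectrum of $x_i$ for $j = 1, 2, \ldots, J$, where $J$ is the number of spectral lines and $f_j$ is the frequency of the $j$-th line; $P(f_j)$ gives the corresponding frequency amplitude.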

The amplitude of $x_i$ strongly influences the values of the symptom parameters, so different symptom parameters have different orders of magnitude. Therefore, both the spectrum and the symptom parameters are normalized. Before the symptom parameters are calculated, the spectrum $P(f_j)$ is normalized, giving the normalized spectrum $P'(f_j)$. After calculation, each symptom parameter is standardized as

$$s' = \frac{s - \mu}{\sigma},$$

where $s$ is the calculated fault symptom parameter, $s'$ is the normalized fault symptom parameter, and $\mu$ and $\sigma$ are the mean value and the standard deviation of that symptom parameter under the normal state.
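The standardization $s' = (s - \mu)/\sigma$ with normal-state statistics can be sketched in a few lines of NumPy. The sketch assumes the sample standard deviation (`ddof=1`), which the text does not specify; the function name `normalize_against_normal` is hypothetical.

```python
import numpy as np

def normalize_against_normal(values, normal_values):
    """Standardize symptom parameters with normal-state statistics: s' = (s - mu) / sigma.

    values:        (n_samples, n_params) symptom parameters to normalize
    normal_values: (n_normal, n_params) the same parameters computed from normal-state signals
    """
    mu = normal_values.mean(axis=0)
    sigma = normal_values.std(axis=0, ddof=1)   # sample standard deviation (assumed; not stated in the text)
    return (values - mu) / sigma

normal = np.array([[1.0, 10.0], [3.0, 30.0]])
z = normalize_against_normal(np.array([[2.0, 20.0], [3.0, 30.0]]), normal)
# z[0] is [0, 0]; z[1] is about [0.707, 0.707]
```

Using normal-state statistics means a value near zero indicates "normal-like" behavior for every parameter, regardless of its original order of magnitude.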

2.3. CDA and FNN Dominant Symptom Parameters Selection

Canonical discriminant analysis (CDA) is a dimension reduction method developed from the principal component analysis (PCA) method [19] and the canonical correlation analysis (CCA) method [20]. Given class labels and quantitative variables, it extracts canonical components, which are linear combinations of the quantitative variables. These canonical components can be used to evaluate unlabelled observations, and the resulting scores are used for discrimination [21].

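Assume that the symptom parameter samples come from $g$ classes (bearing conditions), where class $k$ contains $n_k$ samples and the $i$-th sample of class $k$ is the $p$-dimensional symptom parameter vector

$$\mathbf{x}_{ki} = [x_{ki1}, x_{ki2}, \ldots, x_{kip}]^{T}, \quad k = 1, \ldots, g, \quad i = 1, \ldots, n_k.$$

The class mean vectors are $\bar{\mathbf{x}}_k$, where

$$\bar{\mathbf{x}}_k = \frac{1}{n_k}\sum_{i=1}^{n_k}\mathbf{x}_{ki},$$

and the overall mean of all symptom parameters is

$$\bar{\mathbf{x}} = \frac{1}{n}\sum_{k=1}^{g}\sum_{i=1}^{n_k}\mathbf{x}_{ki}, \quad n = \sum_{k=1}^{g} n_k.$$

The total variance-covariance (scatter) matrix is decomposed as

$$\mathbf{T} = \sum_{k=1}^{g}\sum_{i=1}^{n_k}(\mathbf{x}_{ki}-\bar{\mathbf{x}})(\mathbf{x}_{ki}-\bar{\mathbf{x}})^{T} = \mathbf{B} + \mathbf{W},$$

where the between-class matrix $\mathbf{B}$ is

$$\mathbf{B} = \sum_{k=1}^{g} n_k(\bar{\mathbf{x}}_k-\bar{\mathbf{x}})(\bar{\mathbf{x}}_k-\bar{\mathbf{x}})^{T},$$

and the within-class matrix $\mathbf{W}$ is

$$\mathbf{W} = \sum_{k=1}^{g}\sum_{i=1}^{n_k}(\mathbf{x}_{ki}-\bar{\mathbf{x}}_k)(\mathbf{x}_{ki}-\bar{\mathbf{x}}_k)^{T}.$$

CDA seeks a projection that makes the between-class variance as large as possible and, at the same time, the within-class variance as small as possible; that is, it maximizes the objective function

$$\lambda(\mathbf{a}) = \frac{\mathbf{a}^{T}\mathbf{B}\mathbf{a}}{\mathbf{a}^{T}\mathbf{W}\mathbf{a}}.$$

Differentiating $\lambda(\mathbf{a})$ with respect to $\mathbf{a}$ and setting the derivative to zero gives $\mathbf{B}\mathbf{a} = \lambda\mathbf{W}\mathbf{a}$, that is,

$$(\mathbf{W}^{-1}\mathbf{B} - \lambda\mathbf{I})\mathbf{a} = \mathbf{0},$$

so the maximizers $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_r$ are the eigenvectors of $\mathbf{W}^{-1}\mathbf{B}$ associated with the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r > 0$, $r \leq \min(p, g-1)$, and the maximum of $\lambda(\mathbf{a})$ is $\lambda_1$. These eigenvectors are the coefficients of the canonical variables. The first canonical score, $y_1 = \mathbf{a}_1^{T}\mathbf{x}$, is the single best linear discriminator of the data. The second canonical score, $y_2 = \mathbf{a}_2^{T}\mathbf{x}$, is selected in the same way, subject to the additional restriction that it is uncorrelated with $y_1$; it is the second-best linear discriminator of the data.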
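The eigenproblem $\mathbf{B}\mathbf{a} = \lambda\mathbf{W}\mathbf{a}$ can be solved directly as a symmetric generalized eigenproblem. The following Python sketch builds $\mathbf{B}$ and $\mathbf{W}$ as defined above, calls `scipy.linalg.eigh(B, W)`, and also returns each canonical function's share $\lambda_k / \sum_j \lambda_j$ of the eigenvalue sum. The small ridge added to $\mathbf{W}$, the function name `cda`, and the synthetic three-class data are assumptions for illustration, not part of the original method.

```python
import numpy as np
from scipy.linalg import eigh

def cda(X, labels, ridge=1e-9):
    """Canonical discriminant analysis via the generalized eigenproblem B a = lambda W a.

    X: (n, p) symptom parameter matrix; labels: (n,) condition labels.
    Returns the coefficients A (p, p) sorted by descending eigenvalue, the eigenvalues,
    and each canonical function's proportion of the eigenvalue sum.
    """
    p = X.shape[1]
    overall_mean = X.mean(axis=0)
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - overall_mean)[:, None]
        B += len(Xc) * diff @ diff.T            # between-class scatter
        centered = Xc - mean_c
        W += centered.T @ centered              # within-class scatter
    # A small ridge keeps W positive definite (e.g., if a parameter column is all zeros).
    eigvals, eigvecs = eigh(B, W + ridge * np.eye(p))
    order = np.argsort(eigvals)[::-1]           # eigh returns eigenvalues in ascending order
    eigvals, A = eigvals[order], eigvecs[:, order]
    lam = np.clip(eigvals, 0.0, None)           # discard tiny negative round-off
    return A, eigvals, lam / lam.sum()

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
X = np.vstack([rng.normal(c, 1.0, size=(30, 3)) for c in centers])
labels = np.repeat([1, 2, 3], 30)
A_full, eigvals, proportions = cda(X, labels)
A = A_full[:, :2]                               # first two canonical functions
Y = X @ A                                       # canonical scores y_k = a_k^T x
```

With three classes, at most $g - 1 = 2$ eigenvalues are nonzero, so the first two canonical functions carry essentially all the discriminative information in this toy case.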

The false nearest neighbor (FNN) algorithm is employed to select the dominant symptom parameters, that is, the most representative symptom parameters in the time and frequency domains. FNN was proposed by Kennel et al. in 1992 [22] to determine the embedding dimension in chaos theory. Here, FNN is applied to calculate the distance between different fault symptom parameter vectors. When the symptom parameter set is changed, the space dimension changes, and false neighbors with high similarity are removed during the phase-space reconstruction process.

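In phase-space reconstruction, each space vector is

$$\mathbf{X}(i) = [x(i), x(i+\tau), \ldots, x(i+(d-1)\tau)],$$

where $x(i)$ is the symptom parameter series, $d$ is the embedding dimension, and $\tau$ is the delay time. Each $\mathbf{X}(i)$ has a nearest neighbor $\mathbf{X}^{NN}(i)$ (a point other than $\mathbf{X}(i)$ itself) at a certain distance. The distance between $\mathbf{X}(i)$ and $\mathbf{X}^{NN}(i)$ is

$$R_d(i) = \left\lVert \mathbf{X}(i) - \mathbf{X}^{NN}(i) \right\rVert.$$

When the dimension increases from $d$ to $d+1$, the distance between the two vectors changes from $R_d(i)$ to $R_{d+1}(i)$:

$$R_{d+1}^{2}(i) = R_d^{2}(i) + \left| x(i+d\tau) - x^{NN}(i+d\tau) \right|^{2}.$$

When $R_{d+1}(i)$ is much larger than $R_d(i)$, $\mathbf{X}^{NN}(i)$ is regarded as a false nearest neighbor of $\mathbf{X}(i)$ in $d$ dimensions; otherwise, it is a true nearest neighbor. A true nearest neighbor indicates that the corresponding symptom parameters are strongly correlated, so such redundant symptom parameters should be removed. Usually, the following distance ratio is used to identify false nearest neighbors:

$$\frac{\left| x(i+d\tau) - x^{NN}(i+d\tau) \right|}{R_d(i)} > R_{\mathrm{tol}},$$

where $R_{\mathrm{tol}}$ is a threshold.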

Assume that a symptom parameter matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$ is obtained, where $n$ is the number of samples and $p$ is the number of symptom parameters. The FNN-CDA dominant symptom parameter selection method first applies CDA to obtain the canonical coefficients $\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \ldots]$ in the characteristic subspace and builds the canonical discriminant functions. Based on these functions, the canonical scores $\mathbf{Y} = \mathbf{X}\mathbf{A}$ of the symptom parameters are calculated. Then, each symptom parameter is set to zero in turn, yielding a new parameter set $\mathbf{X}_k$ ($k = 1, 2, \ldots, p$) in which the $k$-th parameter is zero. After the CDA operation, $\mathbf{X}_k$ yields the canonical scores $\mathbf{Y}_k$. The FNN algorithm is then used to calculate the distance $d_k$ between $\mathbf{Y}$ and $\mathbf{Y}_k$.

The distances are ranked in descending order. A small distance means that the corresponding parameter contributes little to fault diagnosis; therefore, it is a nondominant symptom parameter.
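The zero-one-parameter-at-a-time loop and the descending ranking can be sketched as follows. The exact FNN distance between $\mathbf{Y}$ and $\mathbf{Y}_k$ is not spelled out in this section, so the sketch substitutes a simpler stand-in, the mean Euclidean distance between corresponding canonical-score points, and it reuses the canonical coefficients $\mathbf{A}$ of the full parameter set for every zeroed set. Both choices, and the function name `zero_out_distances`, are assumptions.

```python
import numpy as np

def zero_out_distances(X, A):
    """Distance between the canonical scores of the full set and of each zeroed set.

    ASSUMPTIONS: A (p x r) from the full set is reused for each X_k, and the mean
    Euclidean distance between corresponding score points stands in for the FNN distance.
    """
    Y = X @ A
    p = X.shape[1]
    d = np.empty(p)
    for k in range(p):
        Xk = X.copy()
        Xk[:, k] = 0.0                          # set the k-th symptom parameter to zero
        Yk = Xk @ A
        d[k] = np.linalg.norm(Y - Yk, axis=1).mean()
    ranking = np.argsort(d)[::-1]               # parameter indices, largest distance first
    return d, ranking

X = np.array([[2.0, 6.0, 7.0]])
A = np.array([[1.0], [0.5], [0.0]])
d, ranking = zero_out_distances(X, A)
# d is [2, 3, 0]; ranking is [1, 0, 2]
```

A parameter whose removal barely moves the canonical scores (here the third one, whose coefficient is zero) lands at the bottom of the ranking and would be treated as nondominant.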

2.4. Evaluation Criterion

For determining the dominant symptom parameters, an evaluation criterion is proposed in this paper. It is defined as follows.

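Assume that $d_k$ is the FNN distance of the $k$-th symptom parameter and $p$ is the number of symptom parameters. The distances are ranked in descending order as $d_{(1)} \geq d_{(2)} \geq \cdots \geq d_{(p)}$. The individual percentage is defined as

$$\gamma_k = \frac{d_{(k)}}{\sum_{j=1}^{p} d_{(j)}} \times 100\%, \quad k = 1, 2, \ldots, p.$$

The cumulative percentage of the first $m$ symptom parameters in the sequence is defined as

$$C_m = \sum_{k=1}^{m} \gamma_k.$$

For a threshold $\eta$, if $m$ is the smallest number satisfying $C_m \geq \eta$, the dominant symptom parameters are taken to be the first $m$ symptom parameters. In this study, $\eta = 80\%$ gives the best result (see Section 3.4); thresholds that are too small or too large reduce the diagnosis accuracy. The choice of threshold follows principal component analysis theory, in which components whose cumulative percentage exceeds 80% are considered to represent the primary information of the signal.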
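The evaluation criterion reduces to a sort, a normalization, and a cumulative sum. The following Python sketch returns the indices of the first $m$ parameters for which $C_m \geq \eta$. The function name `select_dominant` and the toy distances are illustrative only.

```python
import numpy as np

def select_dominant(distances, threshold=80.0):
    """Select the smallest leading set of parameters whose cumulative percentage >= threshold."""
    d = np.asarray(distances, dtype=float)
    order = np.argsort(d)[::-1]                 # parameter indices, largest distance first
    percent = 100.0 * d[order] / d.sum()        # individual percentages gamma_k
    cumulative = np.cumsum(percent)             # cumulative percentages C_m
    m = int(np.searchsorted(cumulative, threshold)) + 1   # first m with C_m >= threshold
    m = min(m, len(d))                          # guard against round-off when threshold = 100
    return order[:m], percent, cumulative

selected, percent, cumulative = select_dominant([5.0, 30.0, 10.0, 40.0, 15.0], threshold=80.0)
# cumulative is [40, 70, 85, 95, 100]; selected is [3, 1, 4]
```

Because the toy distances sum to 100, each percentage equals the distance itself, so it is easy to check that the third-ranked parameter is the first to push the cumulative value past 80%.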

2.5. Framework of the Proposed Method

The components introduced above are combined into the dominant symptom parameter selection method for bearing fault diagnosis, whose schematic is shown in Figure 2.

The process of the proposed method is summarized as follows:
(1) Vibration signals are acquired from the condition diagnosis test bench under various conditions (the normal condition and fault conditions).
(2) The acquired original vibration signals are filtered by the GA filtering method.
(3) The filtered signals are used to calculate the values of all time- and frequency-domain symptom parameters, which cover the fault information in both domains.
(4) CDA is performed to calculate the canonical discriminant functions and obtain the canonical scores.
(5) Each symptom parameter is set to zero in turn, and the canonical scores are recalculated.
(6) The space distance between each recalculated canonical score vector and the canonical scores of all parameters is calculated by FNN in the characteristic subspace to evaluate the performance of each symptom parameter.
(7) The distances are ranked in descending order, and the dominant symptom parameters are selected according to the evaluation criterion.
(8) An SVM is used to build the discriminative model to verify the effectiveness of the selection.


3. Experiments and Discussion

In this section, the experimental results are presented. Each subsection represents a further step toward the objectives described in the previous section.

3.1. Experimental Conditions

The effectiveness of the presented strategy is validated by analyzing three bearing fault signals, namely, outer race fault (O), inner race fault (I), and roller element fault (E), together with the normal signal (N).

Figure 3 shows the experimental bench for the condition diagnosis test. It consists of a loading device, a motor, and a rotor system. A ball bearing (type: NSK 205) is used for bearing condition diagnosis. As shown in Figure 4, the fault depth is 0.05 mm and the fault width is 0.3 mm.

In this work, the original vibration signals of each state were measured with an accelerometer at a sampling frequency of 50,000 Hz. The accelerometer was fixed in the vertical direction of the bearing. The speed of the servo motor was 1500 rpm.

3.2. Bearing Fault Diagnosis Case
3.2.1. GA-Based Filtering

GA-based filtering requires a fitness function that evaluates the similarity between the noise extracted from a diagnosis-state signal and the noise of the normal-state signal.

For this reason, a “statistical information evaluation function” is created in this work. The probability density distribution function (PDDF) is illustrated as follows:

And the frequency density distribution function (FDDF) can be shown as follows:

The statistical information evaluation function combines the PDDF with the FDDF and is expressed as follows, where is the probability density value at t; is the PDDF of the extracted noise signals in a diagnosis signal; is the PDDF of the noise signals in the normal state; is the FDDF of the extracted noise signals in the diagnosis state; is the FDDF of the noise signal in the normal state; and j ranges from 1 to n. Detailed settings are given in our previous work [1].
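The PDDF, FDDF, and combined evaluation equations are not available in this version of the text, so the following Python sketch does not reproduce them. It is a hypothetical GA search for the binary string of Section 2.1 in which the fitness (an assumption) compares the amplitude histogram, as a stand-in for the PDDF, and the normalized amplitude spectrum, as a stand-in for the FDDF, of the noise removed by the mask with those of normal-state noise. The GA operators (binary tournament selection, uniform crossover, bit-flip mutation, elitism), the names `fitness` and `ga_filter_mask`, and all numeric settings are illustrative choices, not the settings of [1].

```python
import numpy as np

def _density(x, bins, lim):
    # Amplitude histogram normalized as a density (stand-in for the PDDF).
    hist, _ = np.histogram(x, bins=bins, range=(-lim, lim), density=True)
    return hist

def _spectral_density(x, n):
    # Amplitude spectrum normalized to unit sum (stand-in for the FDDF).
    amp = np.abs(np.fft.rfft(x, n=n))
    return amp / (amp.sum() + 1e-12)

def fitness(mask, signal, normal_noise, bins=32):
    """Hypothetical fitness: higher when the noise removed by `mask` resembles normal-state noise."""
    n = len(signal)
    removed = np.fft.irfft(np.fft.rfft(signal) * (1 - mask), n=n)  # components the mask discards
    lim = np.max(np.abs(np.concatenate([removed, normal_noise]))) + 1e-12
    pd_err = np.sum((_density(removed, bins, lim) - _density(normal_noise, bins, lim)) ** 2)
    fd_err = np.sum((_spectral_density(removed, n) - _spectral_density(normal_noise, n)) ** 2)
    return -(pd_err + fd_err)

def ga_filter_mask(signal, normal_noise, pop_size=30, generations=40, p_mut=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n_bits = len(signal) // 2 + 1                   # one bit per rfft spectral line
    pop = rng.integers(0, 2, size=(pop_size, n_bits))
    scores = np.array([fitness(ind, signal, normal_noise) for ind in pop])
    history = [scores.max()]
    for _ in range(generations):
        elite = pop[np.argmax(scores)].copy()
        # Binary tournament selection.
        pairs = rng.integers(0, pop_size, size=(pop_size, 2))
        winners = np.where(scores[pairs[:, 0]] >= scores[pairs[:, 1]], pairs[:, 0], pairs[:, 1])
        parents = pop[winners]
        # Uniform crossover with the neighboring parent.
        partners = np.roll(parents, 1, axis=0)
        children = np.where(rng.random(parents.shape) < 0.5, parents, partners)
        # Bit-flip mutation.
        children = np.where(rng.random(children.shape) < p_mut, 1 - children, children)
        children[0] = elite                         # elitism keeps the best string found so far
        pop = children
        scores = np.array([fitness(ind, signal, normal_noise) for ind in pop])
        history.append(scores.max())
    return pop[np.argmax(scores)], history

# Toy usage: a 50 Hz tone plus white noise, with separately recorded "normal" noise.
rng = np.random.default_rng(1)
t = np.arange(512) / 1000.0
normal_noise = rng.normal(0.0, 0.3, t.size)
signal = np.sin(2 * np.pi * 50 * t) + rng.normal(0.0, 0.3, t.size)
best_mask, history = ga_filter_mask(signal, normal_noise)
```

Elitism makes the best fitness non-decreasing across generations; the resulting mask can then be applied to the spectrum exactly as in the masking step of Section 2.1.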

Segments of the GA-filtered signals under the different bearing conditions are shown in Figure 5.

In the raw signals, the fault features are buried in strong background noise. After GA-based filtering, the fault-related information can be clearly observed in the filtered signals. This supports condition surveillance; the quantitative fault diagnosis is presented in the following subsections.

3.2.2. Symptom Parameters Extraction

The data length is 360,448 samples per condition, and the data are divided into 88 segments of 4096 samples each. Given the sampling frequency (50,000 Hz) and the rotating speed (1500 rpm, i.e., 25 revolutions per second), one revolution corresponds to 2000 samples, so each segment contains more than one revolution of signal information. If a segment contained less than one revolution, the classification accuracy would be lower. The symptom parameters in equations (1)∼(18) are calculated for each segment. Owing to the characteristics of the data, the symptom parameter values are normalized to the range from -1 to 1. For illustration, Figure 6 shows the mean of the absolute values of each symptom parameter.
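The segmentation figures above can be checked with a few lines of Python, using the sampling frequency and motor speed from Section 3.1. The text does not name its normalization method, so the per-parameter min-max scaling to [-1, 1] below is an assumption; the helper names `segment` and `scale_to_unit_range` are hypothetical.

```python
import numpy as np

fs = 50_000                                     # sampling frequency in Hz (Section 3.1)
rpm = 1500                                      # motor speed (Section 3.1)
total_len = 360_448                             # samples per bearing condition
n_segments = 88

seg_len = total_len // n_segments               # 4096 samples per segment
samples_per_rev = fs * 60 // rpm                # 2000 samples per revolution
revs_per_segment = seg_len / samples_per_rev    # 2.048 revolutions per segment

def segment(signal, n_segments):
    """Split a 1-D signal into n_segments equal, non-overlapping segments."""
    seg = len(signal) // n_segments
    return signal[: seg * n_segments].reshape(n_segments, seg)

def scale_to_unit_range(values):
    """Min-max scale each column (one symptom parameter) to [-1, 1]; assumes no constant column."""
    lo, hi = values.min(axis=0), values.max(axis=0)
    return 2.0 * (values - lo) / (hi - lo) - 1.0
```

Since 360,448 = 88 x 4096, the segments tile the record exactly, and each one spans just over two shaft revolutions.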

As observed in Figure 6, the symptom parameter values differ among the four conditions (normal and fault conditions). Subplots SP1∼SP18 show each symptom parameter under the four conditions, and each subplot shows a different pattern. In Subplots 17 and 18, the normal signal has the largest value. In Subplots 1, 5, 8, 9, 15, and 16, the inner race fault has the largest value compared with the normal condition, outer race fault, and roller fault. In Subplots 10 and 11, the outer race fault has a larger value than the other conditions. In Subplots 2, 3, 6, and 14, the roller fault has the largest value. Some histograms look similar (e.g., Subplots 10 and 12), yet the discriminative abilities of the corresponding parameters can be quite different.

3.2.3. Canonical Discriminant Analysis

The normal condition symptom parameters were combined with the inner race fault, outer race fault, and roller fault symptom parameters to build a data set for the CDA operation. The normal condition symptom parameters are labelled 1, the inner race fault symptom parameters 2, the outer race fault symptom parameters 3, and the roller fault symptom parameters 4. CDA yields canonical discriminant functions that relate the symptom parameters to the condition labels; the primary and secondary canonical functions (i.e., two canonical functions) are used here. For each bearing condition, 88 symptom parameter vectors (one per data segment), each containing the parameters from equations (1)∼(18), were used in the CDA operation to calculate the canonical functions. The proportions of the primary and secondary canonical functions are 71.64% and 24.26%, respectively.

Based on the canonical discriminant functions, the canonical scores of the symptom parameters are calculated. The scatter diagram of the canonical scores is shown in Figure 7.

Figure 7 shows the canonical scores of the four bearing conditions. All fault conditions (inner race, outer race, and roller) are separable from the normal condition, and the canonical scores of the different fault conditions can also be separated from each other. The canonical score distribution of the symptom parameters thus has the maximum between-group difference and the minimum within-group difference. These canonical scores form the canonical score vector $\mathbf{Y}$ of the full parameter set.

Next, each symptom parameter is set to zero in turn, and the new canonical score $\mathbf{Y}_k$ is calculated.

FNN is then employed to select the dominant symptom parameters based on the canonical scores $\mathbf{Y}$ and $\mathbf{Y}_k$, as described in the next subsection.

3.2.4. False Nearest Neighbor Analysis

FNN analysis was used to calculate the distance between the canonical scores of the full symptom parameter set and those of each new set in which one symptom parameter is set to zero. $\mathbf{Y}$ and $\mathbf{Y}_k$ are input to the algorithm to obtain the FNN space distance. The FNN space distance values are listed in Table 1.

Table 1 shows that setting different symptom parameters to zero changes the distance considerably, which suggests that each symptom parameter has a different impact on dominance. Symptom Parameter 14 has the maximum distance, indicating that it is the most sensitive. Conversely, Symptom Parameter 15 has the minimum distance and is less sensitive than the other symptom parameters. Among Symptom Parameters 1 to 18, some time-domain parameters have large distances, such as Symptom Parameter 4, and so do some frequency-domain parameters, such as Symptom Parameter 13.

3.2.5. Evaluation Criterion to Select the Dominant Symptom Parameters

To select the dominant symptom parameters, the evaluation criterion requires calculating the percentage of each symptom parameter. All percentages are then ranked in descending order to calculate the cumulative percentage. The individual percentages are shown in Figure 8, and the ranking and cumulative values are listed in Table 2.

Figure 8 and Table 2 show the maximum and minimum distances intuitively. Table 2 ranks the percentages of the symptom parameters, with the cumulative percentage in the last column. According to the evaluation criterion, a threshold of 80% gives the optimal result; the influence of the threshold is discussed in Section 3.4. According to the table, Symptom Parameters 14, 13, 4, 11, 1, 7, 8, 12, 18, 9, 10, and 17 are the dominant symptom parameters. The remaining Symptom Parameters 6, 3, 5, 16, 2, and 15 are labelled as insensitive parameters and should be removed from the fault diagnosis data sets.

3.2.6. SVM Fault Classification Analysis

To analyze and verify the performance of the selected parameters, a support vector machine was used to build a classification model for fault diagnosis. The model uses the RBF kernel, and the kernel parameter and the penalty parameter for misclassification were set following [23]. For each condition (normal, inner race fault, outer race fault, and roller fault), 70 of the 88 symptom parameter vectors were used to train the model, and the remaining 18 were used for diagnosis and verification.
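The 70/18 train/test protocol per condition can be sketched with scikit-learn's `SVC`. The data below are synthetic Gaussian clusters standing in for the 12 dominant symptom parameters, and `C=10.0` and `gamma="scale"` are placeholders, not the values taken from [23].

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_per_class, n_train, n_features = 88, 70, 12   # 88 segments, 70 for training, 12 dominant parameters
centers = rng.normal(0.0, 3.0, size=(4, n_features))  # synthetic class centers (hypothetical data)

X_train, y_train, X_test, y_test = [], [], [], []
for label in (1, 2, 3, 4):                      # normal, inner race, outer race, roller (Section 3.2.3)
    samples = centers[label - 1] + rng.normal(size=(n_per_class, n_features))
    X_train.append(samples[:n_train])
    y_train += [label] * n_train
    X_test.append(samples[n_train:])
    y_test += [label] * (n_per_class - n_train)
X_train, X_test = np.vstack(X_train), np.vstack(X_test)

# RBF-kernel SVM; C and gamma are placeholders, not the values from [23].
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)            # fraction of the 72 test vectors classified correctly
```

With 18 test vectors per condition there are 72 test vectors in total, so each misclassified vector changes the accuracy by about 1.39 percentage points.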

The classification results of the trained SVM model using all symptom parameters and using the dominant symptom parameter set are shown in Figures 9 and 10.

Comparing the two settings shows that when the dominant symptom parameters are used in the model, the SVM accuracy is higher than with all symptom parameters. This indicates that dominant symptom parameter selection reduces data complexity and leads to better fault diagnosis results, which verifies the validity and feasibility of the algorithm.

To further examine whether the dominant parameter set gives the best fault diagnosis performance, several other symptom parameter subsets were tested. The results are shown in Table 3.

According to Table 3, the selected dominant parameters give the best diagnostic classification performance, with an accuracy of 97.23%. When the model uses all symptom parameters, the accuracy declines to 93.06%. The nondominant parameters have poor fault diagnosis ability, with an accuracy of 51.39%. Subsets containing only the first several dominant parameters (such as Nos. 6 and 7) perform worse than the full dominant set. This indicates that the selected dominant symptom parameters have superior characteristics for fault diagnosis and that the threshold value is appropriate.

3.3. Comparison with High-Pass Filtering

To verify the advantage of GA filtering, this section applies high-pass filtering within the proposed method. Based on empirical experience, the high-pass filter passes frequencies above 1500 Hz. All other operations are the same as in Section 3.2.
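A high-pass baseline of this kind can be sketched with SciPy. The text gives only the 1500 Hz cutoff; the fourth-order Butterworth design and the zero-phase forward-backward filtering with `sosfiltfilt` are assumptions, and the two test tones are illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 50_000                                     # sampling frequency in Hz (Section 3.1)
cutoff = 1500                                   # high-pass cutoff in Hz (Section 3.3)
sos = butter(4, cutoff, btype="highpass", fs=fs, output="sos")  # assumed 4th-order Butterworth

def highpass(x):
    """Zero-phase high-pass filtering (forward and backward passes)."""
    return sosfiltfilt(sos, x)

t = np.arange(fs) / fs                          # 1 s of data
low = np.sin(2 * np.pi * 100 * t)               # below the cutoff: should be removed
high = np.sin(2 * np.pi * 5000 * t)             # above the cutoff: should pass
y = highpass(low + high)
```

Unlike the GA mask, such a filter removes every component below the cutoff regardless of whether it carries fault information, which is the limitation noted in Section 2.1.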

Following the same procedure as in Section 3.2, the classification results of the different symptom parameter sets are shown in Table 4.

Table 4 shows that the selected dominant symptom parameters give better fault diagnosis performance than all symptom parameters and that all classification results obtained with high-pass filtering are worse than those obtained with GA-based filtering. This demonstrates the effectiveness of combining GA-based filtering with dominant symptom parameter selection.

3.4. Influence of Evaluation Criterion Threshold

To discuss the influence of the evaluation criterion threshold, this section uses the results of the bearing fault diagnosis case. Table 5 displays the SVM classification results with the dominant parameters selected under different thresholds.

In Table 5, the classification result is optimal when the threshold is set to 80% or 85%. For thresholds larger or smaller than these values, the accuracy falls below 97.23%. Therefore, the threshold is set to 80% in the experiments.

3.5. Comparison with Other Feature Selection Methods

To further test the effectiveness of the presented strategy, it was compared with nonnegative matrix factorization (NMF) [13], the PCA approach [24], backward stepwise selection (BSS), and forward stepwise selection (FSS). The data are those of the bearing fault diagnosis case in Section 3.2. The SVM model is used to evaluate each selected parameter subset. The results are reported in Table 6.

The results reported in Table 6 show that the proposed method achieves better results than the NMF and PCA methods. This demonstrates the effectiveness of the dominant symptom parameter selection scheme.

3.6. Comparison with Other Classification Methods

To further verify the effectiveness of the presented strategy, it was compared with a backpropagation neural network (BPNN). The data are those of the bearing fault diagnosis case in Section 3.2. Both the SVM and BPNN models are used to evaluate the selected parameter subset. The results are reported in Table 7.

The results reported in Table 7 show that the selected dominant parameters influence the classification results.

4. Conclusions

This work proposes a novel dominant symptom parameter selection scheme for bearing fault diagnosis. It consists of GA-based filtering, time- and frequency-domain symptom parameter calculation, CDA, FNN, and an SVM model. The raw signal is first filtered by the GA-based method. Then, the canonical components are extracted by CDA, and FNN is used to calculate the distance in the CDA characteristic subspace. According to the evaluation criterion, the dominant symptom parameters are selected. The proposed method does not require in-depth knowledge of the object properties and is effective in eliminating parameter multicollinearity. FNN selects the most representative symptom parameters for various types of faults.

The method has been verified with roller bearing fault signals. The results show that the diagnosis accuracy with the dominant symptom parameters is 97.23%, higher than the 93.06% obtained with all symptom parameters, making it an effective method for bearing fault diagnosis.

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Scientific and Technological Research Program of Chongqing Municipal Education Commission, KJZD-K201801502.