Abstract

A novel multiteam competitive optimization (MTCO) algorithm is proposed to diagnose the fault patterns of bearings. The algorithm is inspired by the competitive behaviors of multiple teams. It has a three-level organization structure, so more potentially optimal areas can be searched. By imitating human thinking, new strategies within the MTCO, such as the betrayal and replacement behaviors along with the introduction of an acceptable vector, are designed to increase diversity and guide the search out of local suboptimal areas. In addition, a kernel function is introduced to reduce the recognition errors caused by data that are nonlinearly distributed in the original space. The experimental results demonstrate that the proposed MTCO is globally stable and has optimal decision performance. The MTCO is then applied to the fault diagnosis of bearings and compared with other commonly used methods; the comparison indicates that the proposed algorithm has higher recognition accuracy.

1. Introduction

Fault diagnosis, which emerged as an independent interdisciplinary field in the 1960s, plays a significant role in modern industrial processes. It is the key to preventing serious accidents, ensuring safety, and obtaining maximum economic benefits [1, 2]. With the development of industrial equipment, new data-driven methods, including intelligence algorithms, data fusion, multivariate statistical analysis, signal processing, and machine learning, have attracted much attention and developed rapidly [3, 4]. A large number of studies on data-driven methods have been proposed to promote the development of fault diagnosis technology [5].

In the past decade, research on fault diagnosis has mainly focused on data-driven methods. Yuan and Chu [6] investigated a discrete particle swarm optimization (PSO) algorithm to select the best fault features and optimize the performance of a support vector machine (SVM) classifier. The experimental results showed that the developed method outperforms SVM based on principal component analysis (PCA) and SVM based on a genetic algorithm (GA) in the fault diagnosis of a turbo pump rotor. Zheng and Ma [7] introduced a method using dual-mutation PSO to determine the optimal parameters for SVM; experiments on damage type recognition for a civil aeroengine indicated that the method achieves higher diagnostic accuracy than single- or multiple-kernel SVMs whose parameters are set randomly. Korürek and Doğan [8] introduced data-driven methods into the life sciences, using a radial basis function (RBF) neural network classifier whose structure was evolved by PSO; the optimized RBF network could classify electrocardiogram beats with a smaller network size without making any concessions on classification performance. Based on the above studies and other works presented by Lu et al. and Shang et al. [9, 10], it can be concluded that an appropriate classifier is essential for the detection and identification of unknown fault patterns. Inspired by the heuristic random search process of a swarm, this paper proposes a swarm intelligence algorithm called the multiteam competitive optimization (MTCO) algorithm. The proposed algorithm is a data-driven method and is mainly used to recognize unknown fault patterns; its essence is to search for an optimal single-class center for each unknown fault pattern. Many researchers have studied human decision methods by imitating human behaviors.
Among them, some improved methods have achieved remarkable computational results, such as the work presented by Zheng [11]. Research on human learning behaviors has shown that the best planners adjust their decisions by considering the current state and their perception of the best experiences of others. Based on this idea, Tanweer et al. [12] designed two learning strategies for PSO that incorporate the best human learning strategies for finding the optimum solution. Cheng and Jin [13] developed a social learning PSO (SL-PSO), which allows individuals to learn behaviors from better particles in the current swarm; their comparative results showed that the SL-PSO performs well on low-dimensional problems and is promising for solving large-scale problems as well. Li et al. [14] proposed an information sharing mechanism (ISM) to make each particle share its best search information, through which all particles can take advantage of the shared information. A competitive and cooperative operator inspired by human behaviors was then designed to ensure that the shared information is utilized in a proper and efficient way; the resulting competitive and cooperative PSO with ISM (CCPSO-ISM) can prevent premature convergence when solving global optimization problems. In addition, biologists have found aging to be an important mechanism for maintaining diversity: in human society, aging makes the old leader of a colony weak, which provides opportunities for other individuals to challenge the leadership position. Inspired by this natural phenomenon, Zheng et al. [15] incorporated this aging mechanism, together with a mutation strategy, into PSO to overcome premature convergence without significantly impairing the fast-converging feature of PSO.

In this paper, based on the traditional PSO, we construct a new organization structure for the particles, in which the competition of multiple teams is introduced to recognize unknown fault patterns and then find the optimal decision. The particles are redefined as members and classified into three groups: staff, who form the majority; leaders, who form the minority; and the only boss. The organization structure of the proposed algorithm is shown in Figure 1. The staff follow different leaders, and the leaders are accountable to the boss. Finally, the boss adjusts the decision again based on the optimal decision made by the staff and leaders. The method is therefore termed the MTCO algorithm.

By searching more potentially optimal areas, the MTCO algorithm effectively avoids premature convergence and overcomes the influence of randomness on the optimal decision solution. Thus, the global optimal decision solution can be obtained with higher probability, and the proposed algorithm can markedly improve the recognition accuracy of unknown fault patterns by searching for the optimal class centers of the different fault patterns.

The rest of the paper is organized as follows. Section 2 proposes the MTCO algorithm. Section 3 introduces the recognition process of the MTCO algorithm. Section 4 then discusses the application to bearing fault diagnosis. Finally, conclusions are drawn in the last section.

2. The MTCO Algorithm

2.1. The Optimization Principle

The MTCO algorithm is constructed on the basis of the traditional PSO, but the meanings of the particles, the updating strategy, and the organization structure have been redesigned. The traditional PSO suffers from premature convergence because it becomes trapped in local optimal areas or lacks diversity [16, 17]. The traditional PSO is also not very efficient when coping with complex multimodal functions, because all particles follow only a single global extremum and cannot efficiently exchange search information. Inspired by the competition of multiple teams, this paper designs a new organization structure and a corresponding updating strategy to solve the existing problems mentioned above.

The fitness function of the traditional PSO is redefined as the decision function for the competitive behaviors of multiple teams, and the fitness value is accordingly redefined as the decision value. Assume a maximization problem. As shown in Figure 1, the global extremum is redefined as the only boss, and the ordinary individuals are redesigned as leaders and staff, respectively. Leaders and staff each have their own individual extremum. Each team has exactly one leader, and the staff in a team make decisions under the guidance of their team leader. The leaders thus form the minority, with higher decision values, and the staff form the majority, with relatively low decision values; usually, the number of leaders is much smaller than the number of staff.

Based on this structure, the procedure of the MTCO is described as follows.

Step 1. Initialize each individual's velocity and position randomly. Each position is a potential optimal decision solution.

Step 2. Calculate each individual's decision value and sort the individuals in ascending order of these values. The individual with the highest decision value is the boss.

Step 3. Choose the leaders from the top of the ranking; the rest of the individuals are the staff. The staff are then randomly assigned to different teams, and each team may have a different number of staff according to the leaders' decision values: the higher a leader's decision value is, the more staff its team has. The number of staff belonging to each leader is calculated by equation (1), where the leaders are indexed by their ordinal in the ascending ranking.
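The allocation rule in Step 3 can be sketched as follows. The exact form of equation (1) is not reproduced here; this is a minimal illustration assuming a simple rank-proportional split, with illustrative function and variable names.

```python
# Hypothetical sketch of Step 3: allocate staff to leaders in proportion to
# each leader's rank, so a higher decision value (higher ordinal) gets more staff.
def allocate_staff(num_leaders, num_staff):
    ranks = list(range(1, num_leaders + 1))        # ascending ordinals
    total = sum(ranks)
    counts = [round(num_staff * r / total) for r in ranks]
    counts[-1] += num_staff - sum(counts)          # give any remainder to the best leader
    return counts

counts = allocate_staff(3, 57)                     # e.g., 3 leaders, 57 staff
```

The split preserves the total number of staff while favoring the better-ranked leaders.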

Step 4. Update the staff's velocities and positions using equations (2) and (3), respectively; the inertia weight is calculated by equation (4). In these equations, the staff are indexed by their ordinal in the ascending ranking, and the current iteration number and the maximum iteration number appear. The acceleration coefficients are constants, and the random control parameters are drawn from the interval [0, 1]. The individual extremum of a staff member's leader guides the update, and the newly designed acceptable vectors are binary. The inertia weight adopted by the staff decreases linearly from its maximum value to its minimum value; this is reasonable because the staff should gradually converge to their leaders to search for better decision values.
In this study, the inertia term is interpreted as cumulative experience, which represents the gradual process from indistinctness to clarity in cognizing the decision problem. The second term is explained as the independent-amendment ability: the staff amend their decisions depending on their own cognition. In a team, the leader's opinion usually affects the staff's judgment, so the third term is described as the dependent-amendment ability, which is influenced by the leader. In practical engineering, uncertain or random events around the staff may affect decision-making, leading to a better or worse decision; after amending a decision, the staff may therefore accept some amending suggestions or reject them. The acceptable vector imitates this process. Moreover, the acceptable vector also increases the diversity of the staff and avoids their aggregation. The acceptable vector is a binary vector with the same dimension as a staff position, calculated by equation (5): a uniformly distributed random vector in the interval [0, 1] is generated, and if the random number for a given dimension exceeds a random threshold in [0, 1], the amendment for that dimension is accepted; otherwise, the amendment is rejected.
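The acceptable-vector construction just described can be sketched as follows; the names are illustrative, not the paper's notation.

```python
import random

# Minimal sketch of the acceptable vector: a binary vector with the same
# dimension as a staff position. The amendment in a dimension is accepted
# (element 1) when a uniform random draw exceeds the threshold.
def acceptable_vector(dim, threshold):
    return [1 if random.random() > threshold else 0 for _ in range(dim)]
```

With a threshold near 1 almost every amendment is rejected; near 0, almost every amendment is accepted.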
The updating process of the staff is thus consistent with the process of human decision-making. Meanwhile, the values of the two acceleration coefficients control the personality of the staff, who may tend to depend on their own amendment, on their leader's guidance, or on a compromise between the two.
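A hedged sketch of the staff update of equations (2) and (3): an inertia term, an independent amendment toward the member's own best, and a dependent amendment toward its leader's best, with the latter two gated by binary acceptable elements. Coefficient values and all names are assumptions, not the paper's notation.

```python
import random

# Illustrative staff velocity/position update in the spirit of equations (2)-(3).
def update_staff(pos, vel, pbest, leader_best, w, c1=2.0, c2=2.0):
    new_vel, new_pos = [], []
    for d in range(len(pos)):
        a1 = 1 if random.random() > 0.5 else 0     # acceptable elements gate each amendment
        a2 = 1 if random.random() > 0.5 else 0
        v = (w * vel[d]
             + a1 * c1 * random.random() * (pbest[d] - pos[d])
             + a2 * c2 * random.random() * (leader_best[d] - pos[d]))
        new_vel.append(v)
        new_pos.append(pos[d] + v)
    return new_pos, new_vel
```

When a member is already at both reference points with zero velocity, the update leaves it in place, as expected of a consensus state.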
In this step, a betrayal mechanism is introduced to guide the staff to leave their current team leader when they cannot improve their individual extremum. A two-dimensional betrayal vector is designed, whose components are lower and upper limit values initialized randomly, together with a discontent counter. If a staff member cannot achieve a better decision solution, its discontent toward the leader increases; as with humans, the discontent accumulates gradually. When the counter exceeds the upper limit, the staff member may leave the current leader and enter another team with a certain probability; the probability that the staff member follows a given leader is given by equation (6). Because the probabilities sum to one over all leaders, a staff member that intends to betray its current leader may go to another team or may still follow its current leader. After a new leader is reselected, the betrayal vector is initialized again.
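The betrayal mechanism can be sketched as below. This is an illustration under the assumption that leader selection is roulette-wheel on the leaders' decision values; the function and variable names are not the paper's notation.

```python
import random

# Illustrative betrayal step: while the discontent counter is below the upper
# limit, the staff member keeps its leader; once the limit is reached, it
# reselects a leader with probability proportional to the leaders' decision
# values (which may pick the current leader again), and resets its counter.
def maybe_betray(discontent, upper_limit, leader_values, current_leader):
    if discontent < upper_limit:
        return current_leader, discontent
    total = sum(leader_values)
    r, acc = random.random(), 0.0
    for idx, v in enumerate(leader_values):     # roulette-wheel selection
        acc += v / total
        if r <= acc:
            return idx, 0
    return len(leader_values) - 1, 0
```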
If a staff member's new decision value is higher than its individual extremum, the extremum is updated; if it is higher than its leader's individual extremum, the leader's extremum is updated; and if it is higher than the boss's decision value, the boss is updated. Using this updating strategy, the staff can make better decisions; meanwhile, the communication with the leaders and the boss is completed, which provides better references for both.

Step 5. Update the leaders' velocities and positions using equations (7) and (8), respectively; the corresponding inertia weight is calculated by equation (9). Compared with the staff's velocity updating formula, the dependent-amendment ability in equation (7) is influenced by the boss; the meanings of the other terms in equation (7) are the same as in equation (2). The function of the acceptable vector in equation (7) is to reduce overdependency on the boss and to avoid trapping into local suboptimal decision areas. In contrast to the staff, the leaders adopt a linearly increasing inertia weight: because the leaders attain higher decision values, they have a duty to explore more of the solution space in search of the global optimal decision value. The decreasing and increasing inertia weights thus balance the exploitation and exploration abilities between the staff and the leaders, which increases the probability of obtaining the global optimal decision solution.
A leader cannot remain a leader forever; its leadership status depends on its decision-making ability. A replacement mechanism is therefore necessary for leaders whose decision values fall below those of some staff. At each iteration, if a leader drops out of the top positions in the ranking, it is relegated to staff and randomly given a new leader and a new betrayal vector, while the staff with higher decision values replace the relegated leaders. Meanwhile, the staff belonging to a relegated leader reselect a new leader with the probability calculated by equation (6). The replacement mechanism ensures that all staff follow excellent leaders at all times, which is conducive to making the global optimal decision solution.
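The core of the replacement mechanism is re-ranking: whoever is in the current top positions holds leader status. A minimal sketch, with illustrative names:

```python
# Sketch of the replacement mechanism: the current leaders are simply the
# top-ranked members by decision value; any former leader outside this set
# is relegated to staff, and top-ranked staff are promoted.
def current_leaders(decision_values, num_leaders):
    order = sorted(range(len(decision_values)),
                   key=lambda i: decision_values[i], reverse=True)
    return set(order[:num_leaders])
```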
If a leader's new decision value is higher than its individual extremum, the extremum is updated, and if it is higher than the boss's decision value, the boss is updated.

Step 6. Update the boss's position using equation (10); the corresponding inertia weight is calculated by equation (11) and also decreases linearly from its initial value to its final value. In the early phase of the iteration, a larger inertia weight is conducive to roughly exploring the larger solution space; in the later stage, a smaller inertia weight is conducive to exploiting a more precise solution.
On the basis of the global optimal decision solution made by all subordinates, the boss amends the current global decision value with a random free updating strategy. Decision-making about the future is still an uncertain process, and uncertainty factors also influence the boss's amendment, leading to a higher or lower decision solution; the acceptable vector therefore still acts in equation (10). An elite strategy is used to update the boss's position: if the amended position yields a higher decision value, it is retained; otherwise, the previous position is retained.
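The elite strategy guarantees the boss never gets worse, whatever the random amendment does. The sketch below illustrates this with a simple random perturbation standing in for equation (10); the perturbation form and names are assumptions.

```python
import random

# Sketch of the boss's elite-strategy update: propose a random amendment
# around the current position and keep it only if the decision value improves.
def update_boss(boss_pos, decision, step=0.1):
    candidate = [x + step * (2 * random.random() - 1) for x in boss_pos]
    return candidate if decision(candidate) > decision(boss_pos) else boss_pos

# Example: maximizing -sum(x^2); by construction, the boss never worsens.
f = lambda x: -sum(v * v for v in x)
pos = [0.5, -0.3]
for _ in range(100):
    pos = update_boss(pos, f)
```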

Step 7. Stop the iteration if the terminal condition is satisfied; otherwise, go to Step 2.
Finally, the flow chart of the MTCO algorithm is illustrated in Figure 2.
The MTCO algorithm can increase the diversity of the staff and effectively free the leaders from overdependency on the global optimal decision solution. The layered organization structure lets the staff follow more potentially optimal decision solutions. Meanwhile, the boss also takes part in decision-making at each iteration, which increases the exploitation ability for obtaining the optimal decision solution. In the following section, the global optimal decision ability of the MTCO algorithm is verified.

2.2. Optimization Experiment and Comparison

Some well-known test functions are used to compare the performance of different optimization algorithms. The commonly used test functions include the Schaffer function, the Griewank function, and the Rosenbrock function, given as follows:

The graphs of these functions in three-dimensional coordinates are shown in Figure 3. The Schaffer function is a complex two-dimensional function with countless local extrema; it attains its global extremum only at the origin, so the global extremum is difficult to find. The Griewank function also has many local extremum points; its global extremum of 0 is attained at the origin. It is a typical nonlinear multimodal function and is usually regarded as a more complex optimization problem. As for the Rosenbrock function, its global extremum is located in a smooth, narrow, valley-like area formed by parabolas, and the information it provides is so limited that many optimization algorithms have difficulty identifying the search direction.
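For reference, the standard published forms of the three benchmarks (in their minimization versions, which the paper appears to use) can be written as:

```python
import math

# Schaffer function: global minimum 0 at (0, 0), surrounded by ripples.
def schaffer(x, y):
    s = x * x + y * y
    return 0.5 + (math.sin(math.sqrt(s)) ** 2 - 0.5) / (1 + 0.001 * s) ** 2

# Griewank function: global minimum 0 at the origin, many local minima.
def griewank(x):
    return (1 + sum(v * v for v in x) / 4000
            - math.prod(math.cos(v / math.sqrt(i + 1)) for i, v in enumerate(x)))

# Rosenbrock function: global minimum 0 at (1, ..., 1), inside a narrow valley.
def rosenbrock(x):
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
               for i in range(len(x) - 1))
```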

The self-regulating PSO (SR-PSO) proposed by Tanweer et al. [12], the seeker optimization algorithm (SOA) proposed by Dai [18], and the ALC-PSO proposed by Singh et al. [19] all obtain relatively good optimization results by imitating human decision-making behaviors. The MTCO algorithm is therefore compared with them, in addition to the standard PSO (SPSO). The population size for all algorithms is 60, and the total iteration number is set to 600. For the MTCO algorithm, the number of leaders is set to 3; the maximum and minimum inertia weights are set to 0.9 and 0.2, respectively; two further control parameters are set to 0.5 and 5e−5; and the lower and upper betrayal limits are set randomly as integers in the intervals [80, 90] and [95, 100], respectively. The parameter settings of the other algorithms are described in previous studies [12, 18, 19], and the same terminal condition is applied to all algorithms. The parameters, including velocities and positions, are initialized randomly. To compare the global optimal decision ability and verify the influence of randomness, each algorithm is run one hundred times, and the Min, Mean, Max, and STD (standard deviation) of the fitness values are used for performance verification. Table 1 shows the comparison results for the Schaffer, Griewank, and Rosenbrock functions.

These comparisons demonstrate that the proposed MTCO algorithm has a much stronger decision-making ability than the other algorithms. Its stable and reliable performance ensures that the global optimal decision solution can be obtained with higher probability. Accordingly, the MTCO can be applied to recognizing unknown fault patterns for the fault diagnosis of critical equipment.

3. Recognition Process of the MTCO Algorithm

3.1. The Recognition Principle

The recognition principle for recognizing the unknown fault patterns is shown in Figure 4.

As shown in Figure 4, there are two categories of two-dimensional data representing different patterns. First, a decision function is selected to determine an optimal class center for each category of data; the class centers need to meet three optimization conditions, namely, shorter intraclass distance, longer interclass distance, and higher recognition accuracy. Specifically, the intraclass distance is defined as the sum of the distances from all known samples with the same class label to their class center, the interclass distance is defined as the sum of the distances between all class centers, and the recognition accuracy is the proportion of known samples classified into the right categories. For an unknown sample, the recognition criterion also depends on distance: if the distance from an unknown sample to the center of Class 1 is the shortest, the sample is classified into that class. In this study, the Euclidean distance (ED), computed via the 2-norm of the elementwise difference between two samples, is used to express the distance between samples, and the unknown sample is assigned to whichever class has the nearest center.
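The nearest-center recognition rule just described can be sketched compactly; the names here are illustrative.

```python
import math

# Euclidean distance between two samples (2-norm of the elementwise difference).
def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Assign an unknown sample to the class whose center is nearest.
def classify(sample, centers):
    return min(range(len(centers)), key=lambda c: euclidean(sample, centers[c]))

centers = [[0.0, 0.0], [5.0, 5.0]]   # two hypothetical class centers
```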

As previously mentioned, the decision function should meet the three conditions; the proposed decision function is therefore defined in terms of a member's position, the total number of training samples, the number of training samples classified into the correct classes, the number of training samples belonging to each class, and the class center vectors. This is a maximization decision problem, and a larger value of the decision function means that the MTCO algorithm has a stronger recognition ability for unknown samples.
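One plausible decision function satisfying the three conditions is sketched below. The paper's exact combination is not reproduced; this hedged variant simply rewards training accuracy and interclass spread while penalizing intraclass spread, and the names are assumptions.

```python
import math

# Hedged sketch of a decision function: higher is better when the training
# accuracy is high, the class centers are far apart, and the samples are
# close to their own class centers.
def decision_value(centers, samples, labels):
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    intra = sum(dist(s, centers[l]) for s, l in zip(samples, labels))
    inter = sum(dist(centers[i], centers[j])
                for i in range(len(centers)) for j in range(i + 1, len(centers)))
    correct = sum(1 for s, l in zip(samples, labels)
                  if min(range(len(centers)), key=lambda c: dist(s, centers[c])) == l)
    acc = correct / len(samples)
    return acc * inter / (intra + 1e-12)

# Two tight clusters: well-placed centers should score higher than poor ones.
samples = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]]
labels = [0, 0, 1, 1]
good = decision_value([[0.05, 0.0], [5.0, 5.05]], samples, labels)
bad = decision_value([[2.0, 2.0], [3.0, 3.0]], samples, labels)
```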

3.2. Kernel Function in the MTCO Algorithm

Kernel functions have attracted increasing attention with the development of SVM. A kernel function must satisfy the Mercer condition; it combines the nonlinear mapping and the inner product of two vectors in the feature space, so that the nonlinear mapping is carried out implicitly. The kernel function thus maps the data from the original space, where their distribution is nonlinear, to a high-dimensional feature space, where their distribution is linear and can be handled easily and simply. The RBF kernel and the polynomial kernel are the most commonly used kernel functions [20], given as follows:

According to the Mercer condition, the kernel equals the inner product of the two mapped vectors, where the mapping function is implicit [21]. The ED between two samples can therefore also be calculated in the feature space induced by the mapping, which removes the influence of the nonlinear distribution of the data on the recognition of unknown samples. The ED in the feature space can be calculated by

Using equation (17), the decision function can be recalculated in the feature space. After the kernel function is applied, the linear distribution of the data in the feature space is obviously conducive to recognition.

Choosing an appropriate kernel function and determining the corresponding kernel parameters are critical for the mapping and the decision-making. With the powerful optimization ability of the MTCO algorithm, appropriate kernel parameters are easy to determine. Kernel functions are generally categorized into two groups, global kernels and local kernels, of which the former are more suitable for classification; the polynomial kernel is a typical local kernel, and the RBF kernel is a typical global kernel. The RBF kernel is therefore chosen for the MTCO algorithm, and equation (17) is then rewritten as equation (18); on the basis of equation (18), the decision function is rewritten as follows:
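The simplification underlying equation (18) can be verified directly: because the RBF kernel satisfies k(x, x) = 1, the feature-space squared distance k(x, x) + k(y, y) − 2k(x, y) reduces to 2 − 2k(x, y). A minimal sketch, with gamma standing for the kernel parameter tuned by the MTCO search:

```python
import math

# RBF kernel value between two samples; gamma is the kernel parameter.
def rbf(a, b, gamma=1.0):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

# Feature-space Euclidean distance under the RBF kernel: since k(x, x) = 1,
# d^2 = 2 - 2 k(x, y); the max() guards against tiny negative round-off.
def feature_distance(a, b, gamma=1.0):
    return math.sqrt(max(0.0, 2.0 - 2.0 * rbf(a, b, gamma)))
```

Note that the feature-space distance is monotone in the original distance and bounded above by sqrt(2).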

By calculating the optimal decision value in the high-dimensional feature space, the standard MTCO (S-MTCO) algorithm is transformed into the kernel MTCO (K-MTCO) algorithm.

3.3. Recognition Experiment and Comparison

To verify the recognition performance of the K-MTCO algorithm, several classic and typical data sets are used, whose relevant information is listed in Table 2. All data sets are collected from the University of California Irvine (UCI) machine-learning repository, and the training and test samples are selected randomly. In this study, some of the most widely used fault diagnosis classifiers, such as SVM [22], the back propagation (BP) network [23], and the learning vector quantization (LVQ) network [24], are used, besides the S-MTCO algorithm, to compare with the K-MTCO algorithm. This comparison also shows the necessity of the kernel function.

The settings for the K-MTCO and S-MTCO are similar: the population size is 80; the total iteration number is 800; the number of leaders is 6; the maximum and minimum inertia weights are 0.9 and 0.2, respectively; two further control parameters are set to 0.2 and 0.001; the lower betrayal limit is set randomly in the interval [85, 90]; and the upper betrayal limit is set randomly in the interval [95, 100]. The K-MTCO optimizes one more parameter than the S-MTCO, namely, the kernel parameter. The k-NN classifier is sensitive to the parameter k, so we also verify the influence of different values of k on the recognition: besides k = 1, we list the values of k that make the recognition accuracy highest and lowest. For the settings of the parameters and network structures of the SVM, BP, and LVQ networks, please refer to the references mentioned above; in particular, the error goals for the BP and LVQ networks are set to 0.001, and their iteration numbers are 2,000 and 500, respectively. All 4 data sets are scaled to the interval [0, 1], which reduces the searching range and improves the algorithm's efficiency.

Tables 3 to 6 show the comparison results of the different algorithms. Because random settings also exist in the BP network, the LVQ network, and the SVM, each calculation is repeated 3 times to verify the influence of randomness on the recognition accuracy.

Tables 3 to 6 list the ratio between the samples recognized correctly and the total samples of each class. For example, in the second line of the K-MTCO row in Table 3, the notation '(31/33)' means that the second class has 33 samples, of which only 31 are recognized correctly.

These results indicate that the MTCO algorithm is effective and that its accuracy is usually higher than that of the other algorithms, with the K-MTCO algorithm achieving the highest recognition performance. In the following section, the MTCO algorithm is applied to the fault diagnosis of bearings.

4. Application in Bearing Fault Diagnosis

Rolling element bearings are key components located between the stationary and rotating parts of motors; if faults occur, they may lead to fatal breakdowns and even unacceptably long maintenance stops [25]. It is therefore significant to accurately recognize unknown fault patterns occurring in bearings. In this section, features are first extracted from bearing vibration signals, and then the MTCO algorithm is applied to the fault diagnosis of the bearing.

4.1. Collection of Bearing Vibration Signals

The vibration signals are obtained from the Case Western Reserve University (CWRU) bearing data center. As shown in Figure 5, the test stand consists of a 2 hp (horsepower) motor, a torque transducer, and a dynamometer. The test bearings, manufactured by Svenska Kullager Fabriken (SKF), support the motor shaft. Single-point faults were introduced into the test bearings using electro-discharge machining (EDM) with fault diameters of 0.007 inches and 0.021 inches. Vibration data are collected by accelerometers for normal bearings and for single-point drive-end bearing defects, which include outer race, inner race, and ball faults, each with defect sizes of 0.007 inches and 0.021 inches. Data are collected at 12,000 samples per second for the drive-end bearing experiments. In this study, there are 7 original vibration signals covering the 7 drive-end bearing conditions, including 1 normal condition and 6 fault types. Each original signal is divided into 100 subsignals of 12,000 data points each.

4.2. Feature Extraction

Four time-domain features, namely, the mean, the root mean square (RMS), the clearance factor, and the kurtosis factor, are chosen here. The mean describes the static character of the signal; the RMS reflects the amplitude characteristic of the signal and is not sensitive to early faults but has good stability; the clearance and kurtosis factors are dimensionless statistical parameters that are sensitive to early faults but have worse stability. Based on these considerations, these four time-domain features are chosen. Their expressions are given in the following equations, in which the total number of data points, the time-domain signal and its data points, and a function for extracting the maximum data point appear:
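The four time-domain features can be computed as below. These follow the usual bearing-diagnosis conventions, which the paper's equations appear to use; the exact paper definitions may differ slightly.

```python
import math

# Standard time-domain features for a signal x of N points:
# mean, RMS, clearance factor (peak over squared mean of root amplitudes),
# and kurtosis factor (fourth moment normalized by RMS^4).
def time_features(x):
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    clearance = max(abs(v) for v in x) / (sum(math.sqrt(abs(v)) for v in x) / n) ** 2
    kurtosis = (sum(v ** 4 for v in x) / n) / rms ** 4
    return mean, rms, clearance, kurtosis
```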

After the fast Fourier transform (FFT) is applied to the signal, three frequency-domain features are calculated by the following equations, in which the total number of spectrum lines, the frequency-domain signal, and its data points appear. The first feature describes the size of the vibration energy in the frequency domain, and the second and third features are deviations that reflect the dispersion or concentration of the spectrum.
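A hedged sketch of such frequency-domain features is given below. A naive DFT yields the amplitude spectrum; the spectral mean (vibration energy size), the spectral standard deviation, and a normalized dispersion ratio are plausible stand-ins for the paper's equations, whose exact forms are not reproduced here.

```python
import cmath, math

# Illustrative frequency-domain features from a one-sided amplitude spectrum.
def freq_features(x):
    n = len(x)
    spec = [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]                       # amplitude spectrum
    m = len(spec)
    p1 = sum(spec) / m                                    # spectral mean (energy size)
    p2 = math.sqrt(sum((s - p1) ** 2 for s in spec) / m)  # spectral standard deviation
    p3 = p2 / (p1 + 1e-12)                                # dispersion ratio
    return p1, p2, p3

# Example: a pure tone concentrates the spectrum in a single line.
tone = [math.sin(2 * math.pi * t / 8) for t in range(8)]
p1, p2, p3 = freq_features(tone)
```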

Table 7 shows the distribution of the 7 classes of data. Each class is divided randomly in a proportion of 60% for training and 40% for testing. Using the last six attributes, two three-dimensional data distributions of the bearing conditions are shown in Figure 6, which indicates that the condition data have relatively good separability and, furthermore, that the feature extraction is helpful for recognizing these fault patterns. Before recognition, an optimal single-class center is determined for each working condition.

4.3. Fault Diagnosis

Now, the MTCO algorithm is applied to recognize the bearing patterns. The proposed algorithm can recognize multiclass fault patterns easily, whereas the SVM is designed for two-class problems [26], so the MTCO algorithm works well when coping with the multiclass fault patterns of bearings. In this subsection, the SVM [27], the BP network [23], the LVQ network [24], the S-MTCO algorithm, and the supervised PSO (S-PSO) classification algorithm [22] are compared with the K-MTCO algorithm, and each calculation is again repeated 3 times for the same purpose. The parameters for the SVM are set according to the ICDF (intercluster distance in the feature space) method proposed by Zhang et al. [27], the parameters for the S-PSO classification algorithm are set according to the method proposed by Zheng and Gao [22], and the parameters for the other algorithms are the same as in Section 3.3.

The comparison results shown in Table 8 demonstrate that the MTCO algorithm, especially the K-MTCO algorithm, consistently has higher recognition accuracy than the other algorithms, and the results calculated by the K-MTCO algorithm are quite remarkable from an engineering point of view. This stable and reliable global optimal decision ability makes the proposed algorithm powerful for bearing fault diagnosis.

5. Conclusion

In this paper, a novel MTCO algorithm has been proposed. The optimization and recognition performance of the MTCO algorithm is evaluated using typical test functions and data sets with different characteristics, and comparisons have been made with various commonly used intelligence algorithms based on human behaviors and pattern recognition. The results show that the MTCO algorithm consistently performs better than other human behavior-based algorithms, such as SR-PSO, ALC-PSO, SOA, and SPSO, and that the proposed algorithm has a stable and reliable global optimal decision-making ability. Finally, the MTCO algorithm is applied to identify multiple bearing faults. The recognition results indicate that the recognition accuracy of the MTCO algorithm is higher than that of other commonly used algorithms, including SVM, BP, and LVQ, especially when the kernel function is introduced into the proposed algorithm.

Data Availability

The data used to support the findings of this study are included within the article.

Disclosure

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Conflicts of Interest

The authors declare no conflict of interest.

Acknowledgments

This research was supported by Sichuan Science and Technology Program, grant numbers 2019YJ0395 and 2021YJ0519, China Civil Aviation Administration Development Foundation Educational Talents Program (14002600100018J034), General Foundation of Civil Aviation Flight University of China (2019-053), Youth Foundation of Civil Aviation Flight University of China (Q2018-139).