Abstract

The vibration signal is easily contaminated by noise due to environmental and other factors, which leads to poor adaptability and low accuracy in remaining useful life (RUL) prediction. To solve this problem, this paper proposes a novel RUL prediction method based on a multiscale stacking deep residual shrinkage network (MSDRSN). MSDRSN combines the ability of stacking to improve prediction accuracy with the advantages of the deep residual shrinkage network (DRSN) in denoising. First, the cumulative sum (CUSUM) method from statistics is used to divide the full life cycle of the rolling bearings and discover the points of failure. Second, stacking is used for feature learning on the raw data; multiple convolutional kernels of different scales are selected as base-learners, and fully connected neural networks are selected as meta-learners for feature fusion and learning. Then, DRSN is used for prediction, and the obtained results are fitted with Savitzky–Golay (SG) smoothing. Finally, the effectiveness of the proposed method is demonstrated on the IEEE PHM 2012 data challenge dataset and compared with the multiscale convolutional neural network with fully connected layer (MSCNN-FC) and the bidirectional long short-term memory (BiLSTM) network for RUL prediction under noise. Using the proposed method, the mean absolute error (MAE) of the best result is 0.002 and the mean square error (MSE) is 0.014; meanwhile, the coefficient of determination (R2) of the best prediction result reaches 97.6%. The method is also compared with other machine learning methods, and all the results prove the accuracy and effectiveness of the proposed method for RUL prediction applications.

1. Introduction

In the field of prognostics and health management (PHM), the accurate prediction of remaining useful life has always been a key and extremely challenging problem [1, 2]. Rolling bearings are among the most common and crucial parts of modern machinery, and almost all kinds of mechanical equipment need them [3]. However, long-term running and repetitive loads cause wear and damage to bearings, which in turn generate noise. Rolling bearings in normal operating conditions typically produce low levels of noise, but once a bearing is damaged or worn, the noise level increases significantly. Noise, which is a common indicator of rolling bearing failure, can provide engineers with critical information about the extent of bearing wear and damage [4]. Once a rolling bearing fails, it can affect the operation of the entire mechanical system and cause serious consequences for the equipment and the enterprise [5].

Premature failure of rolling bearings can cause equipment accidents and even huge economic losses to a company. Therefore, an accurate prediction of the remaining useful life of a rolling bearing is needed, so that failures can be found in advance and the bearing can be replaced in time. Accurate RUL prediction not only reflects the health condition of the bearing in operation but also provides a sound theoretical basis for developing the health management plan of the equipment [6].

In general, RUL prediction methods can be divided into three categories: methods based on physical models, data-driven methods, and hybrid methods [7–19]. The physics-model-based method uses physical and mathematical models for modeling and then estimates the model parameters from the monitoring data to predict the degradation trend of rolling bearings over their full life cycle. For example, Ding et al. [20] extracted time-domain features such as RMS and kurtosis from the vibration signals of rolling bearings and evaluated them with a proportional hazards model, achieving good results. Wang et al. [21] obtained the covariates of a Weibull proportional hazards model through KPCA and achieved high RUL prediction accuracy. However, such approaches often require considerable prior knowledge, which makes it difficult to establish an accurate degradation model under complex system structures and working conditions [22].

With the rapid development of sensor technology, computer science, and artificial intelligence theory, data-driven methods have become more and more attractive in prognostics and health management (PHM). The data-driven method uses machine learning and deep learning to learn autonomously from the data and then infer the degradation process of rolling bearings; it not only can save time and labor but also can accurately predict RUL [23, 24]. Zhang et al. [25] proposed an improved CNN to predict RUL by using CNN’s ability of autonomous learning. Xin and Weitang [26] proposed a bearing remaining life prediction method based on multiscale convolutional neural network.

Although CNN-based deep learning algorithms have achieved many excellent results in bearing RUL prediction, most of these methods have only been validated on laboratory datasets. In industrial production, it is difficult to capture bearing vibration signals with a high signal-to-noise ratio because of the various noise sources in the environment, which causes many RUL algorithms to suffer serious accuracy degradation or even failure in the field. Therefore, to meet practical industrial requirements and enable the methods to complete the life prediction task in noisy environments, many researchers have focused on improving the robustness of the model, and fault diagnosis in noisy environments has been studied extensively. For example, Zhicheng et al. [27] used the empirical wavelet transform to reconstruct the signals and then used minimum entropy deconvolution and a CNN to reduce the noise of the reconstructed signals, achieving better results. Zou et al. [28] automatically extracted features from the background noise with a structure-optimized 1DCNN. Su et al. [29] designed a class of hierarchical branching CNN structures and built a robust basic convolutional block by stacking one-dimensional small convolutional kernels, which improved the accuracy. Although there are many research results on fault diagnosis in noisy environments, reports on life prediction in noisy environments are still relatively few.

Inaccurate prediction results are often obtained because CNN-based prediction models suffer from vanishing and exploding gradients. Therefore, He et al. [30] proposed the residual network, which introduces shortcut connections into the network to improve its transformation ability and avoid gradient explosion and vanishing, thus allowing the network to be stacked to deeper layers. The residual network made much deeper networks possible, and the deep residual network that followed can reduce the number of model parameters and shorten the training time through its residual connections. Yu et al. [31] proposed a ResNet model constructed to extract features from 1D vibration signals.

However, when noisy or otherwise complex data are used for feature learning, the results are often unsatisfactory. Zhao et al. [32] proposed DRSN, an improved version of the deep residual network (DRN), which introduces soft thresholding as a nonlinear transformation layer and learns the thresholds autonomously through an attention mechanism, so that degradation information can be extracted effectively.

Although hybrid methods can make full use of the advantages of both approaches, the process is more complicated, so this kind of method has rarely been reported. Aiming at the problem of noisy signal data in rolling bearing remaining useful life prediction, this paper focuses on the ability of the deep residual shrinkage network to handle noise-contaminated data and improves it.

The novelty of this paper can be summarized as follows:

(1) The deep residual shrinkage network is improved by introducing the idea of stacking ensemble learning, which learns more features of the dataset through two layers of learners, so that the results are less biased and the prediction ability of the model is improved.

(2) Traditional DRSN models usually use convolution at a single scale. This paper proposes to use convolution kernels of multiple scales as the base-learners of DRSN, which can learn more features.

(3) Based on the above, this paper proposes a prediction method based on MSDRSN, and the experimental results show that the method can predict RUL accurately on noisy and complex data.

The remainder of the paper is organized as follows. In Related Theory, the related techniques are introduced, and the basics of the methods used in this paper are presented, such as the basic deep residual shrinkage network, stacking ensemble learning, and the SG smoothing algorithm; by combining these methods, the MSDRSN method is constructed. In Materials and Methods, the detailed prediction process of MSDRSN is given, including the data preprocessing stage and the training process of the model. In Experimental Verification, the dataset used in this paper is introduced, the parameters of the experiments are determined, and the results obtained are discussed. Finally, this paper concludes the research.

2. Related Theory

2.1. DRSN

DRSN is a modified deep residual network consisting of three parts: the deep residual network, an attention mechanism, and a soft thresholding function. It is usually used to enhance deep neural networks so that more useful features can be extracted from noise-laden data and redundant information is better eliminated [33].

Traditional convolutional neural networks have two major problems: first, when the network becomes deeper, gradient vanishing and explosion easily occur; second, the adaptability to sample data is poor under strong noise. DRSN alleviates both problems [34]. The main structure of DRSN is shown in Figure 1.

The residual building unit (RBU) is the basic component of ResNet. In Figure 1, each rectangle represents a feature map, where C is the number of channels, W is the width, and 1 is the height. A deep residual shrinkage building unit contains two batch normalization (BN) layers, two rectified linear unit (ReLU) activation functions, two convolutional layers (Conv), an identity shortcut, a global average pooling (GAP) layer, and some fully connected (FC) layers.
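As a concrete illustration of this structure, below is a minimal PyTorch sketch of a residual shrinkage building unit with channel-wise learned thresholds; the layer widths, kernel size, and the small sub-network used to learn the thresholds are illustrative assumptions and do not reproduce the exact configuration of this paper or of Zhao et al. [32].

```python
import torch
import torch.nn as nn

class ResidualShrinkageBlock(nn.Module):
    """Minimal residual shrinkage building unit with channel-wise soft thresholding."""

    def __init__(self, channels: int):
        super().__init__()
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        # Small FC sub-network (the attention part): outputs a scale in (0, 1) per channel.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
            nn.Linear(channels, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (batch, channels, length)
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        # GAP of |out| gives one summary value per channel.
        abs_mean = out.abs().mean(dim=2)        # (batch, channels)
        tau = (abs_mean * self.fc(abs_mean)).unsqueeze(2)   # channel-wise thresholds
        # Soft thresholding: shrink small, noise-like responses toward zero.
        out = torch.sign(out) * torch.clamp(out.abs() - tau, min=0.0)
        return out + x                          # identity shortcut
```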

2.2. Stacking

Stacking combines multiple models in two layers to obtain the final prediction result. In the stacking method, there are two stages of models. The first-stage models, called base-learners (level-0 models), are trained on the original training set; multiple base models can be chosen. The second-stage model, called the meta-learner (level-1 model), uses the predictions of the base models on the original training set as its training set and their predictions on the original test set as its test set [35].

Stacking has attracted attention because the base-learners produce diverse and relatively accurate predictions for the original data, which are then relearned as new features by the meta-learner; in this way, the multiple base-learners in the ensemble complement each other and make the final prediction more accurate [35].

The stacking framework is shown in Figure 2.
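As a simple illustration of this two-level scheme, the following sketch stacks a random forest and an SVR (the two machine learning baselines that also appear later in this paper) under a small fully connected meta-learner using scikit-learn; the specific estimators, hyperparameters, and synthetic data are assumptions for demonstration only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

# Level-0 base-learners: trained on the original features.
base_learners = [
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("svr", SVR(kernel="rbf")),
]

# Level-1 meta-learner: trained on the base-learners' out-of-fold predictions.
meta_learner = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)

stacked = StackingRegressor(estimators=base_learners, final_estimator=meta_learner, cv=5)
stacked.fit(X, y)
print("R2 on the training data:", stacked.score(X, y))
```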

2.3. Savitzky–Golay Algorithm

The SG algorithm, published by Savitzky and Golay in 1964, is widely used for data smoothing and denoising and is an important method in the field of signal processing [36]. In rolling bearing remaining useful life prediction, the model outputs an RUL prediction curve for the test set. However, because the model cannot be trained to complete accuracy, some points in the predicted RUL curve may be predicted incorrectly, with sudden increases or decreases. After the curve is smoothed, these points are corrected toward the local average trend, which improves the prediction accuracy.

The signal after smoothing by the SG filter is shown in Figure 3.

The blue line is the original signal, and the red line is the signal after smoothing by the SG filter.
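A minimal sketch of this smoothing step using SciPy's Savitzky–Golay filter is shown below; the synthetic signal, window length, and polynomial order are illustrative assumptions rather than the settings used in the experiments.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic noisy curve standing in for a raw RUL prediction.
t = np.linspace(0.0, 1.0, 500)
raw = np.exp(-3.0 * t) + 0.05 * np.random.default_rng(0).standard_normal(t.size)

# window_length must be odd and greater than polyorder.
smoothed = savgol_filter(raw, window_length=31, polyorder=3)
```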

3. Materials and Methods

For the RUL prediction problem, this paper uses CUSUM to find degradation points. In addition, to improve the model's ability to learn degradation features, this paper proposes a multiscale deep residual shrinkage network that incorporates the idea of stacking ensemble learning. Applying ensemble learning to DRSN enhances the ability to extract useful information. Using the residual connections of the residual network not only solves the gradient vanishing problem of deep networks but also improves the accuracy of the prediction results.

The complete RUL prediction process is shown in Figure 4.

3.1. CUSUM

CUSUM is a sequential analysis method first proposed by E. S. Page of the University of Cambridge, UK, in 1954. Its basic idea is to accumulate the small offsets between the sample data and a target value; the accumulation amplifies small shifts and enhances the sensitivity of the detection process, so that anomalies in the data can be detected. This is called CUSUM change-point detection [37, 38]. When applied to the bearing vibration signal, it can be used to find the change point, which is taken as the fault point of the rolling bearing.

Conventionally, the RUL of a rolling bearing is labeled as a monotonically decreasing line: the health indicator starts at 1 at the beginning of sampling, decreases uniformly as degradation proceeds, and reaches 0 when the rolling bearing can no longer work.

In this paper, the marking method is different from the previous method. This paper divides the life cycle of rolling bearing into stable stage and declining stage. Generally, the rolling bearing works stably during the stable stage and the signal features basically do not change, but when it enters the declining stage, the signal features will change drastically until the rolling bearing breaks down. The advantage of this method is to keep the feature labels unchanged in the stable stage, and the labels start to decline gradually from the declining stage. It avoids the influence of the stable stage on the model training and improves the accuracy of the training results. By dividing the labels of the life cycle, the fault point of a rolling bearing can be found quickly, and the prediction result of RUL can be improved effectively.

The CUSUM algorithm is calculated in three steps, as shown below; a minimal code sketch follows the list.

Step 1: calculate the mean value of the series.

Step 2: calculate the cumulative sum of the deviations from the mean.

Step 3: take the maximum value of the cumulative sum; the corresponding horizontal coordinate is the fault point.

This layer accepts the raw rolling bearing vibration signal, takes its absolute mean value, and then uses CUSUM for life-cycle division and labeling.
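The sketch below is a minimal implementation of these three steps, assuming a one-dimensional array of absolute mean vibration values; the synthetic example and the use of the absolute cumulative sum (so that shifts in either direction are detected) are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def cusum_change_point(signal: np.ndarray) -> int:
    """Return the index where the cumulative deviation from the mean is largest."""
    mean = signal.mean()                        # Step 1: mean value of the series
    cumsum = np.cumsum(signal - mean)           # Step 2: cumulative sum of deviations
    return int(np.argmax(np.abs(cumsum)))       # Step 3: location of the extreme value

# Example: a stable stage followed by a rising degradation trend.
rng = np.random.default_rng(0)
stable = 0.5 + 0.01 * rng.standard_normal(2000)
declining = 0.5 + 0.002 * np.arange(800) + 0.01 * rng.standard_normal(800)
fault_point = cusum_change_point(np.concatenate([stable, declining]))
print("Detected fault point index:", fault_point)    # close to 2000
```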

3.2. MSDRSN

The network structure of DRSN mainly consists of a convolutional layer, residual shrinkage modules, and a pooling layer. MSDRSN is a new model that embeds DRSN into stacking ensemble learning by using DRSNs of different scales as the base-learners and a fully connected network as the meta-learner. In RUL prediction, three residual shrinkage layers are stacked together as part of the model structure. Multiple residual shrinkage layers can better capture the mapping between input and output and alleviate the gradient vanishing problem. However, the full-life data sequence of the bearings contains some noise, and the residual network alone can only reduce the noise to a certain extent, so its training effect is ordinary.

Therefore, this paper proposes MSDRSN by exploiting the ability of stacking to learn features. It can not only solve the gradient vanishing problem as the network deepens but also enhance the training ability of the model and improve the training accuracy, all of which contribute to rolling bearing RUL prediction.

The DRSN model construction process is as follows.

Step 1: convolutional layer (feature preextraction). The calculation principle of the one-dimensional convolution layer is

$$y_k = f_{\mathrm{cov}}\left(x_i \otimes w_k + b_k\right),$$

where $x_i$ is the input, $w_k$ is the weight of the $k$th convolution kernel, $\otimes$ is the dot product operation, $b_k$ is the bias of the $k$th convolution kernel, $f_{\mathrm{cov}}$ is the activation function, and $y_k$ is the output vector of the $k$th convolution kernel. At this layer, the vibration signal is first processed with a one-dimensional convolutional kernel to extract the shallow features of the signal, providing a basis for the deep feature extraction in the next step; at the same time, padding is set in this layer to avoid the loss of boundary features. This layer completes feature preextraction.

Step 2: residual shrinkage module (feature extraction and denoising). Given that the input is $x_i$ and the output is $x_j$, soft thresholding can be written as

$$x_j = \begin{cases} x_i - t, & x_i > t, \\ 0, & -t \le x_i \le t, \\ x_i + t, & x_i < -t, \end{cases}$$

where $t$ is the soft threshold. Finally, the sum of the soft-thresholded feature map and the input feature map is the output feature map; this addition is the core of the identity shortcut in the residual structure. In this layer, the network automatically extracts the features of the previous layer's data and automatically learns the threshold for denoising. After that, the data are transmitted to the next layer of the network.

Step 3: adaptive pooling layer (dimensionality reduction). Average pooling is computed as

$$p_i^{l+1}(j) = \frac{1}{W}\sum_{t=(j-1)W+1}^{jW} a_i^{l}(t),$$

where $W$ is the width of the pooling domain, $a_i^{l}(t)$ is the $t$th neuron of the $i$th feature vector in layer $l$, and $p_i^{l+1}(j)$ is the corresponding value in layer $l+1$. The network at this layer accepts the information output from the previous layer, automatically downsamples the data according to the set output size, and passes the data to the next layer.

Step 4: splicing and fully connected layers (feature aggregation and relearning). The fully connected layer computes

$$y = f\left(\omega x + b\right),$$

where $\omega$ is the weight of the fully connected layer, $x$ is the input, and $b$ is the bias. This layer plays a role similar to the meta-learner in stacking: after the features are spliced, they are sent to the fully connected layer as sample information for feature training, and finally the results are output.

Step 5: loss function.

For the loss function we choose the mean square error (MSE), which is the most commonly used loss for regression problems:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2,$$

where $\hat{y}_i$ is the predicted value, $y_i$ is the true value, and $n$ is the number of samples.

MSDRSN performs feature extraction and produces the initial RUL prediction. The model accepts the preprocessed data from the previous layer, carries out feature self-extraction, outputs the lifetime, and passes the output to the next layer for curve smoothing.
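To illustrate the multiscale stacking idea, the sketch below builds three base branches with different convolution kernel sizes on top of the ResidualShrinkageBlock sketched in Section 2.1, concatenates their pooled features, and relearns them with a fully connected meta-learner. The kernel sizes (3, 5, 7), pooled length (18), and final 18-to-1 layer loosely follow the settings reported in Section 4.4, while the channel width and number of blocks are assumptions.

```python
import torch
import torch.nn as nn

class DRSNBranch(nn.Module):
    """One base-learner: Conv -> stacked residual shrinkage blocks -> adaptive pooling."""

    def __init__(self, kernel_size: int, channels: int = 16, n_blocks: int = 3, pooled_len: int = 18):
        super().__init__()
        self.stem = nn.Conv1d(1, channels, kernel_size, padding=kernel_size // 2)
        # ResidualShrinkageBlock is the unit sketched in Section 2.1.
        self.blocks = nn.Sequential(*[ResidualShrinkageBlock(channels) for _ in range(n_blocks)])
        self.pool = nn.AdaptiveAvgPool1d(pooled_len)

    def forward(self, x):                        # x: (batch, 1, length)
        return self.pool(self.blocks(self.stem(x))).flatten(1)

class MSDRSN(nn.Module):
    """Multiscale base branches plus a fully connected meta-learner for normalized RUL."""

    def __init__(self, kernel_sizes=(3, 5, 7), channels: int = 16, pooled_len: int = 18):
        super().__init__()
        self.branches = nn.ModuleList(
            [DRSNBranch(k, channels, pooled_len=pooled_len) for k in kernel_sizes]
        )
        feat_dim = len(kernel_sizes) * channels * pooled_len
        self.meta = nn.Sequential(nn.Linear(feat_dim, 18), nn.ReLU(), nn.Linear(18, 1))

    def forward(self, x):                        # x: (batch, 1, length)
        features = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.meta(features).squeeze(1)    # predicted RUL in [0, 1] after training
```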

3.3. SG Filter

Because of the noise that inevitably appears in the training results, the SG filter was used to reduce the noise and smooth the curve to increase the accuracy and curve fit of the prediction results.

The SG filter is calculated in two steps, as follows.

Step 1: Let x[i] (i = −m, …, 0, …, m) be a set of consecutive values in the moving window, so that the width of the moving window is M = 2m + 1. Within the filter window, the data are fitted locally by least squares with an nth-degree polynomial (n ≤ M)

$$y(i) = \sum_{k=0}^{n} a_k i^k,$$

where $y(i)$ are the fitted output data, x[i] are the data to be fitted, and $a_k$ are the parameters to be solved. The curve is best fitted when the sum of squared residuals of the nth-degree polynomial is minimized.

Step 2: The residual is minimized by setting its derivative with respect to each coefficient $a_r$ to zero:

$$\frac{\partial}{\partial a_r}\sum_{i=-m}^{m}\left(\sum_{k=0}^{n} a_k i^k - x[i]\right)^{2} = 0, \quad r = 0, 1, \ldots, n,$$

where m is the number of single-sided points to be fitted and x[i] are the data to be fitted. Solving these equations gives the coefficients $a_k$, and the fitted polynomial is used to estimate the center point of the window. By continually moving the window and repeating this operation, the estimated center value of every window is obtained. Using the SG filter to process the output data of MSDRSN, the noisy data can be effectively filtered out, so that the output prediction fits the real values more closely, the estimation accuracy is improved, and the model robustness is enhanced.
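For readers who want to see the least-squares step explicitly, the snippet below computes the center-point smoothing weights directly with NumPy; in practice one would normally call scipy.signal.savgol_filter as in the sketch in Section 2.3, and the window half-width and polynomial order here are arbitrary assumptions.

```python
import numpy as np

def savgol_center_weights(m: int, n: int) -> np.ndarray:
    """Weights mapping a window x[-m..m] to the least-squares center estimate."""
    i = np.arange(-m, m + 1)
    A = np.vander(i, n + 1, increasing=True)     # columns: i^0, i^1, ..., i^n
    # Least squares gives a = pinv(A) @ x; the fitted value at the window
    # center (i = 0) is a_0, so the first row of pinv(A) holds the weights.
    return np.linalg.pinv(A)[0]

weights = savgol_center_weights(m=2, n=2)        # 5-point quadratic window
signal = np.array([1.0, 1.2, 3.0, 1.1, 0.9, 1.0, 1.3])
smoothed = np.convolve(signal, weights[::-1], mode="valid")
print(weights)    # classic 5-point quadratic weights: [-3, 12, 17, 12, -3] / 35
print(smoothed)
```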

3.4. Model Algorithms

The algorithm for the construction of the stacking-based prediction network process is shown in Algorithm 1.

Input: Training set D = {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}
Output: Predicted value after integration H
Process:
Step 1: Data preprocessing
for i = 1 to m do
 x_i ← abs(mean(x_i))
end for
t_fault ← do CUSUM on {x_1, …, x_m}
for i = 1 to m do
 if i < t_fault then set y_i = 1
 else set y_i = (m − i)/(m − t_fault)
end for
normalization({x_1, …, x_m})
Step 2: Training base-learners
for t = 1 to T do
 learn base-learner h_t based on D
end for
Step 3: Feature aggregation
for i = 1 to m do
 x′_i = (h_1(x_i), h_2(x_i), …, h_T(x_i)), where h_t(x_i) is the prediction of the tth base-learner
end for
Step 4: Training meta-learner
learn H based on D′ = {(x′_1, y_1), …, (x′_m, y_m)}
Step 5: Smoothing the curve
smooth H with the SG filter
return H

4. Experimental Verification

To demonstrate the validity of the proposed model, this paper uses the rolling bearing accelerated degradation dataset published in the IEEE 2012 PHM Data Challenge [39]. The proposed RUL prediction method’s feasibility and effectiveness are analyzed in detail by experiments and compared with other RUL prediction methods.

4.1. Data Description

The PHM 2012 challenge dataset was provided by the PRONOSTIA platform of the FEMTO-ST Institute. It is shown in Figure 5.

The PRONOSTIA platform consists of three main parts: the rotating part, the degradation part, and the measurement part [40]. The data provided by PRONOSTIA describe the degradation of ball bearings throughout their service life (until complete failure), and each degraded bearing contains almost all types of defects (inner and outer rings, balls, and cage). In the experiment, a radial load force is applied to accelerate the degradation of the bearings. The vibration signals in the X and Y directions are acquired at a sampling frequency of 25,600 Hz; 2560 data points are recorded in each 0.1 s sample, and samples are taken every 10 s. When the vibration level of the measured data exceeds a certain threshold, the test is stopped [41]. The basic characteristics of the bearings are listed in Table 1, and Table 2 gives a detailed description of the dataset.

Different working conditions will have some influence on the training of the model. This paper selects the datasets under three conditions for experiments. The reason for the choice is that the three datasets had long decline periods and more information could be learned. In total, three groups of tests were carried out, and the detailed arrangement is shown in Table 3 [41].

4.2. Evaluation Metrics

In this paper, three metrics are used to evaluate performance: mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R2). The three evaluation metrics are defined as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{RUL}_{\mathrm{predict},i} - \mathrm{RUL}_{\mathrm{true},i}\right)^{2},$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\mathrm{RUL}_{\mathrm{predict},i} - \mathrm{RUL}_{\mathrm{true},i}\right|,$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(\mathrm{RUL}_{\mathrm{true},i} - \mathrm{RUL}_{\mathrm{predict},i}\right)^{2}}{\sum_{i=1}^{n}\left(\mathrm{RUL}_{\mathrm{true},i} - \overline{\mathrm{RUL}}_{\mathrm{true}}\right)^{2}},$$

where RULpredict and RULtrue represent the predicted RUL and the actual RUL, respectively, and n is the length of the testing data. The smaller the MSE and MAE, the better the prediction performance; the closer R2 is to 1, the better the prediction performance.
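These metrics can be computed directly with scikit-learn; a minimal sketch, assuming the predicted and true RUL curves are stored as NumPy arrays, is shown below (the numbers are illustrative only).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rul_true = np.array([1.00, 0.95, 0.90, 0.85, 0.80])
rul_pred = np.array([0.98, 0.96, 0.88, 0.84, 0.81])

mse = mean_squared_error(rul_true, rul_pred)
mae = mean_absolute_error(rul_true, rul_pred)
r2 = r2_score(rul_true, rul_pred)
print(f"MSE={mse:.4f}, MAE={mae:.4f}, R2={r2:.4f}")
```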

4.3. Data Preprocessing

In this paper, CUSUM is used to detect the change point of the operating state of the rolling bearing vibration signal and to divide the bearing life cycle.

Taking Bearing 1_1 as an example, the steps of sample labeling are as follows:

Step 1: The rolling bearing vibration signal actually measures the acceleration of the bearing in a certain direction, usually the horizontal direction. Its sign represents the direction rather than the magnitude of the vibration, and the direction is not of concern in the signal analysis, so the absolute value of the signal is taken first. Because 2560 vibration points are sampled at each moment, this paper uses their mean value to represent the vibration signal at that moment. After this processing, the absolute mean vibration signal curve is shown in Figure 6.

Step 2: After the signal processing is completed, the signal curve is smoothed by the SG filter for denoising, and then CUSUM is used to find the fault point. The curve before the fault point is regarded as the stable stage, with RUL equal to 1; the curve after it is regarded as the declining stage, with RUL decreasing from 1 to 0. After all the data are labeled with RUL, the bearing vibration signal samples are normalized, because different bearings have vibration values in different intervals. The degradation point of each bearing is given in Table 4, which demonstrates the validity of the method.

Step 3: To verify the superiority of the MSDRSN model in handling noisy data, noise is added according to the SNR; this paper adds noise at 2 dB, 4 dB, 6 dB, and 8 dB in turn. The SNR is defined as

$$\mathrm{SNR}_{\mathrm{dB}} = 10\log_{10}\left(\frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}\right),$$

where P_signal and P_noise are the power of the signal and the noise, respectively. A minimal noise-injection sketch is shown below.
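A common way to implement this noise-addition step is to inject zero-mean Gaussian noise scaled to the target SNR, as sketched below; the paper does not specify the exact noise generation procedure, so the Gaussian assumption and the synthetic signal are illustrative.

```python
import numpy as np

def add_noise_at_snr(signal: np.ndarray, snr_db: float, seed: int = 0) -> np.ndarray:
    """Add zero-mean Gaussian noise so that the resulting SNR equals snr_db."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))   # from SNR_dB = 10*log10(Ps/Pn)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

clean = np.sin(np.linspace(0.0, 20.0 * np.pi, 2560))          # stand-in vibration snippet
noisy_versions = {snr: add_noise_at_snr(clean, snr) for snr in (2, 4, 6, 8)}
```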

4.4. Model Building Experiments

The structural parameters of MSDRSN are shown in Table 5. The hyperparameters need to be preset, such as the number of residual blocks and the size of the learning rate. The hyperparameters are shown in Table 6.

Some other settings in this experiment are determined experimentally, including the validity of CUSUM, the validity of SG smoothing, the number of residual blocks, and the learning rate.

(1) Effectiveness of CUSUM: a comparison test based on bearing 1_3, bearing 2_3, and bearing 3_3 at 8 dB noise. An SVR model with an RBF kernel is used for the test. The results of the comparison experiments are shown in Figure 7. The experiments show that the MSE of bearing 1_3, bearing 2_3, and bearing 3_3 with CUSUM applied is lower than without CUSUM, so it can be concluded that CUSUM significantly improves the prediction capability of the model.

(2) Validity of SG smoothing: a comparison test based on bearing 1_3 at 2 dB, 4 dB, 6 dB, and 8 dB noise. The number of network layers in this experiment is 2, the pooling layer size is 80, and the comparison item is whether SG smoothing is applied. The results are shown in Figure 8. The experiments show that the training results are better when SG smoothing is applied and worse when it is not, so SG smoothing greatly enhances the training ability of the model.

(3) Number of residual blocks: a comparison test based on bearing 1_3 at 8 dB noise. The most influential factor for the MSDRSN network is the number of residual blocks. This part compares the training effect of MSDRSN with different numbers of basic residual blocks and determines the optimal number of layers for this experiment. The experiment sets the convolutional kernel sizes of the convolutional layer as 3, 5, and 7; the basic residual block convolutional kernel sizes as 3, 5, and 7; the output size of the mean pooling layer as 18; and the aggregation layer as 2 layers with 18 and 1 neurons, respectively. The numbers of basic residual blocks compared are 1, 2, 3, 4, and 5. The results are shown in Figure 9.

(4) The size of the learning rate: a comparison test based on bearing 1_3 at 8 dB noise.

The learning rate setting has a significant impact on the MSDRSN network. A learning rate that is too low leads to slow convergence, while one that is too high prevents the model from converging, so this experiment compares the loss obtained with different learning rates and determines the optimal one. The results of the comparison experiment are shown in Figure 10.

From the experiment, it can be concluded that when the learning rate is set as 0.000001, the loss value is the smallest and the model training result is the best, so the optimal learning rate of MSDRSN network is set as 0.000001.

5. RUL Prediction Results

The programming software used for the experiments is Python 3.6, and the central processing unit (CPU) used in the workstation is Intel i7-11800H.

During the experiment, the MSDRSN method was compared with common methods: two machine learning models, random forest and SVR, and two deep learning models, BiLSTM [42] and MSCNN-FC [43]. BiLSTM is a stack of two LSTM layers and can effectively utilize both forward and backward feature information of the input. MSCNN-FC can be used for prediction problems; it is similar to the corresponding classification architecture, but the last layer is a fully connected layer with only one neuron.

Table 7 shows the comparison table of evaluation metrics of MSDRSN method, BiLSTM, and MSCNN-FC in different noise environments.

Among them, MSDRSN has the smallest MSE and MAE, indicating that it has the smallest prediction error, and the largest R2, indicating that its predictions fit best. Although the prediction accuracy on task 2_3 is lower, the prediction results under condition 1 and condition 3 are better than those of the other methods. The feasibility and superiority of MSDRSN are demonstrated by the comparison with BiLSTM and MSCNN-FC, and the experiments conducted under different noise levels also show that the MSDRSN network is more suitable for prediction tasks with noisy data. Taking bearing 1_3 as an example, the RUL prediction curves under different noise levels are shown in Figure 11.

It can be seen that the prediction results of the proposed method follow a trend similar to the actual RUL values, and the prediction accuracy is significantly higher than that of BiLSTM and MSCNN-FC, which verifies the effectiveness of the model in rolling bearing RUL prediction.

To further verify the superiority of the model, this paper also compares it with two traditional machine learning methods, random forest and SVR. Taking bearing 1_3 as an example, the RUL prediction curves of the three models under different noise levels are shown in Figure 12.

The prediction results clearly show that the proposed model achieves higher prediction accuracy than the traditional machine learning methods; its curves agree best with the actual values and show no obvious fluctuation.

In summary, the method proposed in the paper can more effectively predict the remaining life of rolling bearings. The method can more accurately capture the early degradation characteristics of the bearing. At the same time, the method makes a significant improvement in the prediction of the later bearing life.

6. Discussion of RUL Prediction

(1) The MSDRSN method performs convolution with multiple convolution kernels of different scales, which extracts more detailed features and increases the prediction accuracy of the model. Stacking ensemble learning observes the features of the data from multiple perspectives, learns them with the base-learners, and relearns the fused features, so as to improve the accuracy of the prediction results. The strong noise-reduction capability of DRSN then allows more useful features to be obtained from the dataset, which improves the feature learning and feature extraction capability of the model on complex datasets.

(2) The model utilizes the skip-connection structure of the residual network, which provides good feature extraction performance for deep networks and avoids the gradient vanishing problem during training.

(3) Compared with the four models BiLSTM, MSCNN-FC, SVR, and random forest, the MSDRSN method obtains better results: the R2 under 4 dB noise for bearing 1_3 is 0.975, and the MAE and MSE under 6 dB noise for bearing 3_3 are 0.002 and 0.043, respectively; the prediction is almost identical to the actual life. This verifies that the MSDRSN method achieves high RUL prediction accuracy.

7. Conclusions

This paper proposes a novel rolling bearing RUL prediction method using MSDRSN. In the preprocessing stage, the fault points of the bearings are found with CUSUM. In the RUL prediction network, the idea of stacking ensemble learning is merged with the traditional deep residual shrinkage network to improve the prediction ability of the model under noise. Experiments on the IEEE PHM 2012 data challenge dataset show that the proposed MSDRSN method improves the accuracy of RUL prediction, and comparisons with different RUL prediction methods demonstrate its feasibility and superiority.

The work can be improved in the following directions:

(1) MSDRSN has a powerful denoising capability, but the computation takes a long time and the model has many parameters, making it difficult to tune. In the future, distributed computing could be used to speed up model training, together with an algorithm that automatically adjusts the model parameters to find the optimal solution.

(2) The stacking fusion technique can effectively improve the training results, but selecting the base-learners is a difficult problem that requires a lot of time. In the future, an evaluation scheme could be proposed to predict the effect of fusing the base-learner models and reduce the difficulty of model selection.

Data Availability

Previously reported data were used to support this study and are available at https://hal.science/hal-00719503. These prior studies (and datasets) are cited at relevant places within the text as references [35].

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under grant no. 52175379 and Liaoning Provincial Science and Technology Department under grant no. 2022JH2/101300268.