Abstract
Timely detection and treatment of incipient faults in satellites can effectively reduce the damage and harm they cause. Although much work has been done on fault detection problems, incipient faults in satellites have received little attention. In this paper, a new satellite incipient fault detection method is proposed by combining the idea of deviation from unsupervised fault detection methods with the idea of classification from supervised fault detection methods. First, the proposed method uses dynamic linear discriminant analysis (LDA) to find an optimal projection vector that separates the in-orbit data from the normal historical data as much as possible. Second, under the assumption that the parameters obey a multidimensional Gaussian distribution, it applies the normal historical data and the optimal projection vector to build a normal model. Finally, it employs the noncentral F-distribution to test whether a fault has occurred. The proposed method was validated using a numerical simulation case and a real satellite fault case. The results show that the proposed method is more effective at detecting incipient faults than traditional methods.
1. Introduction
With the reduction in the costs of launching rockets and manufacturing satellites, the number of satellites operating in orbit increases annually, bringing large economic benefits to society [1–3]. However, due to the harsh operating environment of satellites and human error, key modules or components of satellites in orbit may have abnormalities or experience failures [4]. If incipient faults can be detected and dealt with promptly in their early stages, the damage and harm they cause will be effectively reduced [5]. Therefore, the detection of incipient faults in satellites is receiving an increasing amount of attention because it is one of the key technologies that ensures the normal operation of satellites [6, 7].
The most common method currently used to detect faults in satellites is to compare telemetry parameters directly with preset thresholds [8, 9]. This fault detection method is suitable for detecting abrupt, large faults. However, it may be less effective at detecting incipient faults because the telemetry parameters affected by an incipient fault may not change significantly from their normal condition [10]. If the fault detection threshold is set too low, the method becomes sensitive to noise and causes frequent false alarms; if it is set too high, early symptoms of a fault may be missed. In addition, as the production batches, processes, and operating environments of different satellites are not identical, different fault detection thresholds may need to be determined for different satellites, and it is inefficient to manually set an appropriate threshold for each telemetry parameter.
Model-based satellite fault detection methods are more intelligent than threshold comparison methods and often combine fault detection, isolation, and recovery functions [11–13]. However, with the rapid development of science and technology, a variety of new technologies, materials, and highly integrated devices are being used in satellites. The complex coupling relationships between the various components of a satellite and the limited knowledge of its possible faults can make it difficult to build accurate and comprehensive fault detection models, thus limiting the application of model-based satellite fault detection methods.
In recent years, data-driven fault detection methods have become a popular research topic due to their low expert involvement, high modeling efficiency, and high scalability [14–16]. At present, data-driven fault detection methods are mainly divided into two categories: unsupervised and supervised. The core idea of unsupervised fault detection methods is deviation. These methods use the normal historical data to automatically build a model that characterizes the normal condition of the satellite and assume that a fault has occurred when the actual in-orbit data deviates significantly from this model. Since the ground test data and the in-orbit data of satellites consist mostly of fault-free data, fault detection methods based on unsupervised learning have been widely researched and applied. Representative unsupervised fault detection methods include the one-class support vector machine (OCSVM) [17], the inductive monitoring system (IMS) [18], principal component analysis (PCA) [19], Gaussian process regression (GPR) [20], long short-term memory (LSTM) [21], and so on. Although these methods use different principles to build normal models, they all have one thing in common: every normal model used to detect faults is obtained by learning from the normal historical data. Once the learning process is complete, each method uses a fixed and invariable model to detect faults, regardless of the variation of the actual in-orbit data, with no optimization or adjustment for the actual faults that may occur.
The core idea of supervised fault detection methods is classification. These methods learn and build a classifier from the normal historical data and various real or simulated fault data. If the in-orbit data are classified as the normal class, they are considered free of faults; conversely, if they are classified as a fault class, they are deemed to be faulty in some way. Representative supervised fault detection methods include linear discriminant analysis (LDA) [22], the support vector machine (SVM) [23], neural networks [24], random forest [25], and so on. However, due to the high reliability of satellites, most of the samples collected by satellite operation and maintenance systems are normal, and fault samples are exceedingly rare. In addition, classification models built using the fault samples of one satellite may not generalize to other satellites, thus hindering the application of supervised fault detection methods in the satellite domain.
Based on the existing research, this paper proposes a new satellite incipient fault detection method that combines the idea of deviation from unsupervised fault detection methods with the idea of classification from supervised fault detection methods. The main contributions of our work are summarized as follows:
(1) This paper first uses the idea of classification to find an optimal projection vector separating the in-orbit data from the normal historical data. Specifically, it treats the fault detection problem as a binary classification problem and uses LDA to find the optimal projection vector along which the in-orbit telemetry data can be distinguished from the normal historical data to the greatest extent.
(2) This paper then uses the idea of deviation to test whether a fault has occurred in the in-orbit data. Specifically, a normal model is built using the normal historical data and the optimal projection vector, and a fault is declared when the deviation of the in-orbit data from the normal model exceeds a threshold.
This paper is organized as follows: A brief introduction about LDA is given in Section 2. The fault detection method based on dynamic LDA is presented in detail in Section 3. Then, the proposed method is illustrated and analysed using a numerical simulation case and a real satellite fault case in Section 4. Finally, conclusions are given in Section 5.
2. Linear Discriminant Analysis
Linear discriminant analysis, also known as Fisher discriminant analysis [26], is a supervised dimensionality reduction and classification method that is widely used in the fields of pattern recognition and machine learning [27–29]. Taking binary classification as an example, consider a data set D = {(x_i, y_i), i = 1, 2, ..., n}, where x_i is a column vector of multidimensional telemetry parameters, y_i is the corresponding class label, m is the number of variables that need to be monitored, and n is the number of samples. The label y_i takes only two values, y_i ∈ {0, 1}. Let N_j, X_j, μ_j, and Σ_j, respectively, represent the number of samples, the set of samples, the mean vector, and the covariance matrix of class j, j = 0, 1.
We assume that the projection vector is w. For each sample x_i, the projection of x_i onto the vector w is wᵀx_i. Moreover, the projections of μ_0 and μ_1 onto the vector w are wᵀμ_0 and wᵀμ_1, respectively. The scatter of each class after projection onto the vector w is s_j², as shown in equation (1):

s_j² = ∑_{x ∈ X_j} (wᵀx − wᵀμ_j)² = wᵀS_j w, j = 0, 1, (1)

where S_j is the scatter matrix of class j.
We expect the samples of the same class to cluster together as much as possible after projection onto the vector w, while the samples of different classes become more dispersed [30]. Thus, we can construct the objective function of LDA, J(w), as shown in equation (2):

J(w) = (wᵀS_b w) / (wᵀS_w w). (2)
In equation (2), S_b = (μ_0 − μ_1)(μ_0 − μ_1)ᵀ is the between-class scatter matrix and S_w = S_0 + S_1 is the within-class scatter matrix. The objective of LDA is to find an optimal projection vector that maximizes J(w). Letting wᵀS_w w = 1, the problem of finding the optimal projection vector can be transformed into the optimization problem shown in equation (3):

min_w −wᵀS_b w, s.t. wᵀS_w w = 1. (3)
The optimization problem in equation (3) can be solved by the Lagrange multiplier method, and we then obtain equation (4):

S_w⁻¹S_b w = λw. (4)
From equation (4) and the relationship between eigenvalues and eigenvectors [30], we know that the projection vector w is an eigenvector of the matrix S_w⁻¹S_b. Furthermore, the optimal projection vector is the eigenvector corresponding to the largest eigenvalue of the matrix S_w⁻¹S_b.
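As a concrete illustration, the two-class optimal projection vector can be computed with a few lines of NumPy. This is a generic sketch of standard Fisher LDA, not the authors' code; the function name `lda_projection` and the closed-form shortcut w ∝ S_w⁻¹(μ_1 − μ_0), which is valid because S_b has rank one in the two-class case, are our own choices.

```python
import numpy as np

def lda_projection(X0, X1):
    """Two-class Fisher LDA projection vector.

    X0, X1: sample matrices of shape (N0, m) and (N1, m).
    Returns the unit vector w maximizing the between-class scatter
    over the within-class scatter after projection.
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter S_w = S_0 + S_1 (sums of squared deviations).
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    # Top eigenvector of S_w^(-1) S_b reduces to S_w^(-1) (mu1 - mu0)
    # for two classes, since S_b = (mu0 - mu1)(mu0 - mu1)^T has rank one.
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)
```

For two well-separated Gaussian clouds, the returned vector points essentially along the direction of class separation.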
3. Incipient Fault Detection Method Based on Dynamic LDA
3.1. Dynamic LDA
The training data of traditional LDA contain both normal (class 0) and fault (class 1) samples. However, in the field of satellite fault detection, the majority of samples available for training are normal. Therefore, this paper proposes a new method that treats the normal historical samples as normal (class 0) samples and the in-orbit samples to be tested as fault (class 1) samples. The traditional use of LDA and the new use of LDA in this paper are shown in Figures 1(a) and 1(b), respectively.

This article uses sliding windows and hypothesis testing to determine whether a fault has occurred in the in-orbit data. The general idea of fault detection is as follows:
(1) Sliding windows of length W are used to extract the in-orbit data in real time. Let the in-orbit data in the kth sliding window be X_k. We assume that a fault has occurred in X_k, so that X_k belongs to a different class from the normal historical data X_0.
(2) LDA is used to find an optimal projection vector w_k that separates the normal historical data X_0 from the assumed in-orbit fault data X_k as much as possible.
(3) A normal model is built using the normal historical data X_0 and the optimal projection vector w_k.
(4) Whether the in-orbit data X_k deviates significantly from the normal model is tested. If there is a significant deviation, the original hypothesis that X_0 and X_k belong to different classes is valid, and a fault has occurred in X_k. If there is no significant deviation, the original hypothesis is not valid, and there is no fault in X_k.
As can be seen above, the traditional use of LDA is static: the optimal projection vector is fixed once the training data are determined. The new use of LDA in this paper is dynamic: for each sliding window of in-orbit data X_k, an optimal projection vector w_k is obtained using dynamic LDA. As the in-orbit data may vary between windows, the optimal projection vector may differ for each LDA process. Owing to the use of dynamic LDA, the optimal projection vectors adapt to the in-orbit data in real time, making the proposed method more responsive to potential faults.
3.2. Construction of the Normal Model
After the optimal projection vector w_k is obtained, we need to verify whether there is a significant deviation between X_0 and X_k. However, how large must the deviation be to count as significant? We therefore need to determine the normal fluctuation range of the deviation between X_0 and X_k when the in-orbit data is normal, and then use this normal fluctuation range to build a normal model. A fault is considered to have occurred when the deviation falls outside the acceptable range.
In this paper, the objective function of LDA, J, is used as the measure of deviation. We assume that the normal historical data X_0 and the in-orbit data X_k obey two m-dimensional joint Gaussian distributions N(μ_0, Σ_0) and N(μ_k, Σ_k), respectively. The projections of X_0 and X_k onto the vector w_k are y_0 and y_k, respectively. Based on the property of the m-dimensional joint Gaussian distribution [31], y_0 and y_k obey the one-dimensional Gaussian distributions N(w_kᵀμ_0, w_kᵀΣ_0 w_k) and N(w_kᵀμ_k, w_kᵀΣ_k w_k), respectively. The relationship of J, y_0, and y_k is shown in equation (5).
Since X_0 and w_k have been obtained after using LDA, the mean vector μ_0, the covariance matrix Σ_0, and the optimal projection vector w_k in equation (5) can be considered known and fixed, while the mean vector μ_k and covariance matrix Σ_k associated with X_k are unknown and variable. As μ_0, Σ_0, and w_k are all known, we can set w_kᵀμ_0 = a and w_kᵀΣ_0 w_k = b, where a and b are two constants. Then, equation (5) can be reduced to equation (6).
For the purpose of obtaining the normal fluctuation range of J, we assume that X_0 and X_k belong to the same class, so that X_k is obtained by sampling the joint Gaussian distribution N(μ_0, Σ_0) which X_0 obeys. Since y_0 and y_k obey one-dimensional Gaussian distributions and are the projections of X_0 and X_k onto the vector w_k, respectively, we can consider that y_k is obtained by sampling the one-dimensional Gaussian distribution which y_0 obeys. Based on the properties of the one-dimensional Gaussian distribution [32, 33], the sample mean of y_k obeys a one-dimensional Gaussian distribution, as shown in equation (7), and the scaled sample variance of y_k obeys the chi-square distribution with W − 1 degrees of freedom [32, 33], as shown in equation (8).
Since the sample mean of y_k obeys a one-dimensional Gaussian distribution and a is a constant, the difference between them also obeys a one-dimensional normal distribution, as shown in equation (9).
From the mean and variance above, we can obtain equation (10).
After normalizing, we can obtain equation (11).
Furthermore, we can obtain equation (12) from the relationship between the standard normal distribution and the chi-square distribution.
Thus, the numerator of J satisfies equation (13).
Using the constant b to simplify equation (8), we can get equation (14).
Therefore, the denominator of J satisfies equation (15).
In summary, the numerator of J multiplied by a constant obeys a chi-square distribution with 1 degree of freedom. The denominator of J, after subtracting a constant and then multiplying by a constant, obeys a chi-square distribution with W − 1 degrees of freedom; therefore, the denominator of J obeys a noncentral chi-square distribution. In addition, the numerator and denominator of J are independent of each other. From the relationship between the chi-square distribution and the F-distribution, together with equations (13) and (15), the resulting statistic obeys a noncentral F-distribution with W − 1 and 1 degrees of freedom and a noncentrality parameter, as shown in equation (16).
Therefore, we can use the noncentral F-distribution to test whether there is a fault in X_k [33]. Given a significance level α, the detection threshold can be obtained from the noncentral F-distribution. If the value of the statistic is greater than or equal to the threshold, we consider that X_0 and X_k belong to the same class, and there is no fault in X_k. If the value of the statistic is less than the threshold, we consider that X_0 and X_k belong to different classes, and a fault has occurred in X_k. Taking the reciprocal of the statistic, we can obtain the equivalent test on J shown in equation (17).
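If SciPy is available, the α-quantile of the noncentral F-distribution gives the detection threshold directly. The sketch below assumes, as derived above, that the test statistic follows a noncentral F-distribution with W − 1 and 1 degrees of freedom, and treats the noncentrality parameter `lam` as already computed (its exact expression depends on the constants in equations (13) and (15), which are not reproduced here).

```python
from scipy.stats import ncf

def detection_threshold(W, lam, alpha=0.005):
    """alpha-quantile of the noncentral F-distribution with
    (W - 1, 1) degrees of freedom and noncentrality lam.
    Values of the test statistic below this quantile indicate
    that a fault has occurred in the window."""
    return ncf.ppf(alpha, dfn=W - 1, dfd=1, nc=lam)
```

For example, with the window length W = 300 and α = 0.005 used later in the experiments, `detection_threshold(300, lam)` returns the lower critical value against which each window's statistic is compared.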
3.3. Overall Fault Detection Process
The pseudocode for the overall fault detection method based on dynamic LDA is as follows:
(1) Each parameter of the normal historical samples is normalized by Z-score to obtain X_0.
(2) A sliding window of length W is used to extract the in-orbit data, and the kth window X_k is obtained.
(3) The in-orbit data X_k is normalized by Z-score.
(4) LDA is used to find the optimal projection vector w_k between X_0 and X_k.
(5) A normal model is built using X_0 and the optimal projection vector w_k, and the detection threshold J_th is obtained with the significance level α.
(6) The value J of the LDA objective function is calculated according to equation (5).
(7) Determine whether J is greater than J_th. If it is, the in-orbit data X_k is faulty; otherwise, X_k is normal. Let k = k + 1; the in-orbit data of the next sliding window is then tested by repeating Steps 3 to 7.
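The steps above can be sketched in Python as follows. This is our own minimal reading of the procedure, not the authors' implementation: the noncentral-F threshold of Section 3.2 is abstracted into a fixed value `J_th`, and the helper `_lda_w` uses the standard two-class LDA shortcut.

```python
import numpy as np

def _lda_w(X0, X1):
    # Two-class Fisher LDA shortcut: w proportional to S_w^(-1)(mu1 - mu0).
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

def dynamic_lda_detect(X_hist, X_orbit, W=300, step=100, J_th=1e-4):
    """Sketch of the Section 3.3 detection loop.

    X_hist: normal historical samples, shape (N, m).
    X_orbit: in-orbit samples, shape (T, m).
    J_th stands in for the noncentral-F threshold of Section 3.2.
    Returns a list of (window_start, J, fault_flag) tuples.
    """
    mu, sd = X_hist.mean(axis=0), X_hist.std(axis=0)
    X0 = (X_hist - mu) / sd                         # step 1: Z-score
    out = []
    for k in range(0, len(X_orbit) - W + 1, step):  # step 2: window
        Xk = (X_orbit[k:k + W] - mu) / sd           # step 3
        w = _lda_w(X0, Xk)                          # step 4
        p0, pk = X0 @ w, Xk @ w
        # step 6: Fisher objective along w — squared mean separation
        # over the summed within-class scatters after projection.
        J = (p0.mean() - pk.mean()) ** 2 / (
            ((p0 - p0.mean()) ** 2).sum() + ((pk - pk.mean()) ** 2).sum())
        out.append((k, float(J), J > J_th))         # step 7
    return out
```

On synthetic data with an offset fault in the second half of the stream, J rises sharply once a window falls entirely inside the faulty segment.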
The computation cost of finding the optimal projection vector for each window mainly consists of matrix inversion, matrix multiplication, and solving the eigenvalue problem. The time complexity of each of these three parts is O(m³), where m is the number of monitored variables. Considering all the aforementioned parts, the computation cost of finding the optimal projection vector for each window is O(m³).
4. Case Studies and Analysis
4.1. Numerical Case
4.1.1. Experiment with the Fixed Fault Magnitude
A numerical simulation experiment including three faults was conducted to verify the effectiveness of the method proposed in this paper. The system was modeled as shown in equation (18).
In equation (18), the source signals and the noise terms are independent and Gaussian distributed. All the source signals obey the standard normal distribution. The three incipient faults f1, f2, and f3 do not occur simultaneously, and eight telemetry parameters need to be monitored. All three faults are offset faults, as these faults occur more frequently in satellites. The three faults were inserted as shown in equation (19).
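Equation (18) itself is not reproduced above, so the sketch below only mimics its structure: independent standard-normal sources mixed into eight telemetry parameters, measurement noise at a chosen SNR, and an offset (bias) fault added to one parameter. The mixing matrix, source count, and function name are our own stand-ins, not the paper's.

```python
import numpy as np

def simulate_telemetry(n, m=8, n_sources=4, snr_db=30,
                       fault_param=None, fault_mag=0.0,
                       fault_start=None, seed=0):
    """Stand-in generator for the numerical case: Gaussian sources
    mixed into m telemetry parameters plus noise at snr_db, with an
    optional offset fault on one parameter from fault_start onward."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(m, n_sources))   # hypothetical mixing matrix
    S = rng.normal(size=(n, n_sources))   # standard-normal source signals
    X = S @ A.T
    # Noise variance set so that 10*log10(signal_var / noise_var) = snr_db.
    noise_sd = np.sqrt(X.var() / 10 ** (snr_db / 10))
    X = X + rng.normal(scale=noise_sd, size=X.shape)
    if fault_param is not None and fault_start is not None:
        X[fault_start:, fault_param] += fault_mag   # offset (bias) fault
    return X
```

The returned array can be split into a normal-history segment and an in-orbit segment exactly as in the experimental setup described below.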
In this paper, four evaluation indexes were chosen to evaluate the fault detection results: fault detection rate (FDR), false alarm rate (FAR), F1 value, and AUC value.
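Under their usual definitions (the paper's defining equations are not reproduced above), the four indexes can be computed as follows. The tie handling in the rank-based AUC is simplified, and the function name is ours.

```python
import numpy as np

def evaluate(y_true, y_pred, scores):
    """FDR, FAR, and F1 on binary decisions, plus AUC on raw scores.

    y_true: 1 = fault window, 0 = normal window.
    y_pred: detector decisions. scores: continuous fault scores
    (higher = more likely faulty).
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = np.asarray(scores, dtype=float)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fdr = tp / max(y_true.sum(), 1)            # fault detection rate
    far = fp / max((y_true == 0).sum(), 1)     # false alarm rate
    f1 = 2 * tp / max(2 * tp + fp + fn, 1)
    # AUC via the Mann-Whitney rank statistic (assumes untied scores).
    ranks = np.argsort(np.argsort(scores)) + 1
    n1, n0 = y_true.sum(), (y_true == 0).sum()
    auc = (ranks[y_true == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)
    return fdr, far, f1, auc
```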
The other parameters of the numerical case were set as follows. The total number of samples was 120,400, of which 60,200 were normal historical samples and 60,200 were in-orbit samples for testing. The sliding window length was 300, and the sliding window interval was 100 for both the normal historical data and the in-orbit data. After applying the sliding windows, 600 windows each were obtained from the normal historical data and the in-orbit data. The first 300 of the 600 in-orbit windows were normal windows, while the last 300 were fault windows. The signal-to-noise ratio (SNR) was set to 30 dB [34].
In this paper, eight common fault detection methods were chosen as comparison methods, namely, isolation forest (IForest) [35], OCSVM [36], k-nearest neighbor (KNN) [37], local outlier factor (LOF) [38], histogram-based outlier score (HBOS) [39], PCA with the T² statistic (PCA + T²) [19], PCA with the squared prediction error statistic (PCA + SPE) [19], and PCA with a combined index (PCA + CI) [40]. For comparison purposes, the features monitored by these eight methods were the mean values of the samples in each sliding window rather than the original values. IForest, OCSVM, KNN, LOF, and HBOS were implemented using the open-source program PyOD [41]. The parameters of PyOD are shown in Table 1, and their meanings are detailed in the Appendix (parameters without special explanations take their default values). For the three PCA-based fault detection methods, the cumulative variance contribution rate was 90%, and the confidence level of the statistics was set to 95%. The significance level α of the proposed method was set to 0.005.
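The per-window mean features supplied to the eight comparison detectors can be extracted as below; W = 300 and step = 100 match the experimental settings, and the function name is ours.

```python
import numpy as np

def window_means(X, W=300, step=100):
    """Mean vector of each sliding window of length W taken every
    `step` samples; these per-window means are the features fed to
    the eight comparison detectors instead of the raw samples."""
    return np.array([X[k:k + W].mean(axis=0)
                     for k in range(0, len(X) - W + 1, step)])
```

The resulting feature matrix can then be passed to any PyOD detector's `fit` method.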
The detection results of the nine fault detection methods for the fault f1 are shown in Figure 2. As can be seen from Figure 2, seven methods, namely, IForest, KNN, LOF, HBOS, PCA + T², PCA + CI, and the proposed method, have satisfactory detection results for the fault f1, while the other two methods (OCSVM and PCA + SPE) perform slightly worse.

The detection results of the nine fault detection methods for the fault f2 are shown in Figure 3. As shown in Figure 3, the results of the first seven fault detection methods are not satisfactory for the fault f2. The fault or anomaly scores of these methods, except OCSVM, did not change significantly before and after the insertion of the fault f2. Although the detection result of OCSVM is better, a large number of fault windows still lie below the threshold. It can be seen from Figure 3(i) that the proposed method has a good detection result for the fault f2.

As can be seen from Figure 4, except for OCSVM and the proposed method, the other seven fault detection methods perform poorly in detecting the fault f3. However, the detection result of OCSVM is not stable: owing to the randomly generated signal and noise sources, OCSVM may obtain good or poor results.

Each of the three faults was simulated randomly 100 times in this paper [42], and the average values of the fault detection results were calculated and are shown in Table 2.
As can be seen from Table 2, the false alarm rates of the nine fault detection methods were all concentrated around a similar level. Consequently, the results in Table 2 can be considered comparative results under a similar false alarm rate condition. In terms of fault detection rate, the proposed method ranked 5th, 1st, and 1st for the faults f1, f2, and f3, respectively. For the fault f1, the proposed method ranked 5th but was only 3.19% lower than the 1st-ranked method. In terms of F1 value, the proposed method ranked 5th, 1st, and 1st for the faults f1, f2, and f3, respectively. In terms of AUC value, the proposed method ranked 3rd, 1st, and 1st for the faults f1, f2, and f3, respectively.
Figure 5 shows the AUC values of 100 simulation results using OCSVM and the proposed method to detect the fault f3. As can be seen from Figure 5, the detection results of OCSVM are unstable; therefore, the evaluation index of OCSVM is not very high after averaging. By contrast, the detection results of the proposed method are stable and satisfactory.

4.1.2. Experiment with Different Fault Magnitudes
It is evident from Figure 4(i) that the proposed method has a large detection margin for the fault f3. In other words, it seems that the proposed method could also detect smaller magnitudes of this fault. To test the ability of the proposed method to detect smaller faults, another experiment was conducted with the fault f3 as an example. The simulation environment and the other experimental parameters were retained, but the fault magnitude of f3 in equation (19) was varied: it was increased from 0.001 to 0.08 in steps of 0.001. Each fault magnitude was simulated 30 times, and the average value was used as the result. The F1 and AUC values of the detection results of the nine methods at different fault magnitudes are shown in Figure 6.

As shown in Figures 6(a) and 6(b), the F1 and AUC values of the nine methods show an increasing trend as the fault magnitude of f3 increases, but the rate of increase varies among the methods. The F1 and AUC values of the proposed method increased rapidly with the fault magnitude and finally remained near the highest value. Among the other eight methods, the best for the fault f3 was OCSVM, but its values rose more slowly than those of the method proposed in this paper, and, owing to the influence of noise, its detection results fluctuated greatly. As can be seen from Figure 6, the proposed method may hold no particular advantage for large-magnitude faults. However, for incipient faults, it has an obvious advantage over the other eight fault detection methods. Therefore, the method proposed in this paper is more comprehensive in its ability to detect different magnitudes of the fault f3.
4.1.3. Analysis and Discussion
Why does the proposed method differ so markedly in its detection of the faults f1, f2, and f3? Why is it more sensitive to small-magnitude faults than the other methods? This paper attempts to explain the reasons from the perspective of the optimal projection vector. For presentation purposes, the optimal projection vector of each sliding window was normalized (its modulus was set to 1) and its components were taken in absolute value. The optimal projection vectors obtained using dynamic LDA before and after the faults f1, f2, and f3 are shown in Figures 7(a)–7(c), respectively. In Figure 7, the first 300 windows are normal windows, while the last 300 windows are faulty windows.
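The per-window normalization used for Figure 7 is simply unit-norm scaling followed by an elementwise absolute value, e.g.:

```python
import numpy as np

def normalized_weights(w):
    """Unit-norm projection vector with elementwise absolute values,
    as plotted per window in Figure 7: each component then reads as
    the (nonnegative) weight assigned to one telemetry parameter."""
    w = np.asarray(w, dtype=float)
    return np.abs(w) / np.linalg.norm(w)
```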

As can be seen from Figure 7, owing to the influence of noise, the optimal projection vectors obtained by the proposed method were chaotic and showed no fixed pattern in the absence of faults. However, in Figures 7(b) and 7(c), the optimal projection vectors obtained using dynamic LDA showed regular patterns after the faults f2 and f3 occurred. By contrast, as shown in Figure 7(a), the optimal projection vectors remained chaotic after the occurrence of the fault f1, and the proposed method holds no significant advantage over the traditional methods in this case.
The size of each component of the optimal projection vector determines the degree of scaling of the corresponding parameter. Taking Figure 7(b) as an example, the weight of the faulty parameter rose to about 0.6 after the fault f2 occurred, the weights of two other parameters were about 0.45, and the weights of the remaining parameters were below 0.3. It can be seen from equation (20) that the fault f2 was added to this parameter. It can be concluded that the optimal projection vector enlarges the weights of the fault-related parameters and suppresses the weights of the other parameters. The enlargement of the fault-related parameters improves the ability of the proposed method to detect small-magnitude faults, while the suppression of the other parameters reduces the effect of their noise.
As can be seen from Figure 7(c), the weights of two parameters were significantly higher than those of the other parameters after the fault f3 occurred. It can be seen from equation (20) that the fault f3 was added to one of these parameters. It can be concluded that the optimal projection vectors again enlarged the weights of the fault-related parameters and suppressed the weights of the other parameters after the fault f3 occurred.
Traditional fault detection methods such as OCSVM, IForest, and PCA are static: once the learning process is complete, they use a fixed, invariable model to detect faults. Although the KNN method is dynamic, it treats each parameter equally. None of these methods has the aforementioned dynamic process of enlarging the weights of fault-related parameters and suppressing the weights of irrelevant parameters; consequently, the other eight methods were not effective for the faults f2 and f3. Although OCSVM can sometimes obtain significant results, its detection results are not stable. After analysis, we consider that the instability of OCSVM may be caused by the selection of noisy samples as support vectors. In addition, comparing Figures 7(b) and 7(c), the optimal projection vectors were not the same for different faults: the optimal projection vectors obtained using the method proposed in this paper adjust automatically to the actual fault, and there is no need to set the parameter weights manually in advance.
4.2. Real Satellite Fault Case
On 28 June 2020, a fault occurred in a key component of a satellite payload. The postfailure analysis showed that the fault had developed over a long period of time. However, because the fault occurred for the first time and the magnitude of the initial failure was small, it was not detected and dealt with promptly, and the key component eventually became unavailable. In this paper, a total of 1,315,515 samples of five telemetry parameters related to the faulty component were collected from the satellite measurement and control system from 12:15:04 on 10 November 2018 to 19:59:39 on 2 December 2019, as shown in Figure 8. For reasons of confidentiality, the true telemetry parameter names are hidden.

The first 266,540 samples were selected as normal historical data, and the next 197,000 samples were selected as test data. The sliding window length was again 300 for both the test data and the normal historical data, and the sliding window interval was 100. After the sliding window extraction, a total of 2663 windows were obtained from the normal historical data and 1968 windows from the test data. The first 324 of the 1968 test windows were normal windows, while the subsequent windows were fault windows. The significance level α of the proposed method was set to 0.0001, while the rest of the experimental parameters were the same as those in Section 4.1.1. The evaluation indexes and detection results of the nine fault detection methods for the real satellite fault are shown in Table 3 and Figure 9, respectively.

It can be summarized from Figure 9 that four methods, namely, OCSVM, KNN, LOF, and the proposed method, have better detection results for the real satellite fault. The proposed method obtained satisfactory results in terms of all four evaluation indexes. Although its advantage over the other fault detection methods is small, it still ranked best, as can be seen from Table 3. The effectiveness of the proposed method is thus further verified by the real satellite case.
5. Conclusions
Based on the analysis and comparison of existing satellite fault detection methods, this paper proposes a new incipient fault detection method that combines the core ideas of unsupervised and supervised learning. The effectiveness and superiority of the proposed method were verified through a numerical simulation case and a real fault case. This paper studies only linear Gaussian systems; if a system does not meet the assumptions of a joint Gaussian distribution and linearity, the detection performance of the proposed method may decrease. In addition, owing to the insensitivity of LDA to variance, the proposed method is suited to detecting slight abnormal changes in the mean rather than in the variance.
Appendix
The website of PyOD is https://github.com/yzhao062/Pyod.
IForest. n_estimators: the number of base estimators in the ensemble. contamination: the amount of contamination of the data set, i.e., the proportion of outliers in the data set.
OCSVM. kernel: specifies the kernel type to be used in the algorithm. nu: an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. contamination: the proportion of outliers in the data set.
KNN. n_neighbors: the number of neighbors to use by default for k-neighbors queries. contamination: the proportion of outliers in the data set.
LOF. n_neighbors: the number of neighbors to use by default for k-neighbors queries. contamination: the proportion of outliers in the data set.
HBOS. n_bins: the number of bins. alpha: the regularizer for preventing overflow. tol: the parameter deciding the flexibility when dealing with samples falling outside the bins. contamination: the proportion of outliers in the data set.
Data Availability
The numerical case data used to support this study are included within the article. The real satellite fault data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
Acknowledgments
This research was supported by the Beidou Navigation In-Orbit Support System (grant no. JKBDZGDH01) and National Special Support Plan for High-Level Talents (grant no. WRJH19DH01).