Abstract

In this paper, we address the problem of vision-based satellite recognition and pose estimation, i.e., recognizing a satellite from multiple views and estimating its relative pose using imaging sensors. We propose a vision-based method that solves both problems using Gaussian process regression (GPR). Assuming that the regression function mapping from the image (or feature) of the target satellite to its category or pose follows a Gaussian process (GP) properly parameterized by a mean function and a covariance function, the predictive equations can be obtained in closed form by a maximum-likelihood approach once training data are given. These explicit formulations not only provide the category or estimated pose through the mean of the predicted output but also give its uncertainty through the variance, which makes the predicted result more convincing and applicable in practice. In addition, we introduce a manifold constraint on the output of the GPR model to improve its performance for satellite pose estimation. Extensive experiments are performed on two simulated image datasets containing satellite images with 1D and 2D pose variations, as well as different noise levels and lighting conditions. Experimental results validate the effectiveness and robustness of our approach.

1. Introduction

Optical imaging sensors have been widely used as essential payloads of vision systems in aerospace applications: autonomous rendezvous and docking [1–4], vision-based landing [5], position and pose estimation [6–13], on-orbit servicing [14, 15], space robotics [16], satellite recognition [12, 17–19], 3D structure reconstruction and component detection [20, 21], etc. Vision-based recognition and pose estimation of a target satellite is one of the key technologies for these applications. The manufacturing technology and performance of imaging sensors have developed rapidly in the past decades. For example, the space-based visible (SBV) sensor [22] can provide images with spatial resolution by possibly reducing the range from the imaging sensor to the target satellite [23]. Recently improved sCMOS sensors [24] can perform affordable, wide-field, and rapid-cadence surveillance from low Earth orbit (LEO) to beyond geosynchronous orbit (GEO). In addition, a next-generation chip-scale imaging sensor, SPIDER [25], has been designed to provide higher resolution by enabling a larger-aperture imager in a constrained volume. Owing to these improvements, image data captured by space-based vision systems can be of higher quality. Such high-quality image data contain more detailed information about the target satellite, which could benefit satellite recognition and pose estimation.

Previous vision-based pose estimation methods for space objects can be broadly divided into two classes, i.e., 3D model-based methods and 2D image-based methods. 3D model-based pose estimation requires a prior 3D model of the target space object, which contains rich information: structure, shape, textures, and so on. The 3D model could be a CAD model [1, 3, 4, 26, 27] or a 3D point cloud [2, 13]. When estimating the pose of the target, 3D models are matched directly to 2D images or to 3D data reconstructed from 2D images using stereo vision [28]. In addition, the 3D model can be used to generate projection images from sampled viewpoints for matching against an input 2D image [27]. However, such accurate prior 3D models are usually difficult to obtain in practice. 2D image-based methods attempt to recover pose information directly from an image sequence or a single image based on binocular (or stereo) vision [8, 9] or monocular vision [6, 14, 29]. Although no prior 3D model is needed, most existing image-based methods need camera calibration [8, 29] or optical markers [14] on the target spacecraft. Meanwhile, due to the principle of binocular vision, binocular vision-based approaches may be invalid when the imaging sensors are far from the target spacecraft, i.e., when the distance from the target spacecraft to the imaging sensors is much larger than the distance between the two imaging sensors. Recently, learning-based methods [10–12, 19] have been proposed without the above limitations. Zhang et al. [10] introduced Homeomorphic Manifold Analysis to the aerospace area to estimate relative poses of space objects. Zhang and Jiang [11, 19] handled spacecraft pose estimation by using kernel regression-based methods. These methods are fundamentally based on supervised learning (like kernel regression) and require nothing but training image data, so they can also be regarded as 2D image-based methods.

In the past decade, vision-based satellite recognition or identification has attracted more and more interest [12, 17–19, 30–33]. Some of these methods focus on extracting good features to improve the recognition performance of traditional classifiers such as k-nearest neighbors and support vector machines. The method in [17] represented the combined features of satellite images in a latent space generated by kernel locality preserving projections. Pan et al. [30] achieved satellite recognition by fusing infrared and visible image features. Ding et al. [18] proposed normalized affine moment invariants and illumination-invariant multiscale autoconvolution for autonomous space object identification. Shi et al. [31] encoded satellite image features by elastic net sparse coding, and [32] used sparse coding-based probabilistic latent semantic analysis to obtain semantic features for satellite recognition. Other approaches use various machine learning models for better recognition of satellites. Zhang and Jiang [19] solved multiview space object recognition by kernel regression, [12] used homeomorphic manifold analysis for satellite recognition, and [33] built a nine-layer deep convolutional neural network (DCNN) for space target recognition. In particular, [19] and [12] formulated vision-based satellite recognition and pose estimation in one framework.

In this paper, following previous works [12, 19], we also address the problem of vision-based satellite recognition and pose estimation, which is to estimate the relative pose of a target satellite and simultaneously recognize its category using imaging sensors. We develop a novel method based on monocular vision using Gaussian process regression (GPR), which is not only powerful for predicting continuous quantities but also applicable to discrete values. We assume that the regression function mapping from the image (or feature) of the target satellite to its relative pose or category follows a Gaussian process (GP), which can be properly parameterized by a mean function and a covariance function. Given the training data, we can obtain explicit predictive equations by a maximum-likelihood approach, from which the mean of the predicted output (i.e., the estimated pose or recognized category) and its variance (which indicates the uncertainty) can be computed. Since we recognize the category of a target satellite before estimating its pose using GPR, we can solve multiview satellite recognition and pose estimation in one framework via a pose-after-category strategy. Besides, as shown in [10, 12, 19], images of one space object with different poses lie on intrinsic low-dimensional manifolds which are homeomorphic to each other, and such a homeomorphic manifold can be beneficial for pose estimation. Therefore, we also use a normalized $n$-sphere (i.e., the homeomorphic manifold) in an $(n+1)$-dimensional Euclidean space to represent $n$-degree-of-freedom ($n$-DoF) pose variation and then learn a multiple-output GPR model for satellite pose estimation. To validate the effectiveness and robustness of our approach, extensive experiments have been performed on a simulated image dataset called the BUAA-SID dataset [19], considering multiple complex conditions including 1D and 2D pose variations, image noise, and lighting conditions.

Compared with previous vision-based satellite recognition and pose estimation methods, the contributions of this paper fall into three aspects. Firstly, GPR, a more powerful regression model, is introduced for the task of satellite recognition and pose estimation. GPR has a strong statistical foundation and many desirable properties that suit classification and pose estimation. Secondly, a homeomorphic manifold constraint is employed to improve the pose estimation capability of the original GPR model. Our manifold constrained Gaussian process regression (MCGPR) clearly improves the performance of pose estimation, by more than one order of magnitude in some cases. Thirdly, as the GPR model offers a variance for each output, we define and calculate the uncertainty of the predicted value from its variance. The uncertainty represents the credibility of the predicted value, which can help us choose more convincing results. In the field of aerospace, identifying the certainty may be more important than the accuracy of the results, so the uncertainty has an important strategic significance. It should be noted that this paper is an extended version of our paper [34] published in IEEE Aerospace Conference 2015 at Big Sky, MT, USA, where parts of the pose estimation work were previously presented.

The rest of the paper is organized as follows: Section 2 describes in detail the framework of the GPR model for satellite recognition and pose estimation. Experimental results and analyses are presented in Section 3. Section 4 concludes the paper.

2. Methodology

2.1. General Gaussian Process Regression Model

Gaussian processes (GPs) have received increasing attention in the field of machine learning in the past few years, for classification or regression [35–40]. Classification and regression are subproblems of supervised learning, which involve the prediction of discrete and continuous quantities, respectively. Since discrete quantities can be seen as samples of continuous ones, classification can be performed in the same way as regression. Thus, we can use a regression model for both classification and regression problems. Gaussian process regression (GPR) has many desirable properties, such as ease of obtaining and expressing uncertainty in predictions, the ability to capture a wide variety of behaviour through a simple parametrisation, and a natural Bayesian interpretation [36]. We therefore employ it to solve the problem of classification and pose estimation of space objects. We describe the framework of GPR briefly in this section.

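$\mathcal{D} = \{(\mathbf{x}_i, \mathbf{y}_i) \mid i = 1, \ldots, N\}$ is set as the input training data, where $\mathbf{x}_i \in \mathbb{R}^D$ are the input data (i.e., different kinds of image representations, e.g., the six kinds of image representations used in Section 3 of this paper), $\mathbb{R}^D$ is the $D$-dimensional Euclidean space, $\mathbf{y}_i \in \mathbb{R}^d$ are the corresponding target values represented in the $d$-dimensional Euclidean space, and $N$ is the number of training data. For clarity and without loss of generality, we take a 1D output as an example, i.e., $y_i$ indicates a certain dimension of $\mathbf{y}_i$. Then, the regression model for the task with noise can be written as $y = f(\mathbf{x}) + \varepsilon$, where $f$ is the regression function mapping from $\mathbf{x}$ to $y$ and the noise $\varepsilon$ follows an independent, identically distributed Gaussian distribution with zero mean and variance $\sigma_n^2$, i.e., $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$. According to [38], the Gaussian process can be written as
$$f(\mathbf{x}) \sim \mathcal{GP}\left(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\right), \quad (1)$$
where the mean function $m(\mathbf{x})$ is taken to be zero and the covariance function $k$ is set to be the squared exponential covariance function with isotropic distance measure and unit magnitude [38],
$$k(\mathbf{x}_p, \mathbf{x}_q) = \exp\left(-\frac{1}{2} (\mathbf{x}_p - \mathbf{x}_q)^{\top} (\ell^2 I_D)^{-1} (\mathbf{x}_p - \mathbf{x}_q)\right), \quad (2)$$
where $\mathbf{x}_p$ and $\mathbf{x}_q$ are two datapoints, $I_D$ is the $D$-order identity matrix, and the parameter $\ell$ is the characteristic length-scale. Then, in the GPR model, the Gaussian process prior over the function is [38]
$$\mathbf{f} \mid X \sim \mathcal{N}\left(\mathbf{0}, K(X, X)\right), \quad (3)$$
where $\mathbf{f}$ denotes the regression function outputs of the entire training set $X$ and $K(X, X)$ denotes the $N \times N$ matrix of the covariances evaluated at all pairs of training points using equation (2).

Let the testing set be $X_*$; then, under prior equation (3), the joint distribution of the observed target values $\mathbf{y}$ of $X$ and the regression function outputs $\mathbf{f}_*$ of $X_*$ can be written as [38]
$$\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right), \quad (4)$$
and the likelihood is a factorized Gaussian $\mathbf{y} \mid \mathbf{f} \sim \mathcal{N}(\mathbf{f}, \sigma_n^2 I)$. Thus, the predictive equations for the GPR model can be written as [38]
$$\mathbf{f}_* \mid X, \mathbf{y}, X_* \sim \mathcal{N}\left(\bar{\mathbf{f}}_*, \operatorname{cov}(\mathbf{f}_*)\right), \quad (5)$$
where
$$\bar{\mathbf{f}}_* = K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} \mathbf{y}, \qquad \operatorname{cov}(\mathbf{f}_*) = K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} K(X, X_*). \quad (6)$$

The target value can be represented by the mean values $\bar{\mathbf{f}}_*$, and the variances on the diagonal of $\operatorname{cov}(\mathbf{f}_*)$ indicate the uncertainties of the target values predicted by the model. The larger the variance is, the less convincing the corresponding estimated pose is. We may reject some estimation results in practice if their variances are too large.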

To predict multiple output variables, we can follow the simple approach mentioned in Section 9.1 of [38] in an intuitive way, i.e., modeling each output variable as independent of other variables and treating them separately.
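The predictive mean and variance of a zero-mean GPR model with a unit-magnitude squared exponential covariance can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `se_kernel` and `gpr_predict`, the default length-scale, and the default noise variance are assumptions made here, and the hyperparameters are fixed by hand instead of being learned by maximum likelihood.

```python
import numpy as np

def se_kernel(A, B, length_scale=1.0):
    """Squared exponential covariance, isotropic, unit magnitude."""
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gpr_predict(X, y, X_star, length_scale=1.0, noise_var=1e-2):
    """Predictive mean and variance of f* for a zero-mean GP (hypothetical helper)."""
    K = se_kernel(X, X, length_scale) + noise_var * np.eye(len(X))
    K_s = se_kernel(X_star, X, length_scale)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s @ alpha                             # K(X*, X) [K + s^2 I]^{-1} y
    v = np.linalg.solve(L, K_s.T)
    var = 1.0 - np.sum(v**2, axis=0)               # k(x*, x*) = 1 for unit magnitude
    return mean, var
```

For multiple outputs, the simple approach described above amounts to calling `gpr_predict` once per output dimension. The returned variance is that of the latent function value; adding `noise_var` would give the variance of a noisy observation.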

2.2. GPR for Satellite Recognition

When GPR is used for satellite recognition, the target values should be class labels. For a $C$-class classification problem, we can define the target values as a label vector $\mathbf{y} = [0, \ldots, 1, \ldots, 0] \in \mathbb{R}^C$, where the value ‘1’ is located at the $c$-th dimension for class $c$ ($c = 1, \ldots, C$). Then, we can train $C$ independent GPR models to predict the label vector $\hat{\mathbf{y}}$ of a testing data $\mathbf{x}_*$, and $\mathbf{x}_*$’s predicted class label is $c$ if $\hat{y}_c$ is maximal among all the dimensions. At the same time, we can obtain the corresponding variance to indicate the uncertainty of the predicted class label. It should be noted that we in effect treat the $C$-class classification problem as $C$ independent 2-class classification problems using a multiple-output GPR model.
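The label-vector scheme can be illustrated with a small NumPy sketch. It is written under simplifying assumptions that are not from the paper: all output dimensions share one hand-picked length-scale and noise variance (so they also share one predictive variance), and the names `se_kernel` and `gpr_classify` are hypothetical.

```python
import numpy as np

def se_kernel(A, B, ell):
    """Unit-magnitude squared exponential covariance between rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / ell**2)

def gpr_classify(X, labels, X_star, n_classes, ell=1.0, noise_var=1e-2):
    """Regress one-hot label vectors and pick the largest output (illustrative)."""
    Y = np.eye(n_classes)[labels]                  # one-hot targets, shape (N, C)
    K = se_kernel(X, X, ell) + noise_var * np.eye(len(X))
    K_s = se_kernel(X_star, X, ell)
    mean = K_s @ np.linalg.solve(K, Y)             # one regression output per class
    var = 1.0 - np.einsum("ij,ji->i", K_s, np.linalg.solve(K, K_s.T))
    pred = mean.argmax(axis=1)                     # class with the maximal output
    std = np.sqrt(np.maximum(var, 0.0))            # standard deviation per test point
    return pred, mean, std
```

Training each output with its own hyperparameters would give a separate variance per class dimension; the shared-hyperparameter shortcut here only keeps the example short.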

2.3. Manifold Constrained GPR for Pose Estimation

The homeomorphic view manifold has recently been applied to pose estimation. Zhang et al. [10] used a normalized 1-sphere in $\mathbb{R}^2$ as a homeomorphic manifold representation of all view manifolds in the case of 1D pose variation, while [19] extended this homeomorphic manifold constraint to a more general case. The same idea applies to 2D and 3D pose variations. For the case of 2D pose variation, a normalized 2-sphere in $\mathbb{R}^3$ can represent a 2D pose variation of one object, and for the 3D case, the homeomorphic view manifold becomes a normalized 3-sphere in $\mathbb{R}^4$. Assuming the pose angles of the input are $(\theta, \beta, \zeta)$, where $\theta$, $\beta$, and $\zeta$ are the yaw angle, pitch angle, and roll angle, respectively, then, for the 1D and 2D cases considered in this paper, the pose on the homeomorphic view manifold can be written as [10]
$$\mathbf{y} = \begin{cases} (\cos\theta, \sin\theta), & \text{1D case}, \\ (\cos\theta\cos\beta, \sin\theta\cos\beta, \sin\beta), & \text{2D case}. \end{cases} \quad (7)$$
Instead of simply using the pose angles, the representation of the poses on the homeomorphic view manifold can preserve the topology of the view manifolds. Such a representation has achieved significant improvement for pose estimation [10, 12, 19].

Applying the homeomorphic manifold constraint to the GPR framework expands the output of the model, so that we obtain multiple outputs even in the case of 1D pose estimation. We call this multiple-output GPR model manifold constrained Gaussian process regression (MCGPR). For example, in the 1D case, the original GPR model can be learned from the training data $\{(\mathbf{x}_i, \theta_i)\}$, where $\theta_i$ is the corresponding yaw angle of data $\mathbf{x}_i$, and such a single-output GPR model can predict the yaw angle of an input directly. But when predicting the 1D pose of the input using the MCGPR model, the model should be learned from the training data $\{(\mathbf{x}_i, \mathbf{y}_i)\}$, where the pose of $\mathbf{x}_i$ is represented as $\mathbf{y}_i = (\cos\theta_i, \sin\theta_i)$ using equation (7) in the 1D case. Then, we can get the predicted output in $\mathbb{R}^2$ as $\hat{\mathbf{y}} = (\hat{y}_1, \hat{y}_2)$, where $\hat{y}_1$ and $\hat{y}_2$ are the first and the second dimensions obtained separately by their respective GPR models. Then, the estimated yaw angle can be computed by the inverse trigonometric function as
$$\hat{\theta} = \arctan\left(\hat{y}_2 / \hat{y}_1\right). \quad (8)$$

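When predicting the 2D pose of the input using the MCGPR model, the training data become $\{(\mathbf{x}_i, \mathbf{y}_i)\}$ with $\mathbf{y}_i = (\cos\theta_i\cos\beta_i, \sin\theta_i\cos\beta_i, \sin\beta_i)$, where $\beta_i$ is the corresponding pitch angle of data $\mathbf{x}_i$. Then, the predicted output in $\mathbb{R}^3$ is $\hat{\mathbf{y}} = (\hat{y}_1, \hat{y}_2, \hat{y}_3)$, which has one more dimension than in the 1D case. According to equation (7), the estimated yaw angle can still be computed using equation (8), and the estimated pitch angle can be computed by the inverse sine function as
$$\hat{\beta} = \arcsin\left(\hat{y}_3\right). \quad (9)$$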

Similar solutions can be derived for the 3D case. It should be noted that, unlike the original GPR-based pose estimation, which directly predicts the pose angles, such solutions are specially designed for our MCGPR model because the outputs of MCGPR are constrained to the homeomorphic manifold.
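The mapping between pose angles and points on the manifold, and back, can be sketched as follows. The embedding used here, $(\cos\theta, \sin\theta)$ for 1D and $(\cos\theta\cos\beta, \sin\theta\cos\beta, \sin\beta)$ for 2D, is an assumption consistent with the text (yaw from the first two outputs, pitch from the inverse sine of the third); the use of `atan2` to resolve the quadrant and the helper names `embed_pose` and `recover_pose` are also choices made for this illustration.

```python
import math

def embed_pose(yaw_deg, pitch_deg=None):
    """Map yaw (and optionally pitch) in degrees to a point on the unit circle/sphere."""
    t = math.radians(yaw_deg)
    if pitch_deg is None:
        return (math.cos(t), math.sin(t))
    b = math.radians(pitch_deg)
    return (math.cos(t) * math.cos(b), math.sin(t) * math.cos(b), math.sin(b))

def recover_pose(y):
    """Recover yaw in [0, 360) and, for 3 outputs, pitch in [-90, 90] degrees."""
    yaw = math.degrees(math.atan2(y[1], y[0])) % 360.0
    if len(y) == 2:
        return yaw, None
    # Clamp before asin: regression outputs need not lie exactly on the sphere.
    pitch = math.degrees(math.asin(max(-1.0, min(1.0, y[2]))))
    return yaw, pitch
```

One practical consequence of this representation is that yaw angles of 359° and 1° map to neighbouring points on the circle, whereas as raw regression targets they are 358 apart.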

2.4. Uncertainty of Predicted Values

Let $u_y$ represent the uncertainty of the target value $y$. If we use the GPR model to predict $y$, we directly obtain the variance of $y$, i.e., $\sigma_y^2$, and we define $u_y = \sigma_y$, where $\sigma_y$ is the standard deviation. With this definition, the uncertainty $u_y$ shares the same physical meaning and unit as the target value $y$. To make this clear, we describe in detail how we calculate the uncertainty of the predicted class label and of the estimated pose angles.

2.4.1. Uncertainty of Predicted Class Label

As mentioned in Section 2.2, the uncertainty of the predicted class label $c$ can be defined as
$$u_c = \sigma_{\hat{y}_c}, \quad (10)$$
where $\hat{y}_c$ is the $c$-th dimension output of the predicted label vector $\hat{\mathbf{y}}$.

2.4.2. Uncertainty of Predicted 1D Pose

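In the 1D case, if we use the original GPR, the target value will be the yaw angle $\theta$, i.e., $u_\theta = \sigma_\theta$, just as we defined it. However, in the MCGPR model where $y_1 = \cos\theta$ and $y_2 = \sin\theta$, the relationship between the estimated yaw angle $\theta$ and the target values $y_1$ and $y_2$ follows the inverse trigonometric function as in equation (8). As $u_{y_1}$ and $u_{y_2}$ can be obtained directly from the model, to compute $u_\theta$, we consider the total differential of $\theta$ as
$$d\theta = \frac{\partial\theta}{\partial y_1}\, dy_1 + \frac{\partial\theta}{\partial y_2}\, dy_2, \quad (11)$$
where $\partial$ stands for the partial differential. From one perspective, a differential $d\theta$ can represent the variation of $\theta$, while from another perspective, $d\theta$ can also indicate the uncertainty of $\theta$. So we can regard $d\theta$ as $u_\theta$. Then, noticing that $y_1^2 + y_2^2 = 1$, the uncertainty of $\theta$ can be computed according to the total differential in equation (11) as
$$u_\theta = |y_2|\, u_{y_1} + |y_1|\, u_{y_2}. \quad (12)$$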
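The propagation from output uncertainties to yaw uncertainty can be sketched numerically. Two assumptions are made here and are not confirmed by the text: the yaw is the arctangent of the second output over the first, and the two contributions of the total differential are combined additively with absolute values (a first-order worst-case rule). The function name `yaw_uncertainty_1d` is hypothetical.

```python
import math

def yaw_uncertainty_1d(y1, y2, u1, u2):
    """First-order yaw uncertainty in degrees from outputs (y1, y2) and their std devs."""
    r2 = y1 * y1 + y2 * y2          # equals 1 when the outputs lie on the unit circle
    if r2 == 0.0:
        raise ValueError("yaw is undefined at the origin")
    d_theta_d_y1 = -y2 / r2         # partial derivative of atan(y2 / y1) w.r.t. y1
    d_theta_d_y2 = y1 / r2          # partial derivative of atan(y2 / y1) w.r.t. y2
    u_theta_rad = abs(d_theta_d_y1) * u1 + abs(d_theta_d_y2) * u2
    return math.degrees(u_theta_rad)
```

Dividing by `r2` keeps the sketch usable when the predicted outputs are slightly off the unit circle; with `r2 = 1` the coefficients reduce to `|y2|` and `|y1|`.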

2.4.3. Uncertainty of Predicted 2D Pose

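The output of the original GPR for 2D pose estimation will be $(\theta, \beta)$, where $\theta$ and $\beta$ are the yaw angle and pitch angle, respectively, with the uncertainties $u_\theta = \sigma_\theta$ and $u_\beta = \sigma_\beta$. If we use the MCGPR model with output $(y_1, y_2, y_3)$, where $y_1 = \cos\theta\cos\beta$, $y_2 = \sin\theta\cos\beta$, and $y_3 = \sin\beta$ according to equation (7), the relationship between the estimated pitch angle $\beta$ and the target value $y_3$ follows the inverse trigonometric function as in equation (9), and the relationship between the estimated yaw angle $\theta$ and the target values $y_1$ and $y_2$ follows the inverse trigonometric function as in equation (8). Similar to the 1D case, we can compute the uncertainty of $\beta$ according to its differential as
$$u_\beta = \frac{u_{y_3}}{\sqrt{1 - y_3^2}}. \quad (13)$$
Since the relationship between $y_1$, $y_2$, and $y_3$ changes to $y_1^2 + y_2^2 + y_3^2 = 1$ in the 2D case, according to equations (11) and (12), the uncertainty of the yaw angle becomes
$$u_\theta = \frac{|y_2|\, u_{y_1} + |y_1|\, u_{y_2}}{1 - y_3^2}. \quad (14)$$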
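A numerical sketch of the 2D case follows. It assumes, beyond what the text states explicitly, that the outputs are $(\cos\theta\cos\beta, \sin\theta\cos\beta, \sin\beta)$, that pitch is the inverse sine of the third output, and that the yaw contributions are combined additively with absolute values; `pose_uncertainty_2d` is a hypothetical name.

```python
import math

def pose_uncertainty_2d(y, u):
    """Return (yaw, pitch) uncertainties in degrees from outputs y and std devs u."""
    y1, y2, y3 = y
    u1, u2, u3 = u
    c2 = 1.0 - y3 * y3              # cos^2(pitch), i.e. y1^2 + y2^2 on the unit sphere
    if c2 <= 0.0:
        raise ValueError("yaw is undefined at the poles (|y3| >= 1)")
    u_pitch = u3 / math.sqrt(c2)    # derivative of asin(y3) is 1 / sqrt(1 - y3^2)
    u_yaw = (abs(y2) * u1 + abs(y1) * u2) / c2
    return math.degrees(u_yaw), math.degrees(u_pitch)
```

Both uncertainties grow as the pitch approaches ±90°, where a small change in the outputs corresponds to a large change in angle.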

By calculating the uncertainty, we can learn the relationship between the absolute error of the prediction result and the uncertainty and determine the corresponding threshold based on the existing results. Then, the result with uncertainty greater than the threshold can be regarded as an unreliable result, and a second interpretation could be performed to ensure the credibility of the results.
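The rejection step can be sketched as a simple filter over predictions. The threshold value and the function name `accuracy_after_rejection` are illustrative assumptions; in practice the threshold would be chosen from existing results, as described above.

```python
def accuracy_after_rejection(pred_labels, true_labels, uncertainties, threshold):
    """Keep predictions with uncertainty <= threshold; return (accuracy, kept_fraction)."""
    kept = [(p, t) for p, t, u in zip(pred_labels, true_labels, uncertainties)
            if u <= threshold]
    total = len(pred_labels)
    if not kept:
        # Nothing was trusted, so accuracy is undefined.
        return float("nan"), 0.0
    correct = sum(1 for p, t in kept if p == t)
    return correct / len(kept), len(kept) / total
```

Rejected samples are the candidates for the second interpretation mentioned above; the kept fraction shows how much coverage is traded for reliability.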

3. Experiments and Analyses

3.1. Dataset and Image Representation

We performed experiments on two simulated satellite image datasets, BUAA-SID 1.0 [17, 41] and BUAA-SID 1.5 [10, 12, 19], to evaluate the proposed method. BUAA-SID 1.0 is a publicly available satellite image dataset containing 4600 gray images of 20 satellites together with 4600 corresponding binary images. These noise-free images were sampled on a viewing sphere from 230 viewpoints. The BUAA-SID 1.0 dataset can be used for testing the performance of GPR for multiview satellite recognition. Images in BUAA-SID 1.5 were simulated via the simulation method introduced in [42] using ten 3D satellite models selected from BUAA-SID 1.0. It contains four subsets. In the 1D subset of BUAA-SID 1.5, 3600 gray images and their corresponding binary images were simulated from 360 viewpoints uniformly sampled on a circle, i.e., with a varying yaw angle and a fixed pitch angle. Images in the 2D subset of BUAA-SID 1.5 were captured from 2042 viewpoints on a viewing sphere with varying yaw and pitch angles. The lighting subset of BUAA-SID 1.5 contains 10800 gray images of one satellite from the 1D subset, simulated in different lighting conditions, i.e., the phase angle of the light varies over a range in uniform steps while the altitude angle of the light takes three different values. The noise subset of BUAA-SID 1.5 was obtained by adding Gaussian white noise to images in the 1D subset, with the variance varying from 0.001 to 0.01 in steps of 0.001. Compared with BUAA-SID 1.0, BUAA-SID 1.5 has simpler pose changes but more varied data types, such as the lighting and noise subsets. So the BUAA-SID 1.5 dataset can be used for better validation of GPR for both recognition and pose estimation. It should be noted that the data used to support the findings of this study, including the BUAA-SID 1.0 and BUAA-SID 1.5 datasets, are available from the corresponding author upon request.

Several methods have been proposed in the field of multiview space object recognition (e.g., [12, 17, 19, 31]). To compare the methods more fairly and objectively, we use the above two datasets in our experiments. We also used two groups of image representations to represent the satellite images: the first group consists of four representations of the shape information of space objects, including the original binary image (BI), the distance transform (DT) obtained by applying a signed distance function [10] to binary images, Hu’s moment invariants (HU) [43], and the Fourier descriptor (FD) [44]; the second group includes the original gray image (GI) and the histogram of oriented gradients (HOG) [45], which can describe the appearance variations of space objects. All these representations were used in vector form (i.e., data $\mathbf{x}$ in Section 2) for training and testing GPR for classification and pose estimation or the MCGPR model for pose estimation. It should be noted that when solving a 1D or 2D pose estimation problem using the GPR model, the corresponding pose $\mathbf{y}$ in Section 2.1 was represented as the pose angle $\theta$ or $(\theta, \beta)$ directly, while for the MCGPR model, $\mathbf{y}$ was represented using equation (7) in the 1D or 2D case, respectively.

3.2. Results of Satellite Recognition and Pose Estimation
3.2.1. Satellite Recognition

We performed recognition experiments on the BUAA-SID 1.0 dataset and the 1D subset of BUAA-SID 1.5, respectively. As mentioned in Section 2.2, we learned one GPR model for each dimension of the output label vector, resulting in 20 GPR models for satellite recognition on the BUAA-SID 1.0 dataset and 10 on the 1D subset of BUAA-SID 1.5. To compare with [12, 17, 19], half of the images were used for training, and the rest were used for testing. Specifically, for comparison with [19, 31] on BUAA-SID 1.0, 80, 90, or 100 training images of each satellite were randomly selected to learn the classification models, as in [31], and the rest were used for testing. Results are shown in Tables 1, 2, and 3. The results of our GPR method are in bold if it achieves the best result and in italics if it achieves the second best. For the 20-class classification problem on BUAA-SID 1.0, we significantly improve the recognition accuracy when using 12-dimensional Hu’s moment invariants (HU) and 20-dimensional Fourier descriptors (FD), as seen in Tables 1 and 2. This is promising since lower-dimensional features mean less computation and less storage. In addition, because binary images of space objects can be captured more easily and completely than high-resolution gray images, shape representations like HU and FD are more suitable for real aerospace applications. It can be seen from Table 3 that, on the 1D subset of BUAA-SID 1.5, our GPR method performs better than those in [19] and [12] for all representations except FD, where it is the second best. This shows the good recognition capability of our method.

Recently, deep learning methods have been widely used for general object recognition. Zeng et al. [33] introduced this technology for space target recognition. We reproduced the nine-layer deep convolutional neural network in [33] to perform satellite recognition on our datasets. We chose rotation and cropping for data augmentation and enlarged the training data eightfold. The training data are the same as above. We also trained other popular state-of-the-art deep learning networks, including ResNet [46] and DenseNet [47], to make our comparison more comprehensive. The results are shown in Table 4. We can see that our GPR model achieves results similar to those of DCNN [33] on BUAA-SID 1.0, but not better than ResNet [46] and DenseNet [47]. As for the BUAA-SID 1.5 dataset, our GPR model achieves better satellite recognition accuracy than DCNN [33], the same 100% performance as ResNet [46] and DenseNet [47] on the 1D and noise subsets, and slightly worse performance than ResNet [46] and DenseNet [47] on the lighting subset. This means that deeper networks like ResNet [46] and DenseNet [47] can achieve better recognition results, especially when the number of classes grows, i.e., when the recognition problem becomes harder. By contrast, our GPR model can be learned more easily using limited computing resources and provides the uncertainty of its outputs.

3.2.2. Pose Estimation

To evaluate the performance of pose estimation, we performed 1D and 2D pose estimation experiments on the 1D subset and 2D subset of BUAA-SID 1.5, respectively. In each subset, we chose half of the images for training and the rest for testing. It should be noted that on the 1D subset, we used two training strategies: in one strategy, we trained GPR/MCGPR models for each satellite individually under the assumption that the categories were known, while in the other strategy, we trained GPR/MCGPR models (indicated as GPR-ALL and MCGPR-ALL in Table 5) for all ten satellites without prior knowledge of categories. For quantitative evaluation and comparison, we follow the same indicators as in [19], i.e., the absolute error (AE), defined as the absolute value of the difference between the ground truth angle and the estimated pose angle. We report the mean absolute error (MAE) to evaluate the pose estimation performance on the entire testing set. At the same time, to analyze the distribution of angle errors in detail, we also report the percentage of testing images for which the pose angle is estimated with an AE less than a threshold (1°, 2°, or 5°). Since most recent vision-based satellite pose estimation methods [27, 29] are not learning-based and thus not suitable for comparison, we compared our proposed method with the most comparable method, kernel regression [19]. In addition, for a comprehensive comparison with recently available learning-based methods, we also adjusted the network structures of ResNet [46] and DenseNet [47] and trained them as regressors to estimate 1D and 2D pose on BUAA-SID 1.5.
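The evaluation indicators can be computed as in the following sketch. Wrapping the error around the 360° period (so that 359° versus 1° counts as 2°) is an assumption added here for yaw angles and is not stated in the text; the function names are hypothetical.

```python
def angular_abs_error(true_deg, est_deg, period=360.0):
    """Absolute angular error in degrees, wrapped to [0, period / 2]."""
    d = abs(true_deg - est_deg) % period
    return min(d, period - d)

def pose_metrics(true_angles, est_angles, thresholds=(1.0, 2.0, 5.0)):
    """Return MAE and, per threshold, the percentage of samples with AE below it."""
    errors = [angular_abs_error(t, e) for t, e in zip(true_angles, est_angles)]
    mae = sum(errors) / len(errors)
    pct = {th: 100.0 * sum(e < th for e in errors) / len(errors) for th in thresholds}
    return mae, pct
```

For pitch angles, which do not wrap, passing a very large `period` reduces `angular_abs_error` to the plain absolute difference.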

Experimental results are shown in Tables 5 and 6. It can be seen that the proposed method performs better than or comparably to the state-of-the-art [19] in both 1D and 2D pose estimation. The deep learning networks ResNet [46] and DenseNet [47] perform relatively worse for pose estimation. This may be caused by the limited training data and the highly abstract deep features, resulting in poor regression performance on the BUAA-SID 1.5 dataset. Compared with the original GPR, MCGPR achieves a significant improvement, reducing the MAE of 1D pose estimation with the GI representation by about an order of magnitude. This validates the role of the manifold constraint. We can see that GI performs the best among all six kinds of image representations, as it is more sensitive to pose variation. HU and FD appear to be unsuccessful in pose estimation, with quite high MAE on both the 1D and 2D subsets. This may be because they are view-invariant features and not sensitive to pose variation, and such invariance reduces pose estimation performance. Thus, to get better pose estimation results, we need to use pose-sensitive representations. Comparing the results in Tables 5 and 6, it is clear that the 2D results are worse than the 1D results, which can be understood from the increasing difficulty of the pose estimation problem when the dimensionality rises from 1D to 2D. Similarly, due to the growing variance within the training data, GPR-ALL/MCGPR-ALL performs worse than GPR/MCGPR. Therefore, pose estimation models are better trained for individual targets.

3.2.3. Joint Satellite Recognition and Pose Estimation

According to the findings in Section 3.2.2, a preceding recognition step can provide a category prior for selecting the proper model for better pose estimation. Thus, we use a pose-after-recognition strategy to achieve joint satellite recognition and pose estimation based on the individual approaches introduced in Sections 2.2 and 2.3 of this paper; i.e., we recognize the category of a testing sample using GPR and then estimate its pose using the MCGPR model trained specifically for the recognized category. Table 7 shows the joint satellite recognition and pose estimation results of our proposed method and the comparison with the state-of-the-art methods [12, 19]. Our method achieves the best pose estimation results for five of the six kinds of image representations and the second best for the BI representation. Together with the good recognition accuracy shown in Table 3, our proposed method achieves state-of-the-art results for joint satellite recognition and pose estimation.

3.3. Uncertainty Analysis

The most important advantage of the GPR method over previous works [12, 17, 19, 31] is that GPR provides the variance of each predicted output value, which can be used to calculate the uncertainty of the predicted value as introduced in Section 2.4. Such uncertainty describes the credibility of the predicted value and gives us a way to choose convincing results. This is very helpful in aerospace applications.

3.3.1. Uncertainty of Satellite Recognition

We chose the satellite recognition results using the FD representation in Table 3 to analyze the uncertainties of predicted class labels, since the recognition accuracy of FD is less than 100%, meaning that some of the testing images were incorrectly classified. We computed the Euclidean distance between the predicted label vector and the ground truth label vector to quantitatively evaluate the recognition error. Figure 1 shows the relationship between the uncertainty and the Euclidean distance. It can be seen that predicted labels with large Euclidean distances, i.e., large recognition errors, correspond to large uncertainties. According to the data in Figure 1, if we select a suitable uncertainty threshold and regard the recognition results whose uncertainties exceed it as untrustworthy, the recognition accuracy of the remaining results rises to 100%. This means that we can choose more convincing recognition results according to their uncertainties.

3.3.2. Uncertainty of Pose Estimation

To validate the pose estimation uncertainty provided by our method, we used the 1D pose estimation results with the GI representation in Table 5 to analyze the absolute errors and uncertainties of the estimated 1D pose angles. Figure 2 shows the results for three satellites. We can see that, for both GPR and MCGPR, estimated poses with large uncertainties usually have high absolute errors, resulting in obvious peaks at the same positions on the abscissa in the figure. This verifies the feasibility of the definition of pose uncertainty in Section 2.4. We can also see in Figure 2 that the mean absolute error of MCGPR is considerably lower than that of GPR, whereas the mean uncertainty of MCGPR is larger than that of GPR. This means that MCGPR improves the pose estimation performance of GPR at the expense of enlarging the uncertainties of some predicted values. Such enlargement can be explained by equation (12), where the uncertainty of MCGPR is an additive combination of the uncertainties of the individual GPRs. It should be noted that the spikes in Figure 2 appear where the appearances of the satellite at symmetrical poses are quite similar, since satellites are usually geometrically symmetric. In this case, the errors and uncertainties tend to peak.

Similar results can also be found in Figure 3 for the uncertainty of 2D pose. We can see that the positive correlations between the absolute error and the uncertainty are pronounced in the overall trend but locally various. This may be caused by the more challenging difficulty in 2D pose estimation than the 1D case.

3.4. Noise Robustness

Noise in the data is an important factor affecting the performance of machine learning models. In this section, we analyze the satellite recognition and pose estimation performance of the proposed models trained on noisy and noise-free data.

3.4.1. Noise Robustness of Satellite Recognition

To evaluate the robustness of satellite recognition against noise, we took the models trained on the training set of the 1D subset of BUAA-SID 1.5 in Section 3.2.1 and tested them on the noisy images in the noise subset of BUAA-SID 1.5 that correspond to the testing set of the 1D subset. In other words, we used noisy data to test the models learned from noise-free data. Figure 4(a) shows the experimental results. It can be seen that noise significantly affects the recognition performance of all representations except GI. BI and DT perform better than HU, FD, and HOG. The noise robustness of GI is particularly prominent: its recognition accuracy remains 100% as the variance of the Gaussian noise increases. Figure 4(b) shows the experimental results of GPR models trained on noisy training data, i.e., the noisy images in the noise subset of BUAA-SID 1.5 that correspond to the training set of the 1D subset. The models trained on noisy data clearly outperform the noise-free models in Figure 4(a). This means that GPR can model the noise in the training data and obtain promising results on realistic noisy data. Therefore, it is necessary to model the noise in the learning procedure for real applications.
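For readers who want to reproduce this kind of test on their own images, the following sketch shows one way to generate noisy copies of a clean image at several noise variances. It is only an illustration under stated assumptions: the noise is taken to be additive and zero-mean, pixel intensities are assumed to lie in [0, 1], and the helper name `add_gaussian_noise`, the constant test image, and the variance values are hypothetical rather than those of the noise subset of BUAA-SID 1.5.

```python
import numpy as np

def add_gaussian_noise(image, variance, rng):
    """Add zero-mean Gaussian noise of the given variance (hypothetical helper).

    Assumption: `image` holds intensities in [0, 1]; the result is clipped
    back to that range.
    """
    noise = rng.normal(loc=0.0, scale=np.sqrt(variance), size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = np.full((64, 64), 0.5)  # stand-in for a clean satellite image

# One noisy copy per (assumed) noise variance; 0.0 reproduces the clean image.
noisy_sets = {v: add_gaussian_noise(clean, v, rng) for v in (0.0, 0.001, 0.01)}
for v, img in noisy_sets.items():
    print(v, float(img.std()))
```

Testing a noise-free model on such copies corresponds to the setting of Figure 4(a), while training on them corresponds to the setting of Figure 4(b).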

3.4.2. Noise Robustness of Pose Estimation

We also experimented on the noise subset of BUAA-SID 1.5 to evaluate the noise robustness of pose estimation. In Figures 5, 6, and 7, we compare the performance of the models learned in Section 3.2.2 using noise-free training data (indicated as “GPR/MCGPR+image representation”) with that of the corresponding models retrained on noisy training data (indicated as “NOISE+image representation”). The figures show that noise significantly affects pose estimation performance as well. Results deteriorate rapidly as the variance of the noise increases, and GI has better noise robustness than the other image representations. By modeling the noise in the learning procedure, most GPR/MCGPR models achieve improved pose estimation performance, i.e., a lower mean absolute error and a higher percentage of estimates with AE less than 1°, 2°, or 5°. MCGPR with the GI representation is an exception. A possible explanation is that, since the manifold constraint in MCGPR has already enhanced the pose estimation ability of GPR and GI has been shown to be the image representation with the best noise robustness, the noise in the training data may disturb the modeling of MCGPR and thus reduce the pose estimation performance.
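The two metrics used in this comparison, the mean absolute error and the percentage of estimates with AE below a threshold, can be computed as in the sketch below. The helper name `pose_metrics` and the toy angles are hypothetical, and wrapping the angular difference around 360° is an assumption of this example; the paper's exact definition of AE is given elsewhere and may differ.

```python
import numpy as np

def pose_metrics(est_deg, true_deg, thresholds=(1.0, 2.0, 5.0)):
    """MAE and percentage of estimates with AE below each threshold.

    Assumption: 1D pose angles are periodic, so the absolute error is the
    shorter way around the circle (at most 180 degrees).
    """
    est = np.asarray(est_deg, dtype=float)
    true = np.asarray(true_deg, dtype=float)
    diff = np.abs(est - true) % 360.0
    ae = np.minimum(diff, 360.0 - diff)
    mae = float(ae.mean())
    pct = {t: 100.0 * float(np.mean(ae < t)) for t in thresholds}
    return ae, mae, pct

# Toy data (assumed): the last pair straddles the 0/360 degree boundary.
true_deg = np.array([0.0, 90.0, 180.0, 359.0])
est_deg = np.array([0.5, 91.5, 184.0, 1.0])

ae, mae, pct = pose_metrics(est_deg, true_deg)
print(ae, mae, pct)
```

Without the wrap-around, the last pair would be counted as an error of 358° rather than 2°, which would dominate the MAE.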

3.5. Lighting Robustness

Lighting is another important factor affecting satellite recognition and pose estimation. The lighting condition changes with the lighting phase angle (also called the sun phase angle), which is determined by the relative positions of the sun, the imaging sensor, and the target satellite. Such changes may affect appearance representations, e.g., GI and HOG. Thus, we analyzed lighting robustness on the lighting subset of BUAA-SID 1.5 using the models learned on the training data of the 1D subset of BUAA-SID 1.5 in Sections 3.2.1 and 3.2.2, respectively.

3.5.1. Lighting Robustness of Satellite Recognition

Figure 8 shows the lighting robustness results of GPR with GI and HOG. The comparison shows that our method has better lighting robustness for satellite recognition than the kernel method [19] when the lighting phase angle is larger than 60°. It can also be seen that HOG is more robust than GI.

3.5.2. Lighting Robustness of Pose Estimation

Experimental results for pose estimation are shown in Figure 9. HOG is again more robust than GI, and MCGPR with the HOG representation can still achieve an MAE of less than 10° even under bad lighting conditions, i.e., when the lighting phase angle is larger than 60°.

3.6. Performance on Sparse Training Data

In practice, it is difficult to obtain a large quantity of training images because of the limitations of real imaging systems in space. A method that achieves good performance with fewer training images is therefore more useful and applicable. It is thus important to clarify how many training images our proposed method actually needs. In this section, we kept the testing set and the model parameters learned on the training data of the 1D subset of BUAA-SID 1.5 in Sections 3.2.1 and 3.2.2 and only reduced the number of training images for each satellite from 180 to 10, in order to analyze the satellite recognition and pose estimation performance on sparse training data.

3.6.1. Satellite Recognition Performance on Sparse Training Data

Figure 10 shows the results for satellite recognition. It can be seen that most representations still achieve 100% recognition accuracy given only 10 training images. The worst performance, that of HU and FD, is around 90% in this case, which is also acceptable. This shows the strong ability of GPR for satellite recognition with sparse training data.

3.6.2. Pose Estimation Performance on Sparse Training Data

For pose estimation, we selected one satellite in the 1D subset of BUAA-SID 1.5 and chose GI as the image representation. Results are shown in Figure 11. We can see that the performance changes slowly when the number of training images is more than 100, while the average uncertainty keeps falling, although more and more slowly. Thus, more training data may help to reduce the uncertainty, but the pose estimation results may not improve much. Compared with satellite recognition, pose estimation is more sensitive to the number of training images. Taking all factors into consideration, we suggest at least 50 training images to obtain acceptable pose estimation results and more than 100 images for better results.
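One simple way to build such reduced training sets is to keep views that are evenly spaced over the original pose range, as sketched below. This is an assumption for illustration: the text does not state how the retained images were chosen, and the helper name `evenly_spaced_subset` is hypothetical.

```python
import numpy as np

def evenly_spaced_subset(n_total, n_keep):
    """Indices of `n_keep` views spread evenly over `n_total` ordered views."""
    if not 1 <= n_keep <= n_total:
        raise ValueError("n_keep must be between 1 and n_total")
    # Integer arithmetic avoids floating-point rounding of the step size.
    return (np.arange(n_keep) * n_total) // n_keep

# Shrink a 180-image training set (one satellite) down to 10 images.
subsets = {n: evenly_spaced_subset(180, n) for n in (180, 100, 50, 10)}
print(subsets[10])
```

Retraining on each subset while keeping the testing set fixed gives one point per training-set size, which is the kind of curve reported in Figure 11.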

3.7. Cross-Validation for Parameter Determination

There are two parameters affecting the performance of our proposed method, i.e., the characteristic length-scale of the covariance function in equation (2) and the variance of the additive noise in the regression model (also in the likelihood function ). Since we use the squared exponential function as the covariance function in our model, we empirically set the length scale by experience as where is the second norm in .
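For orientation, the role of the two parameters can be seen in the standard textbook form of GP regression with a squared exponential covariance function, written out below. This is the generic formulation, not a reproduction of the paper's own equations: the symbols \ell (length-scale), \sigma_n^2 (noise variance), K, and \mathbf{k}_* are notation assumed for this sketch, and a unit signal variance is assumed.

```latex
% Assumed notation: \ell = characteristic length-scale, \sigma_n^2 = noise variance,
% K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j) over the training inputs,
% (\mathbf{k}_*)_i = k(\mathbf{x}_i, \mathbf{x}_*) for a test input \mathbf{x}_*.
k(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert_2^{2}}{2\ell^{2}}\right),
\qquad
y = f(\mathbf{x}) + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma_n^{2})

\bar{f}_* = \mathbf{k}_*^{\top}\left(K + \sigma_n^{2} I\right)^{-1}\mathbf{y},
\qquad
\mathbb{V}[f_*] = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^{\top}\left(K + \sigma_n^{2} I\right)^{-1}\mathbf{k}_*
```

In this form, a smaller \ell lets the regression function vary more quickly between nearby inputs, while a larger \sigma_n^2 smooths the predicted mean and increases the predicted variance at the training inputs.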

We performed 4-fold cross-validation on the training set to select the most suitable noise variance. We divided the training set into four folds and used one fold as the validation set in each round of the 4-round validation experiments. We used the average recognition accuracy or MAE over the rounds to select a proper noise variance for satellite recognition or pose estimation, respectively. Some cross-validation results for pose estimation on the 1D subset of BUAA-SID 1.5 are shown in Figure 12. Similar validation experiments were done on the other subsets as well. The parameters that produced the experimental results reported in this paper are listed in Tables 8 and 9.
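The selection procedure can be sketched with a small NumPy implementation of GP regression. Everything specific here is an assumption made for illustration: the function names, the toy 1D regression problem standing in for image features and pose angles, the fixed length-scale of 1.0, the candidate noise variances, the random fold assignment, and the unit signal variance of the kernel.

```python
import numpy as np

def se_kernel(A, B, length_scale):
    """Squared exponential kernel between the rows of A and the rows of B."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * length_scale**2))

def gpr_predict(X_train, y_train, X_test, length_scale, noise_var):
    """Standard GP regression: predictive mean and variance of f at X_test."""
    K = se_kernel(X_train, X_train, length_scale) + noise_var * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = se_kernel(X_train, X_test, length_scale)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - (v**2).sum(axis=0)  # k(x*, x*) = 1 for this kernel
    return mean, var

def select_noise_var(X, y, candidates, length_scale, n_folds=4, seed=0):
    """Pick the noise variance with the lowest average validation MAE."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    scores = {}
    for noise_var in candidates:
        fold_mae = []
        for k in range(n_folds):
            val = folds[k]  # one fold for validation, the rest for training
            trn = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            mean, _ = gpr_predict(X[trn], y[trn], X[val], length_scale, noise_var)
            fold_mae.append(np.mean(np.abs(mean - y[val])))
        scores[noise_var] = float(np.mean(fold_mae))
    best = min(scores, key=scores.get)
    return best, scores

# Toy problem (assumed): noisy samples of a sine curve.
rng = np.random.default_rng(1)
X = np.linspace(0.0, 2.0 * np.pi, 80)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(80)

candidates = [1e-4, 1e-2, 1e-1, 1.0]
best, scores = select_noise_var(X, y, candidates, length_scale=1.0)
print(best, scores)
```

For satellite recognition, the same loop would score each candidate by average recognition accuracy instead of MAE and keep the candidate with the highest score.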

4. Conclusion

In this paper, we have proposed a novel monocular vision-based method that employs Gaussian process regression for satellite recognition and pose estimation. Our approach can effectively recognize the categories of satellites and estimate their relative poses. We have achieved state-of-the-art satellite recognition and pose estimation results on the BUAA-SID datasets. For satellite recognition, only 10 training images of each satellite are needed to reach nearly 100% recognition accuracy. For pose estimation, our method obtains a pose error of less than 5 degrees in most cases. In addition, our pose estimation model requires nothing but training data; thus, it is free of the aforementioned limitations, such as camera calibration and optical markers. Our GPR-based method also provides the uncertainty of the predicted values (pose angles or categories), which may be used to choose convincing results in applications. Because of the supervised learning procedure, our method may be more suitable for cooperative space objects for which enough training images can be obtained.

Data Availability

The data, including BUAA-SID 1.0 and 1.5 datasets, used to support the findings of this study are available from the corresponding author upon request.

Disclosure

The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; and in the decision to publish the results.

Conflicts of Interest

The authors declare no conflict of interest.

Authors’ Contributions

H.Z., C.Z., G.M., and Z.J. conceived and designed the experiments. H.Z. and Y.Y. performed the experiments for pose estimation. C.Z. performed the experiments for satellite recognition. H.Z., C.Z., G.M., and Y.Y. analyzed the data. H.Z. and C.Z. wrote the paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61501009, 61771031, and 61371134), the National Key Research and Development Program of China (Grant Nos. 2016YFB0501300 and 2016YFB0501302), and the Fundamental Research Funds for the Central Universities.