Abstract
Many objects in the real world have circular features. In general, a circular feature’s pose is represented by a 5-DoF (degree-of-freedom) vector . Measuring the accuracy of a circular feature’s pose in each direction, and the correlation between directions, is a difficult task. This paper proposes a closed-form solution for estimating the accuracy of the pose transformation of a circular feature. The covariance matrix of is used to measure the accuracy of the pose. The relationship between the pose of the circular feature of a 3D object and the 2D points is analyzed to yield an implicit function; the Gauss–Newton theorem is then employed to compute the partial derivatives of this function with respect to each point, after which the covariance matrix is computed from both the 2D points and the extraction error. In addition, the method utilizes the covariance matrix of the 5-DoF circular feature’s pose variables to optimize the pose estimator. Based on the pose covariance, a minimize-the-mean-square-error (Min-MSE) metric is introduced to guide the selection of good 2D imaging points, reducing the total amount of noise introduced into the pose estimator. This work provides an accuracy-evaluation method for object 2D-3D pose estimation using circular features. Finally, the effectiveness of the accuracy-estimation method is validated on both random data sets and synthetic images. Various synthetic image sequences illustrate the performance and advantages of the proposed pose optimization method for estimating a circular feature’s pose.
1. Introduction
Pose estimation is an essential step in many machine vision and photogrammetric applications; its ultimate goal is to identify the 3D pose of an object of interest from an image or image sequence [1, 2]. Existing algorithms detect ellipses in the 2D image, and the 3D pose of the circle can be extracted from a single image using the inverse projection model of the calibrated camera (see Figure 1(b)) [3–5]. These methods have been successfully applied to pose estimation of an underwater dock [6] and of bait [7]. For industrial monocular vision pose estimation systems using circular features, quite a few works on accuracy have been presented; however, industrial applications impose very high precision requirements. The process of pose estimation is so complicated that it is difficult to observe the effect of each parameter’s error on the pose error and to measure the accuracy of the circular feature’s pose. To overcome this, in this paper, the covariance matrix of the pose vector is used to measure the accuracy of the circular feature’s pose, and we propose a closed-form solution for estimating that accuracy. Before doing so, let us revisit several contributions related to the proposed method.

[Figure 1: panels (a) and (b)]
The accuracy of a monocular vision circular pose estimation system is related to many factors, such as the detection error of the image coordinates, the scale error of the object model, and the camera calibration error. To evaluate the errors of the elements in the 3D pose estimation result, and at the same time analyze the effects of the related factors on that result, two approaches have been developed. A common approach performs error analysis with an ideal simulation model of the monocular vision pose estimation system; it does not derive error propagation equations for the pose estimation model. An ideal monocular vision pose estimation system is adopted for the experiment, the various parameters of the monocular vision system are introduced into the pose estimation model, and from this the main parameters that influence the pose estimation error are identified. The most effective accuracy optimization method along these lines was found in [1, 4, 8–13]. In [11], an error analysis method considering multiple error factors was proposed: the camera intrinsic error, lens distortion error, image point detection error, and object error were introduced into the pose estimation model simultaneously, based on actual error levels, and the influence of suppressing parameter errors and improving camera resolution on the pose estimation results was obtained. The error analysis above is based on a sampled model of 3D pose estimation, but it provides only a rough qualitative result; for a quantitative error estimate, one needs to compute the covariance matrix of the unknown variables.
Another approach, based on a known model of a real project, expresses the pose error in terms of the errors of the various parameters of the monocular vision system, and then obtains the covariance of the pose variables as a quantitative error estimate [14–24]. In [14–18], a simplified 3D pose estimation model is constructed in place of the full, complicated pose estimation model for a specific vision measurement project, and the relation between the errors of the various input parameters and the pose measurement error is deduced from the first derivative of the pose with respect to each input parameter. In [21], error propagation rules in the 3D reconstruction process are deduced using matrix analysis, and a method for calculating the covariance matrix is put forward to evaluate the errors of the elements in the 3D reconstruction result. From such work, one can conclude which parameters mainly influence the accuracy of pose estimation, providing a reference for choosing optimization techniques to improve it.
Although the conclusions of these approaches provide important guidance for engineering applications of visual pose estimation systems, their drawback is that the covariance is computed statistically by setting multiple groups of actual pose parameters, which is time-consuming. In [23, 24], the covariance matrix of the pose variables is used to measure the accuracy of the relative pose, and a closed-form solution for estimating the accuracy of the pose transformation is proposed. An implicit function from the 3D point pairs to the pose variables is defined, the implicit function theorem is employed to compute the partial derivatives of this function with respect to the point pairs, and the covariance matrix is then computed from both the local behavior of the implicit function and the measurement error of the camera. This method provides a fast numerical solution for the covariance of the pose variables. The methods above rely on a direct relationship between image points and pose parameters to construct the 3D pose estimation model for estimating the covariance matrix. In 2D-3D pose estimation using a circular feature, however, a least squares model and many matrix transformations are involved, which makes the model much more complex. To measure the accuracy of the circular feature’s pose variables in each direction, and the correlation between directions, it is a challenging task to represent the covariance matrix of the 5D pose variables with a good, fast numerical solution by building a direct relationship from the image points to the 5D pose variables.
In this work, the proposed algorithm measures the accuracy of the circular feature’s pose variables using their covariance matrix. We propose a closed-form solution for estimating the covariance matrix of the 5D circular feature’s pose variables by exploiting the 2D imaging points’ coordinates and their extraction error. The main contributions of the present work can be summarized as follows:
(1) The paper proposes a closed-form solution for estimating the accuracy of the 5D circular feature’s pose variables using the 2D imaging points’ coordinates and their extraction error covariance matrix.
(2) The projection of the circular feature of the 3D object yields a 2D virtual elliptic contour, and the algebraic distance from a 2D imaging point to this elliptic contour is employed to define an implicit function from the image points to the 5D pose variables; the Gauss–Newton theorem is then employed to compute the partial derivatives of this function with respect to each point. From this, one obtains the error propagation rules of the 3D pose estimation process and the relation between the extraction error of the 2D imaging points’ coordinates and the error of the circular feature’s pose variables. These results are used to compute the covariance matrix of the pose variables.
(3) The covariance of the 2D imaging points’ extraction error due to quantization can be computed from the size of the quantization unit [19]. Combining the error propagation rules with the 2D imaging points’ coordinates and their quantization error covariance yields the covariance matrix of the circular feature’s pose variables. The proposed algorithm thus provides a closed-form solution for this covariance matrix instead of Monte Carlo simulation.
(4) The covariance matrix of the 5D circular feature’s pose variables is applied to the object 2D-3D pose estimation system using a circular feature.
The proposed optimization algorithm yields high accuracy in object 2D-3D pose estimation, with low location errors when processing sequences of 2D monocular images degraded by additive noise.
The paper is organized as follows. In Section 2, we define the representation of the 5D circle pose parameters and give the representation of the projection of the 3D object’s circular feature in terms of these pose variables. Section 3 explains the proposed closed-form solution for estimating the accuracy of the 5D pose variables in 2D-3D object pose estimation systems using a circular feature; specifically, we discuss how the implicit function from the image points to the 5D pose variables is defined, and how the covariance matrix of the pose variables is represented with a good, fast numerical solution by building a direct relationship from the image points to the pose variables. In Section 4, we briefly explain the proposed pose optimization algorithm for object 2D-3D pose estimation using a circular feature, in which the covariance matrix of the 5D pose is used to select good 2D imaging points. Section 5 presents experimental results obtained with the proposed closed-form solution and the proposed pose optimization algorithm on synthetic images, which are discussed and compared with those obtained by Monte Carlo simulation and by existing 2D-3D pose estimation methods using circular features. The conclusions of the present work are summarized in Section 6.
2. Preliminaries
2.1. The Position and Orientation of a Circular Feature in 3D
Figure 1(a) shows two coordinate frames. The camera frame is a 3D frame whose origin is the projection center and whose axis points along the viewing direction. The image frame is a 2D frame with and axes parallel to and of the camera frame, respectively. The projection of the circular feature with radius onto the image plane is the ellipse .
As shown in Figure 1(b), the position and orientation of a circular feature in 3D are completely specified by the coordinates of its center and the direction angles of the surface normal vector . We adopt the convention that the surface normal points from the circle toward the direction from which the circle is visible [25]; examples are shown in Figure 1(b). The direction angle indicates the angle between the projection of onto the plane and the axis , with the positive angle defined as a counterclockwise rotation of . The direction angle indicates the angle between and the axis, with the positive angle likewise defined as a counterclockwise rotation of . In addition, and . Therefore, the position and orientation of a circular feature in 3D can be expressed as , where the coordinates of the center are represented as [26].
2.2. The Representation of the Projection of the 3D Object Circular Feature Using
To interact with 2D imaging points, the circular feature in 3D has to be projected onto the image plane, and the result yields a 2D virtual elliptic contour . In this section, we give the representation of the projection of the 3D object’s circular feature in terms of the circular feature’s pose variables, in order to construct an implicit function from the 2D imaging points to the 5D pose variables.
For each pose configuration , one can proceed as follows: let and denote the center coordinates and surface normal vector of the circular feature in 3D with radius . The projection of this circular feature onto the image plane yields the 2D ellipse curve as follows:where the parameter is represented by the location and 3D orientation ; note that this differs from the parameterization in reference [25]. See Appendix A for the detailed derivation. Moreover, the parameters can be denoted as , and we represent this 2D virtual elliptic contour by its normalized coefficients: ,where is the focal length of the camera.
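As an illustration of this projection, the following sketch samples points on a 3D circle (given its center, unit normal, and radius) and projects them through an ideal pin-hole camera with focal length f. The function name and parameterization are our own illustration, not the paper’s notation.

```python
import numpy as np

def project_circle(center, normal, radius, f, n_pts=180):
    """Sample a 3D circle and project it through a pin-hole camera
    (focal length f); the result traces the 2D elliptic image contour."""
    n = np.asarray(normal, float)
    n = n / np.linalg.norm(n)
    # Build an orthonormal basis (u, v) spanning the circle's plane.
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(n, a)
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    t = np.linspace(0.0, 2.0 * np.pi, n_pts, endpoint=False)
    pts3d = (np.asarray(center, float)[None, :]
             + radius * (np.cos(t)[:, None] * u + np.sin(t)[:, None] * v))
    # Pin-hole projection: (x, y) = f * (X / Z, Y / Z).
    return f * pts3d[:, :2] / pts3d[:, 2:3]
```

For a fronto-parallel circle (normal along the optical axis) at depth Z, the image contour degenerates to a circle of radius f·r/Z, which is a useful sanity check.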
3. Method Description
In this section, we present the closed-form solution for computing the covariance matrix of the 5D circular feature’s pose variables, using the 2D imaging points and their extraction error covariance under the implicit function. We first discuss how the implicit function is defined from the algebraic distance between a 2D imaging point and the elliptic contour. We then combine this implicit function with the Gauss–Newton theorem to obtain the error propagation rules of the 3D pose estimation process and the relation between the extraction error of the 2D imaging points’ coordinates and the error of the circular feature’s pose variables. From this, a closed form for estimating the covariance matrix of the pose variables is obtained.
3.1. Minimizing the Algebraic Distance to Yield an Implicit Function
To interact with the 2D imaging points, we use the representation of the projection of the 3D object’s circular feature, and this result yields an implicit function from all 2D imaging points to the circular pose variables. The implicit function is constructed from the algebraic distance .
For each pose configuration , the algebraic distance is defined as the point-to-contour distance between a detected 2D imaging point and the 2D virtual elliptic contour , which can be denoted as follows:where , indicates the set of all detected 2D points on the image plane that correspond to 3D points on the circular feature. The circular feature’s pose parameters can then be obtained by minimizing the sum of squared algebraic distances over all detected 2D imaging points, which is defined as follows:where and indicate the stacked matrices of 2D imaging points , as expressed in the following equation:
For convenience, we denote .
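The algebraic distance and the squared-sum objective above can be sketched as follows, assuming the elliptic contour is given by normalized conic coefficients (A, B, C, D, E, F); the function names are illustrative, not the paper’s notation.

```python
import numpy as np

def algebraic_distances(points, conic):
    """Algebraic point-to-contour distance for each 2D imaging point,
    given ellipse coefficients of A*x^2 + B*x*y + C*y^2 + D*x + E*y + F = 0."""
    A, B, C, D, E, F = conic
    x, y = points[:, 0], points[:, 1]
    return A * x**2 + B * x * y + C * y**2 + D * x + E * y + F

def cost(points, conic):
    """Sum of squared algebraic distances (the least squares objective)."""
    d = algebraic_distances(points, conic)
    return float(d @ d)
```

Points exactly on the contour have zero algebraic distance, so the objective vanishes at the true pose in the noise-free case.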
The least squares objective of the circular feature’s pose optimization can be written as follows:where is a combination of the 2D imaging points and the pin-hole projection. The Levenberg–Marquardt method is used to optimize the circular feature’s pose variables [27].
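A minimal sketch of the least squares refinement, using SciPy’s Levenberg–Marquardt backend (`scipy.optimize.least_squares` with `method="lm"`). To keep the example self-contained we optimize the contour coefficients directly; in the paper the parameter vector would be the 5-DoF pose itself, with the residuals mapped through the projection model.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_contour_lm(points, p0):
    """Levenberg-Marquardt refinement by minimizing the stacked
    algebraic distances.  Here p = (A, B, C, D, E) with the
    normalization F = -1 (illustrative stand-in for the 5-DoF pose)."""
    def residuals(p):
        A, B, C, D, E = p
        x, y = points[:, 0], points[:, 1]
        return A * x**2 + B * x * y + C * y**2 + D * x + E * y - 1.0
    return least_squares(residuals, p0, method="lm").x
```

For points on a circle of radius 2 centered at the origin, the recovered coefficients should approach A = C = 0.25 with B = D = E = 0.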
3.2. Computing Covariance Matrix of Circular Feature’s Pose Variables through Implicit Function
In this subsection, we describe how the covariance matrix of the 5D circular feature’s pose variables is represented, with a good and fast numerical solution, using the 2D imaging points and their extraction error covariance under the nonlinear implicit function. The Gauss–Newton theorem is employed to compute the partial derivatives of the function with respect to each point, and the covariance matrix is then derived as follows.
Solving the least squares objective of equation (6) requires a first-order approximation, linearizing the nonlinear function about the initial pose :where is a block matrix of size and is the number of detected 2D imaging points; for simplicity, it can be expressed as .
Minimizing the first-order approximation (7) via Gauss–Newton, is used to yield an analytic expression for the pose optimization error , which can be denoted as follows:where is the pseudo-inverse of the Jacobian matrix . The pose estimation error is affected by the extraction error of the 2D imaging points. Again taking the first-order approximation of at the initial pose and the 2D imaging points’ coordinates , we can connect the pose estimation error to the extraction errors of the 2D imaging points:where is a block-diagonal matrix of size ; for simplicity, it can be expressed as .
According to equation (9), the error propagation rule from the extraction error of the 2D imaging points’ coordinates to the error of the circular feature’s pose variables can be obtained as follows:
Consider the case in which only quantization error exists and it is i.i.d. isotropic Gaussian, ; we can then derive the pose covariance matrix:
When all detected 2D imaging points are utilized, the pose covariance matrix can be computed from the 2D imaging points’ coordinates and their extraction error variance. The variance of the 2D imaging points’ quantization error is computed from the size of the quantization unit in Section 5.
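Under the stated assumptions (first-order propagation as in equations (9)–(11), with i.i.d. isotropic point noise), the pose covariance reduces to a few matrix products. The following is a sketch with illustrative names; the shapes of J and G depend on the residual stacking.

```python
import numpy as np

def pose_covariance(J, G, sigma):
    """First-order covariance of the pose estimate.

    J     : Jacobian of the stacked residuals w.r.t. the pose variables,
    G     : Jacobian of the stacked residuals w.r.t. the 2D point coordinates,
    sigma : isotropic standard deviation of the point extraction error.

    With delta_theta ~= -pinv(J) @ G @ delta_x and Cov(delta_x) = sigma^2 * I,
    the pose covariance is sigma^2 * (pinv(J) @ G) @ (pinv(J) @ G).T.
    """
    M = np.linalg.pinv(J) @ G
    return sigma**2 * (M @ M.T)
```

The trace of the returned matrix is the expected mean squared pose error, which is the quantity the Min-MSE selection criterion in Section 4 operates on.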
4. Covariance for 2D-3D Object Pose Estimation Using Circular Feature
In a 2D-3D object pose estimation system using a circular feature, the extraction error of the 2D imaging points negatively affects the pose optimization result. For accurate pose optimization, not all detected 2D imaging points contribute equally. If only the points valuable for accurate pose estimation are utilized, the total amount of noise introduced into the system can be reduced, improving the accuracy of the pose estimation result. Here, the minimize-the-mean-square-error (Min-MSE) metric is used to select 2D imaging points from the candidates, and the Max-Trace metric is introduced to guide this selection. Point selection occurs prior to the Levenberg–Marquardt pose optimization, so that only a subset of selected features is sent to the optimizer. First, according to the Min-MSE criterion, we give an expression for evaluating a random subset of candidates to identify the current “best” 2D imaging point. Later, we briefly describe an algorithm for approximately solving the NP-hard Max-Trace problem.
4.1. Good 2D Imaging Points Selection Metrics
Analyzing the impact of the 2D imaging points’ extraction error on pose optimization leads to equations in which the matrix trace of is connected to the best-case outcome. It is well known that the expectation of a quadratic form equals a matrix trace: E(ωᵀΨω) = Tr(Σ_ω Ψ) for ω ∼ N(0, Σ_ω). Combining this with the pose error propagation rule (10), can be denoted as follows:where equation (12) gives the relationship among the 2D imaging points’ coordinates , their extraction error variance, and the criterion. The good 2D imaging point selection metric can then be expressed as follows:
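The quadratic-form identity can be checked numerically; this small Monte Carlo sketch (our own, with an arbitrary Σ and Ψ) confirms E(ωᵀΨω) = Tr(Σ_ω Ψ).

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.diag([1.0, 2.0, 3.0])          # covariance of omega
Psi = np.diag([2.0, 1.0, 1.0])            # arbitrary weight matrix
# Theory: E[omega^T Psi omega] = Tr(Sigma @ Psi) = 1*2 + 2*1 + 3*1 = 7.
expected = np.trace(Sigma @ Psi)
samples = rng.multivariate_normal(np.zeros(3), Sigma, size=200000)
empirical = np.einsum('ni,ij,nj->n', samples, Psi, samples).mean()
```

The empirical mean of the quadratic form converges to the trace value, which is why the Min-MSE criterion can be evaluated through a trace without sampling.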
4.2. Efficient Subset Selection
Since subset selection with this metric is equivalent to a finite combinatorial optimization problem, it can be denoted as follows:where is the set of indices of the selected row blocks of the full matrix , i.e., the indices of the selected 2D imaging points. Moreover, in equation (14), can be rewritten as follows:
Note that the Jacobian is a block-diagonal matrix consisting of , denoted by . Meanwhile, each row block of can be written as .
To remove the need for the pseudo-inverse of , one more row is added to each block , where , and a zero row is added to each row block to obtain the new row block . This trick does not affect the structure of the least squares problem, but it allows inversion of the new block . After performing block-wise multiplication, one obtains the combined matrix , consisting of concatenated row blocks . Instead of working with the two independent matrices and , we optimize their combination .
Accordingly, the good 2D imaging point selection problem is equivalent to selecting a subset of row blocks of the matrix , and equation (15) can be rewritten as follows:where the combinatorial optimization above could be solved by brute force, but the exponentially growing search space quickly becomes impractical, especially for real-time applications. Subset selection with this metric has been studied in fields such as feature selection [28]. Here, a stochastic-greedy 2D imaging point selection algorithm is used to approximate the original NP-hard combinatorial optimization problem. Point selection is done in three steps: (1) compute the full 2D imaging point Jacobian and the Jacobian ; (2) combine the two into ; (3) greedily select row blocks of , based on the matrix-revealing metric , until the target subset size is reached. Algorithm 1 summarizes the stochastic-greedy point selection algorithm [29].
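A sketch of step (3), the stochastic-greedy selection over row blocks. Because the trace metric is additive over blocks, the marginal gain of a block is simply its squared Frobenius norm. The sampling rule (a random candidate pool of size n/k·log(1/ε) per round) follows the standard stochastic-greedy scheme; the code is our illustration, not the authors’ implementation.

```python
import numpy as np

def stochastic_greedy_trace(blocks, k, eps=0.1, seed=0):
    """Stochastic-greedy selection of k row blocks maximizing the
    Max-Trace metric Tr(sum_i J_i^T J_i) over the selected subset.
    blocks: list of 2D Jacobian row blocks."""
    rng = np.random.default_rng(seed)
    n = len(blocks)
    sample_size = max(1, int(np.ceil(n / k * np.log(1.0 / eps))))
    remaining = list(range(n))
    selected = []
    for _ in range(k):
        cand = rng.choice(remaining, size=min(sample_size, len(remaining)),
                          replace=False)
        # Marginal gain of block i is Tr(J_i^T J_i) = ||J_i||_F^2.
        gains = [np.sum(blocks[i] ** 2) for i in cand]
        best = cand[int(np.argmax(gains))]
        selected.append(int(best))
        remaining.remove(best)
    return selected
```

Each round evaluates only the sampled candidates instead of all remaining blocks, which is what makes the greedy approximation cheap enough for real-time use.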
[Algorithm 1]
5. Experiment Results and Discussion
The results obtained with the proposed algorithm are presented and discussed in this section. First, the effectiveness of the accuracy-estimation method is validated on random simulations and synthetic data sets. Then, integrating Max-Trace 2D imaging point selection into a 2D-3D object pose estimation system using a circular feature is shown to improve accuracy with low overhead, as demonstrated on synthetic data sets.
5.1. A Closed-Form for Estimating Covariance vs. Monte Carlo Simulation
The effectiveness of the closed-form solution for estimating the covariance is demonstrated by comparison against Monte Carlo simulation. The Monte Carlo method is employed to compute the actual covariance on random data sets and synthetic data sets, and the actual values are then compared with the estimated values.
The simulation environment is depicted as follows.
Here, we simulated a virtual camera based on the perspective projection model. The image size was pixels, the effective focal length was , and each circular feature had radius . After applying a small random pose transform to the 3D object with the circular feature, a fixed number (180 in this simulation) of 2D imaging points of the circular feature are generated using the 2D virtual elliptic contour model . To simulate extraction error, the 2D imaging points are perturbed with a variety of measurement errors: zero-mean Gaussian noise with standard deviations of 0.25, 0.50, and 0.75 pixel. Figure 2 illustrates examples of noisy random data for different values of .

[Figure 2: panels (a)–(c)]
Note that these random data are generated with the MATLAB pseudo-random data generator. The mean and covariance of the extraction error of the 2D imaging points’ coordinates can be denoted as follows:where the extraction errors of the coordinates x and y are i.i.d. Gaussian: N(0, ), N(0, ). To be statistically sound, 500 runs are repeated for each 2D imaging point, yielding 500 sets of 2D imaging points. From these, 500 pose estimates are obtained via equation (6), and the actual pose covariance matrix is then computed by Monte Carlo simulation.
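The Monte Carlo procedure above can be sketched generically; here `estimate_pose` stands in for the optimizer of equation (6) (any estimator mapping a point set to a parameter vector works), and the names are our own.

```python
import numpy as np

def monte_carlo_covariance(estimate_pose, clean_points, sigma, runs=500, seed=0):
    """Actual pose covariance via Monte Carlo: perturb the 2D imaging
    points with i.i.d. Gaussian noise N(0, sigma^2) per coordinate,
    re-estimate the pose each run, and take the sample covariance."""
    rng = np.random.default_rng(seed)
    poses = []
    for _ in range(runs):
        noisy = clean_points + rng.normal(0.0, sigma, clean_points.shape)
        poses.append(estimate_pose(noisy))
    return np.cov(np.asarray(poses), rowvar=False)
```

This is the reference the closed-form estimate is compared against: it is accurate but requires hundreds of full re-estimations, which is exactly the cost the closed-form solution avoids.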
In addition, one set is taken at random from the 500 sets of 2D imaging points, and the pose covariance estimate is obtained using the proposed closed-form solution. Figures 3–5 present the projections of the 5D ellipsoids corresponding to the estimated and actual pose covariance matrices onto the 2D plane, with 95% confidence ellipses. The simulation environment sets , which indicates the center of the ellipse. Figures 3–5 present the comparisons of real and estimated covariances between translations and rotations.

[Figures 3–5: panels (a)–(c)]
In Figures 3–5, the estimated pose covariance from the proposed algorithm is labeled ‘closed’; the actual pose covariance obtained with Monte Carlo is labeled ‘Monto.’ After processing random data under a variety of noise conditions, the proposed algorithm estimates the pose covariance with good accuracy, and the results are close to the actual pose covariance even under highly noisy conditions. These results demonstrate that the proposed algorithm estimates the covariance matrix of the 5D circular feature’s pose variables with a good and fast numerical solution.
Estimating the covariance matrix of the circular feature’s pose variables requires the extraction error of each 2D imaging point. Here, we use the mathematical tools in [19] to compute the average value and the standard deviation of the extraction error due to quantization. The maximum quantization error in the quantized coordinates of the detected 2D imaging points is half the size of the quantization unit; in spatial quantization, the maximum error is half a pixel, . That is, the actual value of a detected coordinate can lie anywhere in the interval between and , where is the value of after quantization. Furthermore, the actual value of , i.e., , follows a uniform probability density within this interval; in other words, the probability of lying inside a small interval around the point is independent of and equal to . For example, the average value of the quantization error in is , the average absolute error in is , and the standard deviation of the quantization error in is . With this, the extraction error of each 2D imaging point is determined.
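For a uniform quantization error on [-q/2, q/2], these statistics have the standard closed forms (mean 0, mean absolute error q/4, standard deviation q/√12); the helper below is a sketch under that uniform-quantization assumption.

```python
import numpy as np

def quantization_error_stats(q):
    """Statistics of the quantization error e ~ U(-q/2, q/2) for a
    quantization unit of size q (q = 1 pixel in spatial quantization)."""
    mean = 0.0                 # symmetric interval, zero mean
    mean_abs = q / 4.0         # E|e| for a uniform density
    std = q / np.sqrt(12.0)    # sqrt(Var) = sqrt(q^2 / 12)
    return mean, mean_abs, std
```

With q = 1 pixel this gives a standard deviation of about 0.289 pixel, which is the noise level fed into the covariance computation of Section 3.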
To verify that the estimated pose covariance can evaluate the accuracy of the circular feature’s pose variables, a synthetic data simulation was carried out. The synthetic images are rendered with 3ds Max: each input frame contains a 3D object with a circular feature in an unknown pose, embedded in a disjoint background, and the whole frame is degraded with small additive noise. Each test image is a monochrome image of pixels, the effective focal length was , and each circular feature had radius ; the appearance of the object is dynamically modified by changing its orientation angles and location coordinates. Using equation (6), 500 iterations are performed on the test image, and the pose covariance matrix is estimated for each iteration using equation (11). Figure 6 presents the projection of the 5D ellipsoid corresponding to the estimated pose covariance matrix, with 95% confidence ellipses. In Figure 6, the estimated pose covariance for each iteration is labeled ‘iteration NO.’ The area of the ellipse decreases gradually as the number of iterations increases. This indicates that the accuracy of the pose estimate is improving and that the estimated pose values approach the actual pose .

[Figure 6: panels (a)–(c)]
Figure 7 presents the distribution curves of the estimated values of each pose variable. In Figure 7, the curve for each iteration is labeled ‘iteration NO.’ The peak of each pose variable’s distribution curve gradually approaches the actual value as the number of iterations increases, and the distribution becomes narrower and taller. The trend is similar to that of Figure 6, and the result indicates that the closed-form solution estimates the covariance matrix of the 5D circular feature’s pose variables with a good and fast numerical solution.

[Figure 7: panels (a) and (b)]
5.2. Good Imaging Point Selection for Circular Pose Estimation vs. Other Methods
The results obtained with the proposed algorithm for 3D pose estimation of a moving object with a circular feature from monocular scenes are presented and discussed in this section. These results are characterized in terms of pose estimation accuracy when processing synthetic image sequences. All synthetic image sequences are rendered with 3ds Max: each frame of an input sequence contains a 3D object with a circular feature following an unknown pose trajectory, embedded in a disjoint background, and the whole frame is degraded with additive noise. Each test sequence is composed of monochrome scene frames, and the appearance of the object across the frames is dynamically modified by changing its orientation angles and location coordinates.
The experiments are organized as follows. First, we explore the effectiveness of the variety of matrix-revealing metrics Max-Trace, Max-logDet [30], and Max-MinEigenValue [31] for guiding good 2D imaging point selection. Second, a performance comparison of the proposed algorithm with respect to existing algorithms is presented and discussed by processing synthetic image sequences.
The accuracy of the object’s location estimate is measured by the location root mean square error () given bywhere and are the true and estimated coordinates of the object in the scene, respectively, given in millimeters. Moreover, the orientation root mean square error () is given bywhere and are the true and estimated surface normal vectors of the object with respect to the observer, respectively.
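The two RMSE metrics can be sketched as follows (illustrative names; the orientation error is computed on the surface-normal vectors, as in the text).

```python
import numpy as np

def location_rmse(true_xyz, est_xyz):
    """Location RMSE over a sequence, in the units of the coordinates
    (millimeters in the experiments)."""
    d2 = np.sum((np.asarray(true_xyz) - np.asarray(est_xyz)) ** 2, axis=1)
    return float(np.sqrt(d2.mean()))

def orientation_rmse(true_n, est_n):
    """Orientation RMSE computed on the surface-normal vectors."""
    d2 = np.sum((np.asarray(true_n) - np.asarray(est_n)) ** 2, axis=1)
    return float(np.sqrt(d2.mean()))
```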
The performance of the tested algorithms is quantified in terms of percentages of normalized absolute errors (NAEs), between the real and estimated pose parameters as follows [32]:
The percentages of normalized absolute errors of the object’s location estimate are denoted by , and those of the orientation estimate by ; both are computed with equation (20).
Feature selection has been extensively studied in the field of VSLAM [28, 31], for which several equivalent metrics exist to score the subset selection process; they are listed in Table 1. To verify that the matrix-revealing metric Max-Trace can guide good 2D imaging point selection, a simulation of least squares pose optimization was carried out on synthetic image sequences. The evaluation scenario is depicted in Figure 8. Here, the three matrix-revealing metrics in Table 1 are employed to select sets of 2D imaging points, and the selected good imaging points are then input to the pose optimization system.

[Figure 8: panels (a)–(c)]
The test sequence is composed of 90 monochrome scene frames of pixels, the effective focal length was , and each circular feature had radius ; the appearance of the object across the frames is dynamically modified by a small random pose transform of the 3D object. The 2D projection of the 3D circular feature is detected by processing the synthetic images, corrupted with zero-mean additive Gaussian noise of variance , and perfectly matched with the known 3D circular feature; the number of detected 2D imaging points in each image exceeds 90. The optimizer of equation (6) estimates the random pose from the matches. Subset sizes ranging from 40 to 85 are tested.
2D imaging point selection occurs prior to pose optimization, so that only a subset of the selected imaging points is sent to the optimizer. Each of the matrix-revealing metrics listed in Table 1 is tested. The simulation results are presented in Figure 9. For reference, we also plot the simulation results with subset selection (outlier rejection) and with all features available (All).

[Figure 9: panels (a) and (b)]
From Figure 9, the Max-Trace metric has the best overall performance: it approaches the baseline error (All) most quickly. The orientation error rises slightly above the All baseline, while the location error is significantly below the baseline once the subset size exceeds 55. These results point to the value of Max-Trace good-feature selection.
The accuracy improvement of good imaging point selection using Max-Trace is further demonstrated by comparing against other existing algorithms for circular feature's pose estimation using monocular images. The performance of the tested algorithms is quantified in terms of the root mean square error between the real and estimated location parameters, as in equation (18). We compared the following methods:

(1) The algorithm in [25]: ellipse detection [16] is the key step of approaches that use the 2D ellipse parameters to solve the circular pose estimation problem. The 2D ellipse parameters are fitted using the least squares method.

(2) The algorithm in [1]: an external feature is given, such as another circle, new points, or lines. The 2D ellipse parameters given by the ellipse detection provide the initial estimated solution. A general framework that fuses circles and points, covering all situations such as one circle and one point, two or more circles, and others, is used to resolve the duality problem in particular cases. A novel unified reprojection error for circles and points is then defined to determine the optimal pose solution.

(3) The algorithm : the input set of 2D imaging points contains some outliers, which degrade the performance of pose estimation when included. Due to the lack of explicit outlier rejection in , we add an outlier rejection module to the pose optimization. The ∼ distribution can be obtained from the point-to-contour distance between a 2D imaging point and the detected 2D ellipse. An imaging point with is preserved and sent to the pose estimation system . Such an implementation of outlier rejection is far from efficient, but it removes most of the outliers.

(4) The algorithm : the proposed Max-Trace algorithm is applied to the whole set of input 2D imaging points directly, and the number of 2D imaging points used is fixed (75 points per frame). The selected 2D imaging points are sent to the optimizer of equation (6), and Levenberg–Marquardt is used to optimize the circular feature's pose.

(5) The algorithm : a candidate pool of 2D imaging points is first selected using , from which the good imaging point subset is further extracted by the proposed Max-Trace algorithm. As many as 75 imaging points are used to optimize the pose per frame.

(6) The algorithm : this method estimates the 3D pose of the object by processing sequences of monocular images of a dynamic target. It is based on a hybrid methodology combining good 2D imaging point selection with an unscented Kalman filter [33, 34] for dynamically adaptive processing. The good imaging point subset is extracted using and the proposed Max-Trace, and these imaging points are used to optimize the observation for the object. See Appendix B for details of the objective function; Algorithm 2 summarizes the UKF-based 2D-3D object pose estimation algorithm.
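The outlier rejection module of method (3) can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: a plain algebraic least-squares conic fit stands in for the ellipse detector of [16], and the threshold `thresh` stands in for the quantile of the unnamed distance distribution.

```python
import numpy as np

def fit_ellipse_ls(pts):
    """Least-squares conic fit a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0
    (unit-norm coefficient vector via SVD); a stand-in for the detector."""
    x, y = pts[:, 0], pts[:, 1]
    D = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[-1]                       # direction of smallest residual

def point_to_contour(pts, coef):
    """Approximate point-to-contour distance: |algebraic residual| / ||gradient||."""
    a, b, c, d, e, f = coef
    x, y = pts[:, 0], pts[:, 1]
    alg = a * x**2 + b * x * y + c * y**2 + d * x + e * y + f
    gx = 2 * a * x + b * y + d
    gy = b * x + 2 * c * y + e
    return np.abs(alg) / np.sqrt(gx**2 + gy**2 + 1e-12)

def reject_outliers(pts, thresh):
    """Keep imaging points whose distance to the fitted ellipse is below thresh."""
    coef = fit_ellipse_ls(pts)
    return pts[point_to_contour(pts, coef) < thresh]
```

As the text notes, such a scheme is far from efficient, but a simple distance threshold already removes most gross outliers before the points reach the pose optimizer.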
A common characteristic of these algorithms is that the object's 3D pose is obtained from a single monocular image. The means of the of the five algorithms are shown in Table 2, marked as . For each approach type, the lowest percentages of normalized absolute errors per sequence are shown in bold. On almost all sequences, either or has the lowest . On synthetic sequences such as Seq.1, Seq.2, Seq.3, Seq.4, and Seq.5, the proposed Max-Trace algorithm reduces the location error significantly. The exception is Seq.6, where the proposed Max-Trace algorithm results in a higher than the lowest one (generated by ). Overall, the proposed Max-Trace approach reduces the pose estimation error on several sequences by exploiting structural and motion information. Integrating Max-Trace with outlier rejection (i.e., ) further improves performance.
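The root mean square error used for this comparison can be computed per sequence as sketched below. This is a minimal reading of the metric, not necessarily the exact form of equation (18), which is not reproduced here; it assumes one estimated location vector per frame.

```python
import numpy as np

def location_rmse(true_locs, est_locs):
    """RMSE between the real and estimated 3D location parameters over a
    sequence, with one row per frame."""
    diff = np.asarray(true_locs, float) - np.asarray(est_locs, float)
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))
```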
Now, compare the algorithm with the two algorithms and . On synthetic sequences, the algorithm clearly leads to lower location error. Meanwhile, on the other sequences, the location error of the algorithm is the same as the baseline or slightly worse. The performance gains on the sequences in which the semimajor and semiminor axes of the detected ellipse satisfy far outweigh the performance loss on the sequences with , as presented in the 8th column of Table 2.
Furthermore, compare the algorithm with the algorithm . On almost all synthetic sequences, clearly leads to lower location error, because the algorithm has two advantages. On the one hand, it takes into account the kinematics of the target in the scene, and the temporal information among the frames is used to improve the 3D pose estimation of the rigid object. On the other hand, the good imaging point subset is extracted, and these imaging points are used to optimize the observation for the object. On synthetic sequence Seq.5, the location error of the algorithm is slightly worse, possibly because the suggested value of the parameter is not suitable for Seq.5; a more appropriate value of needs to be set for different dynamic object parameters. Filters such as the CKF [35], UKF, and PF [32] can be used in the object 2D-3D pose estimation system, and they can yield accurate results in 3D pose recognition from sequences of monocular images of a dynamic target.
6. Conclusions
The paper proposes a closed-form solution for estimating the accuracy of the circular feature's pose. The solution of the circular feature's pose parameters is obtained using all detected 2D imaging points corresponding to the circular feature in 3D. The algorithm is built on the idea that the relation from the 2D imaging points to the circular feature's pose can be represented by an implicit function; this implicit function, combined with the extraction error of the 2D imaging points, yields the accuracy of the pose variables in the form of the covariance matrix of the circular feature's pose. The main contributions of the present work can be summarized as follows. The algorithm defines a representation for the 5D circular pose parameters, which is combined with the projection model of the 3D object circular feature to yield the 2D virtual ellipse contour. An implicit function from a 2D imaging point to the 5D circular feature's pose variables is defined using the point-to-contour distance, and the Gauss–Newton theorem is employed to compute the Jacobian matrix of the function with respect to such points; the Jacobian matrix is then combined with the 2D imaging point coordinates to estimate the accuracy of the circular feature's pose variables. The covariance matrix of the circular feature's 5D pose variables is studied, and the metric is introduced to guide the 2D imaging point selection. Integrating 2D imaging point selection into the pose estimation system leads to accuracy improvements of the location parameter.
Appendix
A. The 2D Ellipse Equation Parameter Model with Pose Parameter
Let and denote the coordinate center and surface normal vector of the circular feature in 3D with radius . The projection of the 3D circular feature into the image plane yields the 2D ellipse curve represented in the following equation, where we represent the parameters by the location and the 3D orientation vector :
To find the 2D ellipse curve, we first form a cone having the projection center as its vertex and joining the vertex to every point on the circle whose center position is and surface normal vector is , and then intersect the cone with the image plane .
In order to find the equation of the cone , we need to construct the equation of the base circle and of the line that joins the vertex to a point on the circle . The equation of the base circle is obtained by intersecting the sphere with center and radius with the plane whose surface normal vector is , as follows (with the point ):
For , the equation of the line joining the vertex to the point on the circle and the point on the cone is represented as follows, from which we obtain the coordinates of the points on the circle , denoted as follows:
In particular, and . By solving the two simultaneous equations (A.1) and (A.3), the equation of the cone can be written as follows:
By replacing with , the parameter model of the 2D ellipse curve equation with the pose parameter is expressed as follows, where is the focal length of the camera. By replacing with , equation (A.5) can be written as
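The construction of Appendix A can be checked numerically. The sketch below is an independent reimplementation, not the paper's code; symbol names follow the appendix (circle centre `c`, unit surface normal `n`, radius `r`, focal length `f`). It builds the viewing cone through the 3D circle and cuts it with the image plane z = f to obtain the 3x3 conic matrix of the projected 2D ellipse.

```python
import numpy as np

def circle_to_image_conic(c, n, r, f):
    """Conic of the projected circle: a symmetric 3x3 matrix C such that
    [x, y, 1] C [x, y, 1]^T = 0 for every image point (x, y) on the ellipse.

    Derivation: a ray direction d lies on the viewing cone through the circle
    iff  d^T Q d = 0  with
        Q = (n.c)^2 I - (n.c)(c n^T + n c^T) + (|c|^2 - r^2) n n^T,
    obtained by intersecting the sphere |X - c| = r with the plane
    n.(X - c) = 0 and eliminating the scale along the ray. The cone is then
    cut by the image plane z = f.
    """
    c = np.asarray(c, float)
    n = np.asarray(n, float)
    n = n / np.linalg.norm(n)
    nc = n @ c
    Q = (nc**2) * np.eye(3) \
        - nc * (np.outer(c, n) + np.outer(n, c)) \
        + (c @ c - r**2) * np.outer(n, n)
    H = np.diag([1.0, 1.0, f])   # embed image point (x, y) as the ray (x, y, f)
    return H.T @ Q @ H
```

As a sanity check, a circle of radius r parallel to the image plane at depth z projects to a centred circle of radius f·r/z, which this formula reproduces.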
B. Objective Function for 2D-3D Object Pose Estimation Using Circular Feature
The ultimate goal of object 2D-3D pose estimation is to identify the 3D pose of an object of interest from a 2D image or image sequence.
Given the observed image (possibly consisting of several image sequences), we give the objective function using maximum a posteriori (MAP) estimation for 2D-3D object pose estimation. By maximizing the conditional probability distribution of the pose , the objective function for 2D-3D object pose estimation is defined as follows, where can be expanded as the following equation, in which is the likelihood of the arriving observation and represents the prior information of the pose of the circular feature in 3D:
The UKF, CKF, and PF can be employed in 2D-3D object pose estimation problems, where the overall objective is to estimate the pose of a moving object from a collection of samples arriving sequentially. The UKF approximates the distribution over the pose space using a minimal set of deterministically chosen sample points, called sigma points. Each sigma point consists of a single pose and has an associated weighting coefficient. Let be a set of sigma points at time step (where is the dimension of the state space), containing information about the poses and their associated weights as follows:
For 2D-3D object pose estimation using the circular feature, the sigma points are given by the location coordinates ' ', the orientation angles ' ', and their velocity vector as follows:
Note that a sigma point described in equation (4) represents a single pose state at time step k − 1.
To estimate the pose of the object with these sigma points, each sigma point is propagated through the nonlinear process model. The transformed points are used to compute the mean and covariance of the prediction . We then propagate the sigma points through the nonlinear observation model of equation (2) as follows. With the resulting transformed observations, their mean , covariance , and the cross covariance are computed, respectively. We combine the information obtained in the prediction step with the newly measured observation , the gain is computed, and then the posterior mean and covariance are updated.
The algorithm steps are summarized in Algorithm 2.
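A single predict/update cycle matching the description above can be sketched as follows. This is a generic textbook-style UKF step, not the paper's Algorithm 2: `f` and `h` are placeholders for the process and observation models, and for brevity the propagated sigma points are reused in the update rather than redrawn from the predicted distribution.

```python
import numpy as np

def ukf_step(x, P, z, f, h, Q, R, alpha=1e-3, beta=2.0, kappa=0.0):
    """One UKF predict/update cycle.

    x, P: prior pose mean/covariance; z: new observation;
    f: process model, h: observation model (e.g. the reprojection of the
    circular feature); Q, R: process/observation noise covariances.
    """
    L = x.size
    lam = alpha**2 * (L + kappa) - L
    S = np.linalg.cholesky((L + lam) * P)           # matrix square root of (L+lam)P
    sigma = np.vstack([x, x + S.T, x - S.T])        # the 2L+1 sigma points
    Wm = np.full(2 * L + 1, 0.5 / (L + lam)); Wm[0] = lam / (L + lam)
    Wc = Wm.copy(); Wc[0] += 1.0 - alpha**2 + beta

    # Predict: propagate each sigma point through the process model.
    X = np.array([f(s) for s in sigma])
    x_pred = Wm @ X
    P_pred = Q + sum(w * np.outer(d, d) for w, d in zip(Wc, X - x_pred))

    # Update: propagate through the observation model, then fuse with z.
    Z = np.array([h(s) for s in X])
    z_pred = Wm @ Z
    Pzz = R + sum(w * np.outer(d, d) for w, d in zip(Wc, Z - z_pred))
    Pxz = sum(w * np.outer(dx, dz)
              for w, dx, dz in zip(Wc, X - x_pred, Z - z_pred))
    K = Pxz @ np.linalg.inv(Pzz)                    # Kalman gain
    return x_pred + K @ (z - z_pred), P_pred - K @ Pzz @ K.T
```

For linear `f` and `h` the unscented transform is exact, so the step reduces to the ordinary Kalman filter update; the benefit appears when the observation model, such as the ellipse reprojection, is nonlinear.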
Data Availability
The data used to support the findings of this study are available in the following website: https://github.com/licui2006060222/Database.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Authors’ Contributions
All the authors have participated in writing the manuscript. All authors read and approved the manuscript.
Acknowledgments
The authors would like to thank the First-Class Disciplines Foundation of Ningxia (contract no. NXYLXK2017B09) and the Major Project of North Minzu University (contract no. ZDZX201801) for supporting this work. This work was supported in part by the National Natural Science Foundation of China (no. 11961001), by the Natural Science Foundation of Ningxia (no. 2018AAC03126), and by the Open Project of the Key Laboratory of Intelligent Information and Big Data Processing of Ningxia Province (no. 2019KLBD004).