Research Article
A Robust Convolutional Neural Network for 6D Object Pose Estimation from RGB Image with Distance Regularization Voting Loss
Figure 1
The 3D translation and 3D rotation are estimated from 2D-to-3D keypoint correspondences. (b, c) The pixelwise labeling and the pixelwise unit-vector field for keypoint voting, respectively. (d, e) The voting process for finding keypoints, and the calculation of the distances between pixels and keypoints that affect the hypotheses. (f, g) The 2D and 3D keypoint correspondences, related via Perspective-n-Point (PnP); finally, the 6D poses of the objects are estimated (h). In (e), the pixels, the keypoint, the angles between the ground-truth and estimated unit vectors from the pixels to the keypoint, the feet of the perpendiculars, and the distances between the keypoint and the feet of the perpendiculars are illustrated. (a) Input image, (b) pixel labeling, (c) vector field, (d) voting, (e) pixel-to-keypoint distances, (f) 2D keypoints, (g) 3D keypoints, and (h) 6D object pose.
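The voting step in panels (d, e) can be sketched in isolation. The following is a minimal, hypothetical NumPy sketch (the function names `intersect` and `vote_keypoint` and all parameters are illustrative, not the paper's implementation): each pair of sampled pixels casts a keypoint hypothesis where their unit-vector rays intersect, and every hypothesis is scored by how many pixels' predicted unit vectors point toward it within an angular threshold.

```python
import numpy as np

def intersect(p1, v1, p2, v2):
    # Solve p1 + t1*v1 = p2 + t2*v2 for the 2D ray intersection point.
    A = np.stack([v1, -v2], axis=1)          # 2x2 system matrix
    if abs(np.linalg.det(A)) < 1e-8:         # (near-)parallel rays: no hypothesis
        return None
    t = np.linalg.solve(A, p2 - p1)
    return p1 + t[0] * v1

def vote_keypoint(pixels, vectors, n_hyp=64, cos_thresh=0.99, seed=None):
    """RANSAC-style keypoint voting (illustrative sketch).

    pixels:  (N, 2) pixel coordinates belonging to the object.
    vectors: (N, 2) predicted unit vectors pointing from each pixel
             toward the (unknown) 2D keypoint.
    Returns the best-scoring keypoint hypothesis and its inlier count.
    """
    rng = np.random.default_rng(seed)
    best, best_score = None, -1
    for _ in range(n_hyp):
        i, j = rng.choice(len(pixels), size=2, replace=False)
        h = intersect(pixels[i], vectors[i], pixels[j], vectors[j])
        if h is None:
            continue
        # Direction from every pixel to the hypothesis.
        d = h - pixels
        d = d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-12)
        # A pixel votes for h if its predicted vector agrees within the threshold.
        score = int(np.sum(np.sum(d * vectors, axis=1) > cos_thresh))
        if score > best_score:
            best, best_score = h, score
    return best, best_score
```

With noise-free unit vectors, any non-parallel pair of rays recovers the keypoint exactly and every pixel becomes an inlier; in practice the hypothesis count and angular threshold trade robustness against runtime.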