Research Article
A Robust Convolutional Neural Network for 6D Object Pose Estimation from RGB Image with Distance Regularization Voting Loss
Figure 2
The complete 6D object pose estimation pipeline: semantic segmentation, prediction of a vector field pointing towards the keypoints of the object, prediction of the distances between pixels and keypoints, hypothesis selection, and finally pose estimation. (b) A ResNet-50 backbone passes higher-level feature maps of the input image (a) to (c) a feature pyramid, which detects features at different scales. (d) An upsampling stage then restores the feature maps to the size of the input image, after which (e) pixelwise labels and (f) a pixelwise unit vector field are predicted. (g) Pixel-to-keypoint distances help RANSAC select accurate hypotheses, from which (h) the keypoints are estimated and (i) the 6D pose is calculated. Panels: (a) input image, (b) ResNet-50, (c) feature pyramid, (d) upsampling, (e) pixel labeling, (f) vector field, (g) pixel-to-keypoint distances, (h) keypoints, and (i) 6D pose.
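To make stages (f) to (h) concrete, the following is a minimal sketch of RANSAC-style keypoint voting from a pixelwise unit vector field. It is an illustration under stated assumptions, not the authors' implementation: the function name `vote_keypoint`, its parameters, and the cosine threshold are hypothetical, and the sketch votes on directions only. The caption does not specify how the pixel-to-keypoint distances of stage (g) enter the hypothesis selection, so that part is left out.

```python
import numpy as np


def vote_keypoint(pixels, directions, n_hypotheses=128, cos_threshold=0.99, rng=None):
    """Estimate one 2D keypoint by RANSAC-style voting (illustrative sketch).

    pixels:     (N, 2) array of object pixel coordinates (from the segmentation mask).
    directions: (N, 2) array of unit vectors, each pointing from a pixel to the keypoint.
    Returns (best_point, best_votes); best_point is None if every sample was degenerate.
    """
    rng = np.random.default_rng(rng)
    n = len(pixels)
    best_point, best_votes = None, -1

    for _ in range(n_hypotheses):
        # Sample two distinct pixels and intersect the rays they define.
        i, j = rng.choice(n, size=2, replace=False)
        # Solve pixels[i] + s * directions[i] = pixels[j] + t * directions[j].
        A = np.column_stack((directions[i], -directions[j]))
        if abs(np.linalg.det(A)) < 1e-8:
            continue  # rays are (nearly) parallel, no stable intersection
        s, _t = np.linalg.solve(A, pixels[j] - pixels[i])
        candidate = pixels[i] + s * directions[i]

        # A pixel votes for the candidate if its predicted direction agrees
        # with the actual direction from that pixel to the candidate.
        offsets = candidate - pixels
        norms = np.linalg.norm(offsets, axis=1)
        valid = norms > 1e-8  # skip pixels that coincide with the candidate
        cosines = np.sum(offsets[valid] * directions[valid], axis=1) / norms[valid]
        votes = int(np.sum(cosines >= cos_threshold))

        if votes > best_votes:
            best_point, best_votes = candidate, votes

    return best_point, best_votes
```

Running this once per keypoint yields the 2D keypoints of stage (h). The 6D pose of stage (i) would then be computed from these 2D keypoints and their 3D counterparts on the object model, typically with a PnP solver, although the caption does not name one.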