Research Article

Grasp Detection under Occlusions Using SIFT Features

Figure 3

The structure of our network. The network takes RG-D images as input and predicts, for each object, multiple grasp rectangles, each carrying a position, an orientation, and a grasp quality score. The data preprocessing stage fuses the RGB image with the corresponding depth image to form the RG-D image and crops it to a given size. Layers 1–40 of the network extract a feature map, which is then fed into the Grasp Proposal Network and the ROI pooling layer. The rest of the network (layers 41–50) receives the output of the Grasp Proposal Network and generates several grasp rectangles.
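The data preprocessing step described in the caption can be illustrated with a minimal sketch. The caption does not say how the fusion or the crop is done, so the choices here are assumptions: the depth map replaces the blue channel (a common convention for RG-D inputs), depth is rescaled to 0-255, and the crop is taken from the image center. The function name `make_rgd` and the parameter `crop_size` are hypothetical.

```python
import numpy as np


def make_rgd(rgb, depth, crop_size):
    """Fuse an RGB image and a depth map into an RG-D image, then center-crop.

    Assumptions (not stated in the caption): depth replaces the blue channel,
    depth is rescaled to 0-255, and the crop is centered.
    """
    if rgb.shape[:2] != depth.shape:
        raise ValueError("rgb and depth must have the same height and width")

    h, w = depth.shape
    if crop_size > min(h, w):
        raise ValueError("crop_size exceeds the image size")

    # Rescale depth to 0-255 so it is on the same scale as the color channels.
    depth = depth.astype(np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max > d_min:
        scaled = (depth - d_min) / (d_max - d_min) * 255.0
        depth_u8 = scaled.round().astype(np.uint8)
    else:
        # A constant depth map carries no information; fill with zeros.
        depth_u8 = np.zeros(depth.shape, dtype=np.uint8)

    # Keep the R and G channels; write depth where B used to be.
    # astype returns a new array, so the caller's rgb is left untouched.
    rgd = rgb.astype(np.uint8)
    rgd[..., 2] = depth_u8

    # Center crop to crop_size x crop_size.
    top = (h - crop_size) // 2
    left = (w - crop_size) // 2
    return rgd[top:top + crop_size, left:left + crop_size]
```

Calling `make_rgd(rgb, depth, crop_size)` on an `(H, W, 3)` color array and an `(H, W)` depth array returns a `(crop_size, crop_size, 3)` array that could then be passed to the feature-extraction layers. The actual crop size and depth normalization used by the authors are not given in the caption.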