[This article belongs to Volume - 38, Issue - 05]

Densely occluded grasping objects detection based on RGB-D fusion

Indoor service robots need an object-centric semantic map to understand and execute human instructions. Conventional visual simultaneous localization and mapping (SLAM) systems build maps from geometric landmarks such as points, lines, and planes, and therefore lack a semantic understanding of the environment. This paper proposes an object-level semantic SLAM algorithm based on RGB-D data that models each object as a quadric surface, compactly representing its position, orientation, and shape. Two RGB-D camera-quadric observation models are derived: a complete model and a partial model. The complete model combines object detection with point cloud data to estimate a complete ellipsoid from a single RGB-D frame. The partial model is activated when depth data are severely missing because of illumination or occlusion, and uses the bounding boxes from object detection to constrain the object. Compared with state-of-the-art quadric SLAM algorithms that rely on a monocular observation model, the RGB-D observation model needs fewer observations and smaller viewing-angle changes, which improves accuracy and robustness. In the back end, a nonparametric pose graph is used to solve data association, applied here to the quadric surface model. The algorithm was evaluated on two public datasets and an author-collected mobile robot dataset recorded in a home-like environment, and it improved localization accuracy and mapping quality compared with two state-of-the-art object SLAM algorithms.
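
To make the quadric object model and the bounding-box constraint more concrete, the following is a minimal sketch of the dual-quadric formulation commonly used in quadric-based SLAM. It is not taken from the paper, and the abstract does not give the authors' exact equations. The sketch assumes that an ellipsoid with pose Z and semi-axes (a, b, c) is written as Q* = Z diag(a², b², c², -1) Zᵀ, that a pinhole camera with projection matrix P maps it to the dual conic C* = P Q* Pᵀ, and that the predicted bounding box is formed by the image-axis-aligned tangent lines of that conic. All function names and the camera intrinsics below are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dual_quadric(rotation, center, semi_axes):
    """Build the 4x4 dual quadric Q* = Z * diag(a^2, b^2, c^2, -1) * Z^T.

    rotation: 3x3 rotation matrix, center: ellipsoid center,
    semi_axes: (a, b, c). All are expressed in the camera frame here
    (an assumption made to keep the sketch short).
    """
    pose = [list(rotation[i]) + [center[i]] for i in range(3)]
    pose.append([0.0, 0.0, 0.0, 1.0])
    a, b, c = semi_axes
    shape = [[a * a, 0.0, 0.0, 0.0],
             [0.0, b * b, 0.0, 0.0],
             [0.0, 0.0, c * c, 0.0],
             [0.0, 0.0, 0.0, -1.0]]
    return matmul(matmul(pose, shape), transpose(pose))


def project_to_bbox(quadric, projection):
    """Project a dual quadric to a dual conic and return its bounding box.

    A vertical image line x = u is l = (1, 0, -u); it is tangent to the
    conic when l^T C* l = 0, which gives a quadratic in u (same for v).
    Assumes the ellipsoid lies entirely in front of the camera.
    """
    conic = matmul(matmul(projection, quadric), transpose(projection))
    c11, c22, c33 = conic[0][0], conic[1][1], conic[2][2]
    c13, c23 = conic[0][2], conic[1][2]
    disc_u = c13 * c13 - c11 * c33
    disc_v = c23 * c23 - c22 * c33
    if c33 == 0.0 or disc_u < 0.0 or disc_v < 0.0:
        raise ValueError("conic has no real axis-aligned tangent lines")
    u = sorted((c13 + s * math.sqrt(disc_u)) / c33 for s in (1.0, -1.0))
    v = sorted((c23 + s * math.sqrt(disc_v)) / c33 for s in (1.0, -1.0))
    return (u[0], v[0], u[1], v[1])  # (u_min, v_min, u_max, v_max)


# Hypothetical example: a unit sphere 5 m in front of the camera.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
quadric = dual_quadric(identity, (0.0, 0.0, 5.0), (1.0, 1.0, 1.0))

# Assumed intrinsics: focal length 500 px, principal point (320, 240).
# The projection matrix is P = K [I | 0].
projection = [[500.0, 0.0, 320.0, 0.0],
              [0.0, 500.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]]

bbox = project_to_bbox(quadric, projection)
print(bbox)
```

In a partial-observation setting, the difference between such a predicted box and the detector's box could serve as a residual that constrains the object, whereas a complete observation would instead compare against an ellipsoid fitted to the point cloud. How the paper weights or combines these terms is not stated in the abstract.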