RobotP: A Benchmark Dataset for 6D Object Pose Estimation

Honglin Yuan et al. Sensors (Basel). 2021 Feb 11;21(4):1299.
doi: 10.3390/s21041299.

Abstract

Deep learning has achieved great success on robotic vision tasks. However, compared with other vision-based tasks, it is difficult to collect a representative and sufficiently large training set for six-dimensional (6D) object pose estimation, due to the inherent difficulty of data collection. In this paper, we propose the RobotP dataset, consisting of commonly used objects, for benchmarking 6D object pose estimation. To create the dataset, we apply a 3D reconstruction pipeline to produce high-quality depth images, ground truth poses, and 3D models for well-selected objects. Subsequently, based on the generated data, we produce object segmentation masks and two-dimensional (2D) bounding boxes automatically. To further enrich the data, we synthesize a large number of photo-realistic color-and-depth image pairs with ground truth 6D poses. Our dataset is freely distributed to research groups through the Shape Retrieval Challenge benchmark on 6D pose estimation. Based on our benchmark, different learning-based approaches are trained and tested on the unified dataset. The evaluation results indicate that there is considerable room for improvement in 6D object pose estimation, particularly for objects with dark colors, and that photo-realistic images are helpful in increasing the performance of pose estimation algorithms.
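The abstract notes that segmentation masks and 2D bounding boxes are derived automatically from the reconstructed 3D models and their ground truth poses. A minimal sketch of that idea, assuming a standard pinhole camera model; the intrinsics `K`, pose `(R, t)`, and toy cube below are hypothetical illustration values, not taken from the paper:

```python
import numpy as np

def project_points(points, R, t, K):
    """Project 3D model points into the image under pose (R, t) with intrinsics K."""
    cam = points @ R.T + t          # world frame -> camera frame
    uv = cam @ K.T                  # camera frame -> homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]   # perspective divide

def bbox_from_projection(points, R, t, K):
    """2D bounding box (x_min, y_min, x_max, y_max) of the projected model."""
    uv = project_points(points, R, t, K)
    return (*uv.min(axis=0), *uv.max(axis=0))

# Toy example: a 10 cm cube half a meter in front of the camera.
cube = np.array([[x, y, z] for x in (0.0, 0.1) for y in (0.0, 0.1) for z in (0.0, 0.1)])
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 0.5])
print(bbox_from_projection(cube, R, t, K))  # -> (320.0, 240.0, 420.0, 340.0)
```

Rasterizing the projected model instead of taking its extremes would give a per-pixel segmentation mask by the same principle.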

Keywords: 3D reconstruction; 6D pose estimation; benchmark dataset; sensors.


Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figures

Figure 1
Scene examples and visualizations of poses estimated by the approach proposed in our benchmark.
Figure 2
Everyday objects in our dataset.
Figure 3
Different three-dimensional (3D) cameras. Left: time-of-flight camera. Middle: structured-light camera. Right: depth-from-stereo camera.
Figure 4
The pipeline of the pose estimation process. The inputs are RGB images, and the initial poses of these images are estimated by Structure from Motion (SfM). The initial poses are then refined locally and globally.
Figure 5
The local pose groups, clustered based on angle and distance similarities.
Figure 6
(a) The captured depth image: the red rectangle marks the invalid depth band on the left. (b) Misalignment of color-and-depth image pairs: the images are captured when the object is close to the camera, resulting in large misalignment.
Figure 7
Depth images estimated by COLMAP, showing better alignment.
Figure 8
Snapshots from our simulator showing a robot synthesizing data. Green points and red lines are positions and view directions of input cameras, black lines are the view directions of the virtual camera, and the long line is the whole trajectory of the virtual camera.
Figure 9
Reprojection error comparison with and without pose refinement for different objects.
Figure 10
Examples of depth alignment results on the table1 and table2 scenarios. The first column shows the aligned depth image, the second column shows the matching between captured depth and color images, and the third column shows the matching between aligned depth and color images. Black indicates missing information.
Figure 11
The depth fusion results on the table1 and table2 scenarios. The first column shows color images, the second column shows depth images estimated by COLMAP, and the third column shows depth images generated by our approach.
Figure 12
Examples of three-dimensional (3D) point clouds for objects in our dataset. The point clouds in (a) and (b) are generated by COLMAP and our approach, respectively.
Figure 13
Examples of segmentation masks and bounding boxes for different objects.
Figure 14
Examples of synthesized color-and-depth image pairs.
Figure 15
Examples of accuracy performance. Each 3D model is projected to the image plane with the estimated 6D pose.
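Figure 9 compares reprojection error with and without pose refinement. As a hedged illustration of the metric itself (not the paper's implementation), the reprojection error of a pose can be computed as the mean pixel distance between observed 2D points and the projections of their 3D counterparts; the intrinsics and synthetic points below are illustrative assumptions:

```python
import numpy as np

def project(points3d, R, t, K):
    """Pinhole projection of 3D points under pose (R, t) with intrinsics K."""
    cam = points3d @ R.T + t        # world frame -> camera frame
    uv = cam @ K.T                  # camera frame -> homogeneous pixels
    return uv[:, :2] / uv[:, 2:3]   # perspective divide

def mean_reprojection_error(points3d, points2d, R, t, K):
    """Average pixel distance between observations and reprojected 3D points."""
    return np.linalg.norm(project(points3d, R, t, K) - points2d, axis=1).mean()

# Synthetic check: the exact pose reprojects perfectly; a perturbed pose does not.
rng = np.random.default_rng(0)
pts3d = rng.uniform(-0.1, 0.1, (50, 3)) + np.array([0.0, 0.0, 0.8])
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
obs = project(pts3d, R, t, K)

print(mean_reprojection_error(pts3d, obs, R, t, K))                  # 0: exact pose
print(mean_reprojection_error(pts3d, obs, R, t + np.array([0.0, 0.0, 0.05]), K))
```

Pose refinement, as compared in the figure, is precisely the process of reducing this quantity over the estimated poses.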
