Sensors (Basel). 2023 Sep 11;23(18):7807. doi: 10.3390/s23187807.

OHO: A Multi-Modal, Multi-Purpose Dataset for Human-Robot Object Hand-Over



Benedict Stephan et al. Sensors (Basel). 2023.

Abstract

In the context of collaborative robotics, handing over hand-held objects to a robot is a safety-critical task. Therefore, a robust distinction between human hands and presented objects in image data is essential to avoid contact with robotic grippers. To be able to develop machine learning methods for solving this problem, we created the OHO (Object Hand-Over) dataset of tools and other everyday objects being held by human hands. Our dataset consists of color, depth, and thermal images with the addition of pose and shape information about the objects in a real-world scenario. Although the focus of this paper is on instance segmentation, our dataset also enables training for different tasks such as 3D pose estimation or shape estimation of objects. For the instance segmentation task, we present a pipeline for automated label generation in point clouds, as well as image data. Through baseline experiments, we show that these labels are suitable for training an instance segmentation model to distinguish hands from objects on a per-pixel basis. Moreover, we present qualitative results for applying our trained model in a real-world application.
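To make the segmentation baseline concrete, below is a minimal, hypothetical sketch (not the authors' released code) of fine-tuning a COCO-pretrained Mask R-CNN [1] from torchvision for the two foreground classes, hand and object; the dummy sample merely stands in for an OHO image with its automatically generated labels.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# Hypothetical sketch: 3 classes = background, hand, object.
num_classes = 3
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Swap the COCO box/mask heads for heads predicting our class set.
in_feat = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)
in_ch = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_ch, 256, num_classes)

# Dummy training sample standing in for an OHO image with auto-generated
# labels (1 = hand, 2 = object).
image = torch.rand(3, 480, 640)
mask = torch.zeros(1, 480, 640, dtype=torch.uint8)
mask[:, 100:200, 100:200] = 1
target = {
    "boxes": torch.tensor([[100.0, 100.0, 200.0, 200.0]]),
    "labels": torch.tensor([2]),  # a single "object" instance
    "masks": mask,
}

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
model.train()
loss_dict = model([image], [target])  # dict of per-task losses
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```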

Keywords: 6D pose estimation; automated labeling; dataset; hand-over; semantic segmentation; thermal image.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Cameras mounted on top of the TIAGo robot’s head.
Figure 2. Examples of all modalities for one sample in our dataset, including color and depth images from two RGB-D cameras, a segmented point cloud (hand, object, and background), and object shape and pose, in addition to thermal data.
Figure 3. Sequence of recording one object for the OHO dataset.
Figure 4. Example result of using the keying node in Blender on a reference recording (left) and the corresponding sample with a hand (right).
Figure 5. Pipeline for generation of segmentation labels. Color codes for GrabCut labels are as follows: BGD = blue, PR_BGD = red, PR_FGD = yellow, and FGD = green.
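As a rough, hypothetical illustration of how such a four-class GrabCut label map can be used (not the paper's exact pipeline; the input image and the coarse regions below are placeholders), OpenCV's grabCut in mask-initialization mode refines the probable labels into a binary segmentation:

```python
import cv2
import numpy as np

# Illustrative sketch: refine a coarse label map with GrabCut.
img = cv2.imread("sample_rgb.png")  # placeholder input image
h, w = img.shape[:2]

# Coarse label map using the four GrabCut classes from Figure 5:
# GC_BGD (sure background), GC_PR_BGD (probable background),
# GC_PR_FGD (probable foreground), GC_FGD (sure foreground).
mask = np.full((h, w), cv2.GC_PR_BGD, dtype=np.uint8)
mask[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4] = cv2.GC_PR_FGD  # rough region
mask[h // 2, w // 2] = cv2.GC_FGD                               # sure-foreground seed

bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)
cv2.grabCut(img, mask, None, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_MASK)

# Binary result: sure/probable foreground -> 1, everything else -> 0.
segmentation = np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)
```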
Figure 6. Examples of automatically generated masks for objects (blue) and hands (red) on top of raw RGB images from the dataset.
Figure 7. Result of using Gaussian blur on edges vs. simple overlaying of the foreground and background.
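The comparison in Figure 7 can be reproduced roughly with the following hypothetical sketch (filenames and the blur kernel size are placeholders): the foreground is composited onto the background once with the hard binary mask and once with a Gaussian-blurred mask that feathers the cut-out edge.

```python
import cv2
import numpy as np

# Illustrative sketch of the Figure 7 comparison; inputs are placeholders.
fg = cv2.imread("foreground_rgb.png").astype(np.float32)
bg = cv2.imread("background_rgb.png").astype(np.float32)
mask = cv2.imread("foreground_mask.png", cv2.IMREAD_GRAYSCALE)  # 0/255 binary

# Simple overlaying: hard 0/1 alpha from the binary mask.
hard_alpha = (mask.astype(np.float32) / 255.0)[..., None]
hard_composite = hard_alpha * fg + (1.0 - hard_alpha) * bg

# Feathered version: blur the mask so the transition spans a few pixels.
soft_alpha = (cv2.GaussianBlur(mask, (15, 15), 0).astype(np.float32) / 255.0)[..., None]
soft_composite = soft_alpha * fg + (1.0 - soft_alpha) * bg

cv2.imwrite("composite_hard.png", hard_composite.astype(np.uint8))
cv2.imwrite("composite_soft.png", soft_composite.astype(np.uint8))
```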
Figure 8. Examples of generated images for instance segmentation.
Figure 9. Qualitative comparison of the segmentation performance of different models (all with COCO pretraining) in a real-world setting (no stitched images).
Figure 10. Qualitative results of YolactEdge [24] trained on our OHO dataset (hand vs. object) applied to cross-domain datasets. Left: WorkingHands [17] (real); middle: WorkingHands [17] (synthetic); right: ContactPose [19].

References

1. He K., Gkioxari G., Dollár P., Girshick R. Mask R-CNN; Proceedings of the IEEE International Conference on Computer Vision (ICCV); Venice, Italy. 22–29 October 2017; pp. 2961–2969.
2. Kirillov A., Wu Y., He K., Girshick R. PointRend: Image Segmentation as Rendering; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Seattle, WA, USA. 13–19 June 2020; pp. 9799–9808.
3. Dosovitskiy A., Beyer L., Kolesnikov A., Weissenborn D., Zhai X., Unterthiner T., Dehghani M., Minderer M., Heigold G., Gelly S., et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale; Proceedings of the International Conference on Learning Representations (ICLR); Vienna, Austria. 4 May 2021.
4. Seichter D., Langer P., Wengefeld T., Lewandowski B., Hoechemer D., Gross H.M. Efficient and Robust Semantic Mapping for Indoor Environments; Proceedings of the IEEE International Conference on Robotics and Automation (ICRA); Philadelphia, PA, USA. 23–27 May 2022; pp. 9221–9227.
5. Qi C.R., Su H., Mo K., Guibas L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Honolulu, HI, USA. 21–26 July 2017; pp. 652–660.