A dataset of stereoscopic images and ground-truth disparity mimicking human fixations in peripersonal space

Andrea Canessa et al. Sci Data. 2017 Mar 28;4:170034. doi: 10.1038/sdata.2017.34.

Abstract

Binocular stereopsis is the ability of a visual system, belonging to a living being or a machine, to interpret the differing visual information arriving from two eyes/cameras so as to perceive depth. From this perspective, ground-truth information about the three-dimensional visual space, which is rarely available, is an ideal tool both for evaluating human performance and for benchmarking machine vision algorithms. In the present work, we implemented a rendering methodology in which the camera pose mimics the realistic eye pose of a fixating observer, thus including convergent eye geometry and cyclotorsion. The virtual environment we developed relies on highly accurate 3D virtual models, and its full controllability allows us to obtain the stereoscopic pairs together with the ground-truth depth and camera pose information. We thus created a stereoscopic dataset: GENUA PESTO (GENoa hUman Active fixation database: PEripersonal space STereoscopic images and grOund truth disparity). The dataset aims to provide a unified framework useful for a number of problems relevant to human and computer vision, from scene exploration and eye-movement studies to 3D scene reconstruction.


Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Examples of 3D model acquisition and registration.
For each presented object, the insets on the left show the different 3D raw scans used to build the complete object model.
Figure 2
Figure 3. Schematic representation of the geometry of the binocular active vision system.
F is the fixation point, C is the cyclopic position (halfway between the eyes), and L and R are the left and right camera positions, separated by a baseline b = 60 mm. The angles α, β and γ denote the elevation (pitch), azimuth (yaw) and torsion (roll) of the left (L) and right (R) eye. The nose direction is the line orthogonal to the baseline and lying in the transverse plane passing through the eyes. The angles ε and ν denote the binocular azimuth and the vergence, respectively (see text for a detailed explanation).
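As an illustration of this geometry, the following Python sketch computes the azimuth and elevation of each eye, the binocular azimuth and the vergence for a given fixation point. It assumes a head-centred frame (x rightward, y upward, z along the nose direction) and simple yaw/pitch angle conventions, which may differ in detail from the ones used in the paper, and it omits the cyclotorsion component.

    import numpy as np

    def fixation_angles(F, b=0.06):
        """Eye azimuths/elevations, binocular azimuth and vergence for a
        fixation point F (metres, head-centred frame). The angle conventions
        are an assumption, not the paper's exact definition; the torsion
        (gamma) would follow from an extension of Listing's law and is
        omitted here."""
        L = np.array([-b / 2.0, 0.0, 0.0])   # left eye/camera position
        R = np.array([+b / 2.0, 0.0, 0.0])   # right eye/camera position
        C = (L + R) / 2.0                    # cyclopic position

        def az_el(eye):
            d = F - eye
            azimuth = np.arctan2(d[0], d[2])                     # yaw
            elevation = np.arctan2(d[1], np.hypot(d[0], d[2]))   # pitch
            return azimuth, elevation

        beta_L, alpha_L = az_el(L)
        beta_R, alpha_R = az_el(R)
        epsilon, _ = az_el(C)                # binocular azimuth
        nu = beta_L - beta_R                 # vergence angle
        return {"alpha": (alpha_L, alpha_R), "beta": (beta_L, beta_R),
                "epsilon": epsilon, "nu": nu}

    # Example: fixating a point 40 cm ahead, slightly right and below eye level
    print(fixation_angles(np.array([0.05, -0.02, 0.40])))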
Figure 4. The projections of a fronto-parallel rectangle onto the image planes, drawn in red for the left image and blue for the right.
The texture applied to the rectangle is a regular grid. (a) The projection obtained with the off-axis technique: only horizontal disparity is introduced. (b) The projection obtained with the toe-in technique: both vertical and horizontal disparities are introduced.
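The difference between the two techniques can be reproduced with a toy pinhole model. The sketch below (focal length and point coordinates are arbitrary, not taken from the paper) projects one corner of a fronto-parallel rectangle with both configurations: with the off-axis projection the two vertical image coordinates coincide, whereas with the toe-in projection the rotation of each camera toward the fixation point makes them differ, introducing vertical disparity.

    import numpy as np

    F_LEN = 0.017   # illustrative focal length (m)

    def project_offaxis(P, eye_x, f=F_LEN):
        """Off-axis: parallel optical axes, image plane parallel to the baseline."""
        x, y, z = P[0] - eye_x, P[1], P[2]
        return np.array([f * x / z, f * y / z])

    def project_toein(P, eye_x, F, f=F_LEN):
        """Toe-in: rotate the camera about the vertical axis so that its optical
        axis passes through the fixation point F, then project."""
        yaw = np.arctan2(F[0] - eye_x, F[2])
        c, s = np.cos(yaw), np.sin(yaw)
        x, y, z = P[0] - eye_x, P[1], P[2]
        xr, zr = c * x - s * z, s * x + c * z   # point expressed in the rotated camera frame
        return np.array([f * xr / zr, f * y / zr])

    b = 0.06                              # 60 mm baseline
    F = np.array([0.0, 0.0, 0.5])         # fixation point straight ahead
    P = np.array([0.10, 0.08, 0.5])       # a corner of the fronto-parallel rectangle

    for label, eye_x in (("left", -b / 2), ("right", +b / 2)):
        print(label, "off-axis:", project_offaxis(P, eye_x),
              "toe-in:", project_toein(P, eye_x, F))
    # Off-axis: the y image coordinates of the two eyes are equal (no vertical disparity).
    # Toe-in: they differ, i.e., vertical disparity is introduced.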
Figure 5
Figure 6. Representation of head position and 3D fixation point in the virtual scene.
(a) Side and top views of the 10 vantage points used to place the head within the 3D scene. The solid thick lines represent the nose direction for each head position. (b) Image acquired by the cyclopean camera from one of the ten vantage points (top), and geometrical configuration of the camera system with respect to the 3D scene (bottom). The red dots represent the 9×15 grid of points equally spaced on the image plane of the cyclopean camera (top), which are used to compute the actual 3D fixation points in the scene (bottom). The black solid line represents the nose direction, while the green lines represent the four most lateral fixations within the grid of fixation points.
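As a rough sketch of this sampling procedure, the fragment below casts rays through a 9×15 grid of equally spaced points on the cyclopean image plane and returns the 3D scene points they hit. The ray/scene intersection, which in the actual pipeline is performed against the virtual 3D models, is replaced here by a hypothetical callback, and the sensor size and focal length are illustrative values.

    import numpy as np

    def fixation_grid(hit_distance, rows=9, cols=15, f=0.017,
                      sensor_w=0.024, sensor_h=0.018):
        """Cast a ray through each of rows x cols equally spaced points on the
        cyclopean image plane and return the 3D points where the rays meet the
        scene. `hit_distance(direction) -> t` is a hypothetical stand-in for
        the ray/mesh intersection done in the virtual environment."""
        us = np.linspace(-sensor_w / 2, sensor_w / 2, cols)
        vs = np.linspace(-sensor_h / 2, sensor_h / 2, rows)
        points = np.empty((rows, cols, 3))
        for i, v in enumerate(vs):
            for j, u in enumerate(us):
                d = np.array([u, v, f])
                d /= np.linalg.norm(d)        # unit ray direction from the cyclopic point
                points[i, j] = hit_distance(d) * d
        return points

    # Toy usage: all rays hit a fronto-parallel wall 1 m in front of the observer
    grid = fixation_grid(lambda d: 1.0 / d[2])
    print(grid.shape)        # (9, 15, 3)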
Figure 7. Example of stereoscopic pairs from the dataset, including, from top to bottom, the left and right views, the horizontal and vertical ground-truth disparity maps, and the occlusion and edge maps.
In the disparity maps, reported in pixels, hot colors represent crossed horizontal disparities and right-hyper vertical disparities, whereas blue colors represent uncrossed horizontal disparities and left-hyper vertical disparities, according to the colorbars on the right.
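A minimal way to reproduce this kind of visualization from the downloaded data is sketched below. The file name and on-disk format are assumptions (the actual layout is described in the Dryad record), and a symmetric diverging colormap is used so that the two disparity signs map to hot and cold colors.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical file name and format; see the Dryad record
    # (http://dx.doi.org/10.5061/dryad.6t8vq) for the actual dataset layout.
    disp_h = np.load("horizontal_disparity_example.npy")    # disparity in pixels

    lim = np.nanmax(np.abs(disp_h))
    plt.imshow(disp_h, cmap="RdBu_r", vmin=-lim, vmax=lim)   # red/hot vs blue/cold
    plt.colorbar(label="horizontal disparity [px]")
    plt.title("Ground-truth horizontal disparity")
    plt.show()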
Figure 8. Disparity reconstruction error, computed with three different stereo quality indexes over the whole dataset and represented as median value (horizontal thick line), first and third quartiles (rectangle), and range (whiskers).
Three different indexes are shown, from left to right: the Mean Absolute Error (MAE), the Normalized Cross Correlation (NCC) and the Structural Similarity Index (SSIM). Each index has been computed for the original stereo pair (ORIG), excluding the occluded areas (NO OCC), excluding both the occluded areas and the depth edges (NO DE), and finally considering only the pixels corresponding to the occluded areas and the depth edges (OCC).
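The indexes themselves can be computed with standard tools. The sketch below evaluates MAE, NCC and SSIM between a ground-truth view and a view reconstructed by warping with the disparity map, restricted to a boolean validity mask (e.g., excluding occlusions and/or depth edges); masking the SSIM map per pixel is an approximation, since the paper does not specify how the masked statistic was aggregated.

    import numpy as np
    from skimage.metrics import structural_similarity

    def stereo_quality(img_true, img_rec, mask):
        """MAE, NCC and SSIM restricted to the pixels where `mask` is True."""
        a = img_true[mask].astype(float)
        b = img_rec[mask].astype(float)
        mae = np.mean(np.abs(a - b))
        ncc = np.mean((a - a.mean()) * (b - b.mean())) / (a.std() * b.std())
        _, ssim_map = structural_similarity(
            img_true, img_rec,
            data_range=float(img_true.max() - img_true.min()),
            full=True)
        ssim = ssim_map[mask].mean()
        return mae, ncc, ssim

    # Toy usage with synthetic data; in practice the mask would be, e.g.,
    # ~(occlusion_map | depth_edge_map) for the NO DE condition.
    rng = np.random.default_rng(0)
    gt = rng.random((64, 64))
    rec = gt + 0.02 * rng.standard_normal((64, 64))
    print(stereo_quality(gt, rec, np.ones_like(gt, dtype=bool)))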

References

Data Citations

    1. Canessa A. Dryad Digital Repository (2016). http://dx.doi.org/10.5061/dryad.6t8vq

