
New Approaches to 3D Vision


Paul Linton et al. Philos Trans R Soc Lond B Biol Sci. 2023 Jan 30;378(1869):20210443. doi: 10.1098/rstb.2021.0443. Epub 2022 Dec 13.

Abstract

New approaches to 3D vision are enabling new advances in artificial intelligence and autonomous vehicles, a better understanding of how animals navigate the 3D world, and new insights into human perception in virtual and augmented reality. Whilst traditional approaches to 3D vision in computer vision (SLAM: simultaneous localization and mapping), animal navigation (cognitive maps), and human vision (optimal cue integration) start from the assumption that the aim of 3D vision is to provide an accurate 3D model of the world, the new approaches to 3D vision explored in this issue challenge this assumption. Instead, they investigate the possibility that computer vision, animal navigation, and human vision can rely on partial or distorted models or no model at all. This issue also highlights the implications for artificial intelligence, autonomous vehicles, human perception in virtual and augmented reality, and the treatment of visual disorders, all of which are explored by individual articles. This article is part of a discussion meeting issue 'New approaches to 3D vision'.

Keywords: 3D vision; artificial intelligence; computer vision; human vision; navigation.


Figures

Figure 1.
The 360° point cloud created by the Alpha Prime Velodyne Lidar. © Velodyne Lidar. (Online version in colour.)
Figure 2.
Diagram of Dyna-Q [85] redrawn from [86], p. 7, which incorporates both ‘model-based’ (‘value/policy’ → ‘experience’ → ‘model’ → ‘value/policy’) and ‘model-free’ (‘value/policy’ → ‘experience’ → ‘value/policy’) reinforcement learning. (Online version in colour.)
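To make the two loops in the diagram concrete, here is a minimal tabular Dyna-Q sketch (an illustration of the algorithm in [85], not code from either paper); the `env` object, with its `reset()`, `step()` and `actions` members, is a hypothetical interface:

```python
import random
from collections import defaultdict

def dyna_q(env, episodes=50, n_planning=10, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q: direct 'model-free' updates from real experience,
    plus 'model-based' planning steps that replay a learned model."""
    Q = defaultdict(float)   # Q[(state, action)] -> estimated value
    model = {}               # model[(state, action)] -> (reward, next_state, done)

    def best(s):
        return max(Q[(s, a)] for a in env.actions)

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # 'value/policy' -> 'experience': epsilon-greedy action selection
            a = (random.choice(env.actions) if random.random() < epsilon
                 else max(env.actions, key=lambda a: Q[(s, a)]))
            s2, r, done = env.step(a)
            # direct reinforcement learning ('experience' -> 'value/policy')
            target = r + (0.0 if done else gamma * best(s2))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            # model learning ('experience' -> 'model')
            model[(s, a)] = (r, s2, done)
            # planning with simulated experience ('model' -> 'value/policy')
            for _ in range(n_planning):
                (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                ptarget = pr + (0.0 if pdone else gamma * best(ps2))
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            s = s2
    return Q
```

Setting n_planning=0 recovers plain Q-learning, which is one way to see how Dyna-Q nests the ‘model-free’ loop inside the ‘model-based’ one.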
Figure 3.
DeepMind's ‘neural scene representation and rendering’ [106]. First, the ‘Representation Network’ creates a low-dimensional ‘Scene Representation’ based on images from a number of different views (View 1 and View 2). Second, the ‘Generation Network’ uses this low-dimensional ‘Scene Representation’ to predict what View 3 will be. © authors. (Online version in colour.)
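As a rough illustration of this two-network structure, here is a minimal PyTorch sketch (not DeepMind's implementation; the fully connected layers, their sizes, and the 7-element viewpoint vector are assumptions for illustration):

```python
import torch
import torch.nn as nn

class SceneRepresentationSketch(nn.Module):
    """Sketch of figure 3's structure: encode (image, viewpoint) pairs into a
    summed low-dimensional scene representation, then condition a generator
    on that representation plus a query viewpoint to predict a new view."""

    def __init__(self, repr_dim=256, view_dim=7, img_dim=3 * 64 * 64):
        super().__init__()
        # 'Representation Network': (image, viewpoint) -> scene code
        self.encoder = nn.Sequential(
            nn.Linear(img_dim + view_dim, 512), nn.ReLU(),
            nn.Linear(512, repr_dim),
        )
        # 'Generation Network': (scene code, query viewpoint) -> predicted image
        self.generator = nn.Sequential(
            nn.Linear(repr_dim + view_dim, 512), nn.ReLU(),
            nn.Linear(512, img_dim),
        )

    def forward(self, images, views, query_view):
        # encode each observed view (View 1, View 2, ...) and aggregate by summation
        codes = self.encoder(torch.cat([images, views], dim=-1))
        scene = codes.sum(dim=0)  # low-dimensional 'Scene Representation'
        # predict the image at the unobserved query viewpoint (View 3)
        return self.generator(torch.cat([scene, query_view], dim=-1))
```

The design point the figure makes survives even in this toy version: the scene representation is a bottleneck shared across viewpoints, so the generator can only succeed if that bottleneck captures something about the 3D scene.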
Figure 4.
‘Neural Radiance Fields’ (NeRFs) [138]. NeRFs generate images by sampling points along a ray; for each sampled point, the network (FΘ) takes a 5D input (3D position + 2D viewing direction) and outputs a colour and density value. NeRF is ‘geometry aware’ since its inputs are explicitly in 3D coordinates (x, y, z). © authors. (Online version in colour.)
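The rendering step the caption describes can be sketched in NumPy (a minimal illustration of NeRF-style volume rendering, not the authors' code; `field` is a hypothetical stand-in for the trained network FΘ):

```python
import numpy as np

def render_ray(field, origin, direction, t_near=0.1, t_far=6.0, n_samples=64):
    """Composite a pixel colour by sampling `field` along one camera ray.
    `field(points, direction)` must return (rgb, sigma): an (n, 3) array of
    colours and an (n,) array of densities for the sampled points."""
    ts = np.linspace(t_near, t_far, n_samples)              # depths along the ray
    deltas = np.diff(ts, append=ts[-1] + (ts[-1] - ts[-2])) # segment lengths
    points = origin + ts[:, None] * direction               # 3D sample positions (x, y, z)
    rgb, sigma = field(points, direction)                   # colour + density per sample
    alpha = 1.0 - np.exp(-sigma * deltas)                   # opacity of each segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance T_i
    weights = trans * alpha                                 # contribution of each sample
    return (weights[:, None] * rgb).sum(axis=0)             # composited pixel colour
```

Training then reduces to rendering rays this way and minimising the difference between rendered and observed pixel colours, which is why the explicit (x, y, z) parameterisation makes the model ‘geometry aware’.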
Figure 5.
A place cell's firing field (left) and a grid cell's grid of firing fields (right) as a rat moves around an enclosure. Recorded by Elizabeth Marozzi. From [172]. © authors. (Online version in colour.)
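A hexagonal grid of firing fields like the one on the right of figure 5 is often idealised as the sum of three plane waves 60° apart. The following is a minimal sketch of that standard idealisation (the parameters are illustrative, and this is not the recording or analysis code behind the figure):

```python
import numpy as np

def grid_cell_rate(x, y, spacing=0.4, phase=(0.0, 0.0), orientation=0.0):
    """Idealised grid-cell firing rate at position (x, y): three cosine
    gratings 60 degrees apart, rectified, yielding a hexagonal lattice of
    firing fields with the given spacing (in the same units as x and y)."""
    k = 4 * np.pi / (np.sqrt(3) * spacing)      # wave number for the chosen spacing
    rate = 0.0
    for i in range(3):
        theta = orientation + i * np.pi / 3     # the three grating orientations
        kx, ky = k * np.cos(theta), k * np.sin(theta)
        rate += np.cos(kx * (x - phase[0]) + ky * (y - phase[1]))
    return np.maximum(rate, 0.0)                # rectify to a non-negative firing rate

# Example: evaluate over a 1 m x 1 m arena to visualise the hexagonal lattice
xs, ys = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))
rates = grid_cell_rate(xs, ys)
```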
Figure 6.
Hypotheses being tested by [11] and [12]. On the left is the 2D hexagonal grid of grid cell firing fields we saw in figure 5. The remaining panels explore potential 3D grid arrangements. Some sort of 3D ‘global order’ was originally hypothesised [–197]. But Ginosar et al. [11] only find evidence of ‘local order’ in bats, whilst the results in rats [12] are consistent with a ‘random arrangement’. From Ginosar et al. 2021 [11]. © authors. (Online version in colour.)
Figure 7.
Inverse graphics network from [281]. How do we recognize someone as ‘John’? Rather than train a neural network to directly identify people (2D image → identity), [281] use a ‘generative model’ (on the right) (identity → 3D scene → 2D image) to produce training data for an ‘inference model’ (on the left) that first estimates the 3D scene properties of the 2D image (2D image → 3D scene), before using this 3D scene estimate to identify the person (3D scene → identity). © authors. (Online version in colour.)
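The training pipeline the caption describes might be sketched as follows (pseudo-Python; `generative_model`, `inference_model` and all of their methods are hypothetical interfaces, not the code of [281]):

```python
def train_inverse_graphics(generative_model, inference_model, identities, n_samples):
    """Analysis-by-synthesis sketch of figure 7: a generative model supplies
    training data for an inference model that inverts it."""
    # 1. Generative model (right of figure): identity -> 3D scene -> 2D image
    dataset = []
    for _ in range(n_samples):
        identity = generative_model.sample_identity(identities)
        scene = generative_model.build_scene(identity)   # e.g. pose, lighting, shape
        image = generative_model.render(scene)           # 2D projection of the 3D scene
        dataset.append((image, scene, identity))

    # 2. Inference model (left of figure): 2D image -> estimated 3D scene -> identity
    for image, scene, identity in dataset:
        inference_model.train_scene_estimator(image, target=scene)
        estimated_scene = inference_model.estimate_scene(image)
        inference_model.train_identifier(estimated_scene, target=identity)
```

The contrast with a direct (2D image → identity) classifier is that the intermediate 3D scene estimate is itself a supervised target, supplied for free by the generative model.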
Figure 8.
Schematic of the task in [287]. Participants adjusted the depth of a cylinder until it appeared proportional to its height (the dotted line). At the near viewing distance (53.5 cm) the cylinder they produced was compressed in depth, whilst at the far viewing distance (214 cm) it was elongated in depth.

References

    1. Knight W. 2022. A New Trick Lets Artificial Intelligence See in 3D. Wired. See https://www.wired.com/story/new-way-ai-see-3d/
    2. Jumper J, et al. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596(7873), 583-589. (doi:10.1038/s41586-021-03819-2)
    3. Tunyasuvunakool K, et al. 2021. Highly accurate protein structure prediction for the human proteome. Nature 596(7873), 590-596. (doi:10.1038/s41586-021-03828-1)
    4. Lake BM, Ullman TD, Tenenbaum JB, Gershman SJ. 2017. Building machines that learn and think like people. Behav. Brain Sci. 40, e253. (doi:10.1017/S0140525X16001837)
    5. Zhu Y, et al. 2020. Dark, beyond deep: a paradigm shift to cognitive AI with humanlike common sense. Engineering 6, 310-345. (doi:10.1016/j.eng.2020.01.011)
