Simulation as an engine of physical scene understanding

Peter W Battaglia¹, Jessica B Hamrick, Joshua B Tenenbaum

PMID: 24145417
PMCID: PMC3831455
DOI: 10.1073/pnas.1306572110

Proc Natl Acad Sci U S A. 2013 Nov 5;110(45):18327-32. doi: 10.1073/pnas.1306572110. Epub 2013 Oct 21.

Affiliation

¹ Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139.

Abstract

In a glance, we can perceive whether a stack of dishes will topple, a branch will support a child's weight, a grocery bag is poorly packed and liable to tear or crush its contents, or a tool is firmly attached to a table or free to be lifted. Such rapid physical inferences are central to how people interact with the world and with each other, yet their computational underpinnings are poorly understood. We propose a model based on an "intuitive physics engine," a cognitive mechanism similar to computer engines that simulate rich physics in video games and graphics, but that uses approximate, probabilistic simulations to make robust and fast inferences in complex natural scenes where crucial information is unobserved. This single model fits data from five distinct psychophysical tasks, captures several illusions and biases, and explains core aspects of human mental models and common-sense reasoning that are instrumental to how humans understand their everyday world.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Fig. 1.**
Everyday scenes, activities, and art that evoke strong physical intuitions. (A) A cluttered workshop that exhibits many nuanced physical properties. (B) A 3D object-based representation of the scene in A that can support physical inferences based on simulation. (C) A precarious stack of dishes looks like an accident waiting to happen. (D) A child exercises his physical reasoning by stacking blocks. (E) Jenga puts players’ physical intuitions to the test. (F) “Stone balancing” exploits our powerful physical expectations (Photo and stone balance by Heiko Brinkmann).

**Fig. 2.**
(A) The IPE model takes inputs (e.g., perception, language, memory, imagery, etc.) that instantiate a distribution over scenes (1), then simulates the effects of physics on the distribution (2), and then aggregates the results for output to other sensorimotor and cognitive faculties (3). (B) Exp. 1 (Will it fall?) tower stimuli. The tower with the red border is actually delicately balanced, and the other two are the same height, but the blue-bordered one is judged much less likely to fall by the model and people. (C) Probabilistic IPE model (x axis) vs. human judgment averages (y axis) in Exp. 1. See Fig. S3 for correlations for other values of σ and ϕ. Each point represents one tower (with SEM), and the three colored circles correspond to the three towers in B. (D) Ground truth (nonprobabilistic) vs. human judgments (Exp. 1). Because the ground truth model does not represent uncertainty, it cannot capture people’s judgments for a number of our stimuli, such as the red-bordered tower in B. (Note that these cases may be rare in natural scenes, where configurations tend to be more clearly stable or unstable and the IPE would be expected to correlate better with ground truth than it does on our stimuli.)
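The three steps in panel A (instantiate a distribution over scenes, simulate, aggregate) can be illustrated with a minimal Monte Carlo sketch. This is not the authors' implementation: the one-column block stack, the static center-of-mass stability test, and the names `is_stable` and `prob_fall` are all assumptions made for illustration, and `sigma` is only a loose analog of the perceptual-uncertainty parameter σ mentioned in the caption.

```python
import random


def is_stable(xs, width=1.0):
    """Simplified static check for a one-column stack of equal-mass blocks.

    xs[i] is the horizontal center of block i (index 0 rests on the ground).
    The stack counts as stable if, at every level, the center of mass of all
    blocks above lies over the supporting block. (Hypothetical stand-in for a
    full rigid-body physics simulation.)
    """
    for i in range(len(xs) - 1):
        above = xs[i + 1:]
        com = sum(above) / len(above)
        if abs(com - xs[i]) > width / 2:
            return False
    return True


def prob_fall(xs, sigma, n_samples=1000, width=1.0, seed=0):
    """Estimate the probability that the stack falls under position noise."""
    rng = random.Random(seed)
    falls = 0
    for _ in range(n_samples):
        # (1) Sample one scene from the distribution implied by noisy input.
        noisy = [x + rng.gauss(0.0, sigma) for x in xs]
        # (2) "Simulate" the sampled scene (here: a static stability test).
        if not is_stable(noisy, width):
            falls += 1
    # (3) Aggregate outcomes across samples into a graded judgment.
    return falls / n_samples
```

With `sigma=0.0` the sketch reduces to a deterministic ground-truth check, so a delicately balanced stack such as `[0.0, 0.49]` is reported as stable; with nonzero `sigma` the same stack is judged likely to fall, which is the qualitative contrast between panels C and D.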

**Fig. 3.**
(A) Exp. 2 (In which direction?). Subjects viewed the tower (*Upper*), predicted the direction in which it would fall by adjusting the white line with the mouse, and received feedback (*Lower*). (B) Exp. 2: Angular differences between the probabilistic IPE model’s and subjects’ circular mean judgments for each tower (blue points), where 0 indicates a perfect match. The gray bars are circular histograms of the differences. The red line indicates the tower in A. (C) The same as B, but for the ground truth model. (D) Exp. 3 (Will it fall?: mass): State pair stimuli (main text). Light blocks are green, and heavy ones are dark. (E) Exp. 3: The mass-sensitive IPE model’s vs. people’s judgments, as in Fig. 2C. The black lines connect state pairs. Both model and people vary their judgments similarly within each state pair (lines’ slopes near 1). (F) Exp. 3: The mass-insensitive model vs. people. Here the model cannot vary its judgments within state pairs (lines are near vertical). (G) Exp. 4 (In which direction?: mass): State pair stimuli. (H) Exp. 4: The mass-sensitive IPE model’s vs. people’s judgments, as in B. The black lines connect state pairs. The model’s and people’s judgments are closely matched within state pairs (short black lines). (I) Exp. 4: The mass-insensitive IPE model vs. people. Here again, the model cannot vary its judgments per state pair (longer black lines).
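Panels B and C compare circular means of direction judgments, which cannot be computed with an ordinary arithmetic mean because angles wrap around. The sketch below shows the standard circular-statistics formulas; the function names and the use of degrees are assumptions for illustration, not details taken from the paper.

```python
import math


def circular_mean(angles_deg):
    """Mean direction of a set of angles, returned in degrees in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def angular_difference(a_deg, b_deg):
    """Signed smallest difference a - b, in degrees in [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0
```

For example, judgments of 350° and 10° average to a direction near 0° rather than 180°, and a difference of 0 from `angular_difference` corresponds to the "perfect match" described in panel B.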

**Fig. 4.**
Exp. 5 (Bump?). (A) Scene stimuli, whose tables have different obstacles (T0–T4). (B) In the uncued bump condition, subjects were not informed about the direction from which the bump would strike the scene; in the cued bump conditions, a blue arrowhead indicated the bump’s direction. (C) The disk plot shows IPE model predictions per bump direction (angle) and ϕ (radius) for the stimulus in the image; the blue arrowheads/arcs indicate the range of bump angles simulated per bump cue, and the green circle and arrowheads represent the uncued condition. *Inset* bar graphs show the model’s and people’s responses, per cue/condition. (D) The same block configuration as in C, with different obstacles (T1). (*E–J*) IPE model’s (x axis) vs. people’s (y axis) mean judgments (each point is one scene, with SEM). The lines in *G–J* indicate cue-wise pairs. Each subplot shows one cue condition and IPE model variant (correlations in parentheses, with P value of difference from full IPE): (E) Uncued, full IPE. (F) Uncued, obstacle insensitive (model assumes T0). (G) Cued, full IPE. (H) Cued, obstacle insensitive. (I) Cued, cue insensitive (model averages over all bump angles). (J) Cued, obstacle and cue insensitive.
