Perception of an object's global shape is best described by a model of skeletal structure in human infants

Vladislav Ayzenberg et al. eLife. 2022 May 25;11:e74943. doi: 10.7554/eLife.74943.

Abstract

Categorization of everyday objects requires that humans form representations of shape that are tolerant to variations among exemplars. Yet, how such invariant shape representations develop remains poorly understood. By comparing human infants (6-12 months; N=82) to computational models of vision using comparable procedures, we shed light on the origins and mechanisms underlying object perception. Following habituation to a never-before-seen object, infants classified other novel objects across variations in their component parts. Comparisons to several computational models of vision, including models of high-level and low-level vision, revealed that infants' performance was best described by a model of shape based on the skeletal structure. Interestingly, infants outperformed a range of artificial neural network models, selected for their massive object experience and biological plausibility, under the same conditions. Altogether, these findings suggest that robust representations of shape can be formed with little language or object experience by relying on the perceptually invariant skeletal structure.

Keywords: categorization; human; infant development; medial axis; neuroscience; object recognition; one-shot learning; shape perception.
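The "skeletal structure" the abstract refers to is the shape's medial axis: the set of points equidistant from two or more boundary points, which stays stable when surface details of the parts change. As a minimal illustration of the idea (not the authors' pruned medial-axis model), the following sketch assumes a binary silhouette on a grid and takes the ridge points of a 4-connected distance transform as an approximate skeleton:

```python
from collections import deque

def distance_transform(grid):
    """Multi-source BFS giving each foreground cell (1) its 4-connected
    distance to the nearest background cell (0)."""
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

def medial_axis(grid):
    """Approximate the medial axis as the foreground cells whose distance
    value is >= that of every 4-neighbour (the ridge of the distance map)."""
    dist = distance_transform(grid)
    rows, cols = len(grid), len(grid[0])
    skeleton = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            neighbours = [dist[r + dr][c + dc]
                          for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if 0 <= r + dr < rows and 0 <= c + dc < cols]
            if all(dist[r][c] >= d for d in neighbours):
                skeleton.add((r, c))
    return skeleton

# A 5x3 rectangular silhouette: its skeleton collapses onto the middle row
# (plus corner ridge points), abstracting away the boundary detail.
rect = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
```

Because the skeleton depends only on the global arrangement of the boundary, two objects with different surface forms but the same part structure yield similar skeletons, which is the invariance the study exploits.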


Conflict of interest statement

VA, SL: No competing interests declared.

Figures

Figure 1. Screenshots of the stimuli used in Experiment 1 (left) and Experiment 2 (right).
Objects were presented as rotating videos during habituation and test phases.
Figure 2. Experimental design and results for (top) Experiment 1 and (bottom) Experiment 2.
(A, D) Illustration of the experimental procedure administered to infants and the computational models in (A) Experiment 1 and (D) Experiment 2. Infants and models were habituated to one object and then tested with objects that consisted of either the same or different shape skeleton. Both types of test objects (counterbalanced order) differed in their surface forms from the habituation object. (B, E) Mean looking times for (B) Experiment 1 and (E) Experiment 2. For the habituation phase, results are shown for the first four and last four trials. For the test phase, results are shown for the two types of test objects (i.e. same and different skeletons; 3 test trials each). Error bars represent SE. (C, F) Classification performance for infants and models for (C) Experiment 1 and (F) Experiment 2. Error bars represent bootstrapped confidence intervals, and the dashed line represents chance performance.
Figure 3. Experimental design and results for the surface form classification task used with the computational models.
(A) Illustration of the experimental procedure administered to models. (B–C) Classification performance of models on stimuli from (B) Experiment 1 and (C) Experiment 2. Error bars represent bootstrapped confidence intervals and dashed lines represent chance performance.
Figure 4. Examples of autoencoder reconstructions using objects from Experiment 1 (top) and Experiment 2 (bottom) for all models except FlowNet.
FlowNet reconstructions are not possible because it requires multiple frames as input. For the Skeletal model, the inset displays the original input image. Each reconstruction was created by feeding a random frame from the habituation object video to each model immediately following its habituation to said video.
Figure 5. Dissimilarity matrices for each computational model in Experiment 1.
Dissimilarity for each object pair was calculated as the error from an autoencoder following habituation to one object and testing on a second object. Internal values of each cell in the matrix indicate the error between habituation and test objects. Error values are normalized to the end of habituation. Dissimilarity matrices are asymmetrical because the error value changes depending on which object the model was habituated to. The object adjacent to each row is the habituation object, and the object adjacent to each column is the test object.
Figure 6. Dissimilarity matrices for each computational model in Experiment 2.
Dissimilarity for each object pair was calculated as the error from an autoencoder following habituation to one object and testing on a second object. Internal values of each cell in the matrix indicate the error between habituation and test objects. Error values are normalized to the end of habituation. Dissimilarity matrices are asymmetrical because the error value changes depending on which object the model was habituated to. The object adjacent to each row is the habituation object, and the object adjacent to each column is the test object.
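The matrix construction described in these captions can be sketched as follows. Here `dissimilarity_matrix` and `toy_error` are hypothetical names, and `toy_error` is a stand-in for the autoencoder's post-habituation reconstruction error (which the paper computes with trained models); the sketch shows only the row-wise normalization to the end-of-habituation error and why the result is asymmetric:

```python
def dissimilarity_matrix(objects, error_fn):
    """Assemble the asymmetric dissimilarity matrix described in the caption.

    error_fn(habituated, test) -> reconstruction error on `test` after the
    model was habituated (trained) on `habituated`. Each row is normalized
    by the end-of-habituation error, i.e. the model's error on the
    habituation object itself, so the diagonal is 1.0 by construction.
    """
    n = len(objects)
    matrix = [[0.0] * n for _ in range(n)]
    for i, habituated in enumerate(objects):
        baseline = error_fn(habituated, habituated)  # end-of-habituation error
        for j, test in enumerate(objects):
            matrix[i][j] = error_fn(habituated, test) / baseline
    return matrix

def toy_error(habituated, test):
    """Hypothetical stand-in for an autoencoder's reconstruction error:
    grows with the distance between objects and is direction-dependent,
    which is what makes the matrix asymmetric."""
    penalty = 1.2 if test > habituated else 1.0
    return 1.0 + penalty * (test - habituated) ** 2
```

Because the normalizing baseline depends on which object was habituated, `matrix[i][j]` generally differs from `matrix[j][i]`, matching the captions' note that the matrices are asymmetrical.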

