Sci Rep. 2019 Jun 27;9(1):9359.
doi: 10.1038/s41598-019-45268-y.

Skeletal descriptions of shape provide unique perceptual information for object recognition

Vladislav Ayzenberg et al. Sci Rep. 2019.

Abstract

With seemingly little effort, humans can both identify an object across large changes in orientation and extend category membership to novel exemplars. Although researchers argue that object shape is crucial in these cases, there are open questions as to how shape is represented for object recognition. Here we tested whether the human visual system incorporates a three-dimensional skeletal descriptor of shape to determine an object's identity. Skeletal models not only provide a compact description of an object's global shape structure, but also provide a quantitative metric by which to compare the visual similarity between shapes. Our results showed that a model of skeletal similarity explained the greatest amount of variance in participants' object dissimilarity judgments when compared with other computational models of visual similarity (Experiment 1). Moreover, parametric changes to an object's skeleton led to proportional changes in perceived similarity, even when controlling for another model of structure (Experiment 2). Importantly, participants preferentially categorized objects by their skeletons across changes to local shape contours and non-accidental properties (Experiment 3). Our findings highlight the importance of skeletal structure in vision, not only as a shape descriptor, but also as a diagnostic cue of object identity.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
An illustration of the shape skeleton for a 2D airplane with (B) and without (A) perturbed contours. A strength of a skeletal model is that it can describe an object’s global shape structure across variations in contour. Skeletons computed using the ShapeToolbox.
Figure 2
Stimuli used in Experiment 1. (A) Objects were procedurally generated to have different skeletal structures. (B) Each object was also rendered with five surface forms so as to vary in contour shape and non-accidental properties (NAPs) without disrupting the object’s skeleton. A cluster analysis revealed that the first and second surface forms (from top to bottom) consisted of the same NAPs (see Methods for more stimulus details). Subsets of these stimuli were used in Experiments 2 and 3 (see Methods).
Figure 3
Results from Experiment 1. (A) Bar plot displaying the correlations (Pearson) between each model and human perceptual similarity judgments (error bars are bootstrapped SE). Models did not differ significantly from each other in the degree to which they predicted human judgments. The horizontal black bar represents the noise ceiling, which indicates the expected performance of the true model given the noise in the data (width represents SE). (B) Bar plot displaying the percentage of variance accounted for by each model individually (lighter shade) and the percentage of unique variance of the total explainable variance (total r² = 20.5%) accounted for by each model (darker shade). A model of skeletal similarity explained the most unique variance (33.13% of total explainable variance) when compared to any single model or combination of models (see Supplemental Table 2 for the unique and shared variance explained by all model combinations).
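The distinction in panel B between variance explained by a model on its own and variance explained uniquely can be illustrated with a small variance-partitioning sketch. This is not the authors' analysis code. It assumes an ordinary least-squares setup in which a model's unique variance is the drop in R² when that model is removed from the full regression. The names `r_squared`, `unique_variance`, `model_a`, and `model_b` are hypothetical, and the data are synthetic.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def unique_variance(X, y):
    """Return the full-model R^2 and, per predictor, R^2(full) - R^2(full without it)."""
    full = r_squared(X, y)
    uniq = []
    for j in range(X.shape[1]):
        reduced = np.delete(X, j, axis=1)  # drop predictor j
        uniq.append(full - r_squared(reduced, y))
    return full, np.array(uniq)


# Synthetic stand-ins for model-predicted and human dissimilarities (assumed, not real data).
rng = np.random.default_rng(0)
n_pairs = 200
model_a = rng.normal(size=n_pairs)                   # hypothetical model 1
model_b = 0.5 * model_a + rng.normal(size=n_pairs)   # hypothetical model 2, correlated with model 1
human = 0.8 * model_a + 0.2 * model_b + rng.normal(size=n_pairs)

X = np.column_stack([model_a, model_b])
full_r2, uniq = unique_variance(X, human)

# Unique variance expressed as a percentage of the total explainable variance.
pct_of_explainable = 100.0 * uniq / full_r2
print(f"total R^2 = {full_r2:.3f}")
print("unique variance (% of explainable):", np.round(pct_of_explainable, 1))
```

Because the two synthetic predictors are correlated, part of the explained variance is shared between them and is credited to neither, which is why the unique percentages need not sum to 100.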
Figure 4
Example stimuli and results from Experiment 2. (A) Objects formed three sets, each with distinct coarse spatial relations (separate columns). Crucially, objects with the same spatial relations varied in skeletal similarity by increments of 0%, 10%, 20%, 30%, 40%, or 50% (each row within a column). On the ‘same’ test trials (objects within the same column), participants were presented with a reference object (0%; top row) and an object with the same coarse spatial relations. On the ‘different’ test trials (objects across columns), participants were presented with objects that had different coarse spatial relations. Objects were presented in one of three orientations (30°, 60°, 90°; see Supplemental Fig. 2 for full stimulus set). (B) Participants’ recognition accuracy (proportion correct) for objects with the same coarse spatial relations decreased as a function of skeletal change, suggesting that humans represent object structure by their skeletons. The dotted line represents chance performance and the error bars represent ± 1 SE.
Figure 5
Examples of the three trial types used in Experiment 3. (A) A skeleton match trial wherein one choice object matched the sample’s skeleton, but not surface form. The other choice object matched on neither skeleton nor surface form. (B) A surface form match trial wherein one choice object matched the sample’s surface form, but not skeleton. The other choice object matched on neither skeleton nor surface form. (C) A conflict trial wherein one choice object matched the sample’s skeleton, but not surface form, and the other choice object matched the sample’s surface form, but not skeleton.
Figure 6
Results from the match-to-sample task of Experiment 3. (A) Participants’ mean accuracy (error bars represent ± 1 SE) on trials in which only a skeleton or surface form match was possible (dotted line indicates chance performance). (B) Participants’ categorization judgments in the conflict trial. A value closer to 1 indicates greater weighting of the object’s skeleton; a value closer to 0 indicates greater weighting of the object’s surface form. Although participants successfully matched objects by their skeletal structure or surface forms when each cue was presented in isolation, they were more likely to match objects by their skeletons, as opposed to their surface forms, when these cues conflicted with one another. (C) Histogram of participants’ responses on the conflict trials. A value greater than zero indicates greater weighting of skeletal information. The majority of participants matched objects by their skeletons, demonstrating a consistent pattern of responses across participants.
