Sci Adv. 2025 Jul 4;11(27):eads6821. doi: 10.1126/sciadv.ads6821. Epub 2025 Jul 2.

Fast and robust visual object recognition in young children


Vladislav Ayzenberg et al.

Abstract

By adulthood, humans rapidly identify objects from sparse visual displays and across large disruptions to their appearance. What are the minimal conditions needed to achieve robust recognition abilities and when might these abilities develop? To answer these questions, we investigated the upper limits of children's object recognition abilities. We found that children as young as 3 years successfully identified objects at speeds of 100 milliseconds (both forward and backward masked) under sparse and disrupted viewing conditions. By contrast, a range of computational models implemented with biologically informed properties or optimized for visual recognition did not reach child-level performance. Models only matched children if they received more object examples than children are capable of experiencing. These findings highlight the robustness of the human visual system in the absence of extensive experience and identify important developmental constraints for building biologically plausible machines.


Figures

Fig. 1. Stimuli and human testing procedure.
(A) Children and adults were tested with object outlines that had either complete, perturbed, or deleted contours. (B) On each trial, participants were presented with an object image rapidly (100- to 300-ms duration), which was both forward and backward masked. In the prompt phase, child participants were asked to verbally indicate which object they saw among two possibilities (read by an experimenter). Adult participants responded by pressing an arrow key that corresponded to each object label.
Fig. 2. Children’s performance for each condition.
Across age, children performed above chance for each condition at each duration. Error bars depict 95% confidence intervals. The dotted black line indicates chance performance (0.50).
Fig. 3. Performance under each condition by age group.
(A) Under the complete condition, participants of all ages performed above chance, even at the fastest speeds. (B) Under the perturbed condition, 4- and 5-year-olds performed above chance at all speeds, whereas 3-year-olds were above chance only at durations of 200 ms and slower. (C) Under the deleted condition, 4- and 5-year-olds performed above chance at all speeds, whereas 3-year-olds performed above chance only at the slowest speeds (250 and 300 ms). Error bars depict 95% confidence intervals. The dotted black line indicates chance performance (0.50).
Fig. 4. Influence of low-level shape features.
Performance separated by (A and B) curvature and (C and D) shape envelope similarity across different [(A) and (C)] stimulus durations and [(B) and (D)] age groups. The black dotted line indicates chance performance (0.50). Error bars depict 95% confidence intervals.
Fig. 5. Model and human performance under each condition.
Performance of models and humans under the (top) complete, (middle) perturbed, and (bottom) deleted contour conditions. Human data for each age (red dotted lines: children; gray dotted lines: adults) were aggregated into fast (100 and 150 ms) and slow (200 and 250 ms) stimulus durations. Humans were compared to (A to C) biologically inspired (blue: ventral-like architecture; green: trained on child experience) and performance-optimized (orange: classification objective; violet: unsupervised and vision-language objective) models and (D to F) models selected to disambiguate between the contributions of training type, scale, and learning objective (yellow: classification objective; purple: vision-language objective). The y axis indicates classification accuracy. The black dotted line indicates chance performance (0.5). Error bars depict 95% confidence intervals for models. See fig. S3 and tables S2 to S4 for variability estimates and confidence intervals for human data.
Fig. 6. Recognition performance as a function of experience.
Scatter plots showing the relation between classification accuracy (y axis) for the (A) complete, (B) perturbed, and (C) deleted contour conditions and the total number of images the models were trained on (x axis, log scale). Human estimates are plotted as stars; human experience is conservatively estimated as seeing one object every second of life, without sleep.
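The human experience estimate in Fig. 6 is simple arithmetic, and a back-of-envelope sketch makes the scale concrete. The following is a minimal illustration, not the authors' analysis code: the one-object-per-second, no-sleep rate is the caption's stated assumption, and the function name is hypothetical.

```python
# Upper bound on a child's visual "training set" size, assuming
# (per the Fig. 6 caption) one object seen every second of life,
# counting all seconds with no sleep.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ~3.15e7 seconds

def max_object_exposures(age_years: float) -> int:
    """Upper-bound count of object exposures by a given age."""
    return int(age_years * SECONDS_PER_YEAR)

for age in (3, 4, 5):
    print(f"Age {age}: ~{max_object_exposures(age):.1e} object exposures")
```

Even under this deliberately generous bound, a 3-year-old's lifetime exposure is on the order of 10^7 to 10^8 images, which is why models requiring substantially larger training sets fall outside the range of experience available to children.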
