Direct Human-AI Comparison in the Animal-AI Environment
- PMID: 35686061
- PMCID: PMC9172850
- DOI: 10.3389/fpsyg.2022.711821
Abstract
Artificial Intelligence is making rapid and remarkable progress in the development of more sophisticated and powerful systems. However, the acknowledgement of several problems with modern machine learning approaches has prompted a shift in AI benchmarking away from task-oriented testing (such as Chess and Go) towards ability-oriented testing, in which AI systems are tested on their capacity to solve certain kinds of novel problems. The Animal-AI Environment is one such benchmark which aims to apply the ability-oriented testing used in comparative psychology to AI systems. Here, we present the first direct human-AI comparison in the Animal-AI Environment, using children aged 6-10 (n = 52). We found that children of all ages were significantly better than a sample of 30 AIs across most of the tests we examined, as well as performing significantly better than the two top-scoring AIs, "ironbar" and "Trrrrr," from the Animal-AI Olympics Competition 2019. While children and AIs performed similarly on basic navigational tasks, AIs performed significantly worse in more complex cognitive tests, including detour tasks, spatial elimination tasks, and object permanence tasks, indicating that AIs lack several cognitive abilities that children aged 6-10 possess. Both children and AIs performed poorly on tool-use tasks, suggesting that these tests are challenging for both biological and non-biological machines.
Keywords: AI benchmarks; Animal-AI Olympics; artificial intelligence; cognitive AI; comparative cognition; human-AI comparison; out-of-distribution testing.
Copyright © 2022 Voudouris, Crosby, Beyret, Hernández-Orallo, Shanahan, Halina and Cheke.
Conflict of interest statement
MC, BB and MS are employed by DeepMind Technologies Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.