Front Psychol. 2022 May 24;13:711821. doi: 10.3389/fpsyg.2022.711821. eCollection 2022.

Direct Human-AI Comparison in the Animal-AI Environment


Konstantinos Voudouris et al. Front Psychol. 2022.

Abstract

Artificial Intelligence is making rapid and remarkable progress in the development of more sophisticated and powerful systems. However, the acknowledgement of several problems with modern machine learning approaches has prompted a shift in AI benchmarking away from task-oriented testing (such as Chess and Go) towards ability-oriented testing, in which AI systems are tested on their capacity to solve certain kinds of novel problems. The Animal-AI Environment is one such benchmark, which aims to apply the ability-oriented testing used in comparative psychology to AI systems. Here, we present the first direct human-AI comparison in the Animal-AI Environment, using children aged 6-10 (n = 52). We found that children of all ages were significantly better than a sample of 30 AIs across most of the tests we examined, and also performed significantly better than the two top-scoring AIs, "ironbar" and "Trrrrr," from the Animal-AI Olympics Competition 2019. While children and AIs performed similarly on basic navigational tasks, AIs performed significantly worse on more complex cognitive tests, including detour tasks, spatial elimination tasks, and object permanence tasks, indicating that AIs lack several cognitive abilities that children aged 6-10 possess. Both children and AIs performed poorly on tool-use tasks, suggesting that these tests are challenging for both biological and non-biological machines.

Keywords: AI benchmarks; Animal-AI Olympics; artificial intelligence; cognitive AI; comparative cognition; human-AI comparison; out-of-distribution testing.


Conflict of interest statement

MC, BB and MS are employed by DeepMind Technologies Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1. A visual description of the Animal-AI Environment and Testbed. Full details are presented in the Supplementary Material. Images of the Animal-AI Environment and Testbed are licensed under the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0).

Figure 2. Histograms of accuracy averaged across 40 tasks, for AIs (left, purple) and children (right, red). The green line shows the average pass mark across the 40 tasks. Solid red/purple lines show the probability densities; dotted red/purple lines show average accuracy.

Figure 3. Boxplots by level and by agent. Levels are in ascending order on the x-axis, with AIs in purple (left-hand boxplot of each pair) and children in red (right-hand boxplot of each pair). Average pass marks for each level are shown in green.

Figure 4. Density plot of average score across the 40 tasks, by age/agent type. The green line shows the average pass mark across the 40 levels.

Figure 5. Boxplots of average accuracy on each level, by age/agent type. The left-hand five boxplots for each level are the age groups 6-10, respectively, with the rightmost boxplot being the AI group. The green bars show the average pass mark for each level.

Figure 6. UMAP projection onto two dimensions using the default values of N = 15 and min-dist = 0.1. The labels for AIs correspond to the algorithm name; age labels are included for children. See the RShinyDash app provided in the Supplementary Material for different parameter settings.

Figure 7. Bonferroni confidence intervals for the children's data at alpha = 0.05, with "ironbar" and "Trrrrr" results and pass marks overlaid.

Figure 8. Different static obstacles in the AAI Testbed: cuboidal blocks in L1-L5 (top) and fence-like structures in L6 (bottom). Images of the Animal-AI Environment and Testbed are licensed under the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0).
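The Bonferroni confidence intervals in Figure 7 control the family-wise error rate when constructing intervals for many tasks simultaneously: the overall alpha is split evenly across the intervals. The paper's exact interval construction is not specified here, so the following is only a minimal sketch using normal-approximation intervals for per-task pass rates, with hypothetical pass counts:

```python
from statistics import NormalDist
from math import sqrt

def bonferroni_proportion_cis(passes, n, family_alpha=0.05):
    """Simultaneous normal-approximation CIs for per-task pass rates.

    The family-wise alpha is divided by the number of intervals
    (Bonferroni correction), widening each interval accordingly.
    """
    m = len(passes)                       # number of simultaneous intervals
    per_alpha = family_alpha / m          # corrected per-interval alpha
    z = NormalDist().inv_cdf(1 - per_alpha / 2)
    cis = []
    for k in passes:
        p = k / n                         # observed pass rate on this task
        half = z * sqrt(p * (1 - p) / n)  # normal-approximation half-width
        cis.append((max(0.0, p - half), min(1.0, p + half)))
    return per_alpha, z, cis

# Hypothetical pass counts on three tasks for a sample of n = 52 children.
per_alpha, z, cis = bonferroni_proportion_cis([48, 30, 12], 52)
```

With 40 tasks, as in the paper, each interval would be built at alpha = 0.05/40 = 0.00125, giving noticeably wider intervals than uncorrected ones.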
