Front Psychol. 2022 May 24;13:711821. doi: 10.3389/fpsyg.2022.711821. eCollection 2022.

Direct Human-AI Comparison in the Animal-AI Environment


Konstantinos Voudouris et al. Front Psychol. 2022.

Abstract

Artificial Intelligence is making rapid and remarkable progress in the development of more sophisticated and powerful systems. However, the acknowledgement of several problems with modern machine learning approaches has prompted a shift in AI benchmarking away from task-oriented testing (such as Chess and Go) towards ability-oriented testing, in which AI systems are tested on their capacity to solve certain kinds of novel problems. The Animal-AI Environment is one such benchmark, which aims to apply the ability-oriented testing used in comparative psychology to AI systems. Here, we present the first direct human-AI comparison in the Animal-AI Environment, using children aged 6-10 (n = 52). We found that children of all ages were significantly better than a sample of 30 AIs across most of the tests we examined, and also performed significantly better than the two top-scoring AIs, "ironbar" and "Trrrrr," from the Animal-AI Olympics Competition 2019. While children and AIs performed similarly on basic navigational tasks, AIs performed significantly worse on more complex cognitive tests, including detour tasks, spatial elimination tasks, and object permanence tasks, indicating that AIs lack several cognitive abilities that children aged 6-10 possess. Both children and AIs performed poorly on tool-use tasks, suggesting that these tests are challenging for both biological and non-biological machines.

Keywords: AI benchmarks; Animal-AI Olympics; artificial intelligence; cognitive AI; comparative cognition; human-AI comparison; out-of-distribution testing.


Conflict of interest statement

MC, BB and MS are employed by DeepMind Technologies Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1. A visual description of the Animal-AI Environment and Testbed. Full details are presented in the Supplementary Material. Images of the Animal-AI Environment and Testbed are licensed under the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0).

Figure 2. Histograms of accuracy averaged across 40 tasks, for AIs (left, purple) and children (right, red). The green line shows the average pass mark across the 40 tasks. Solid red/purple lines show the probability densities; dotted red/purple lines show average accuracy.

Figure 3. Boxplots by level and by agent. Levels are in ascending order on the x-axis, with AIs in purple (left-hand boxplot of each pair) and children in red (right-hand boxplot of each pair). Average pass marks for each level are shown in green.

Figure 4. Density plot of average score across the 40 tasks, by age/agent type. The green line shows the average pass mark across the 40 levels.

Figure 5. Boxplots of average accuracy on each level, by age/agent type. The left-hand five boxplots for each level are the age groups 6-10, respectively, with the rightmost boxplot being the AI group. The green bars show the average pass mark for each level.

Figure 6. UMAP projection onto two dimensions using the default values of N = 15 and min-dist = 0.1. The labels for AIs correspond to the algorithm name; age labels are included for children. See the RShinyDash app provided in the Supplementary Material for different parameter settings.

Figure 7. Bonferroni confidence intervals for the children's data at alpha = 0.05, with "ironbar" and "Trrrrr" results and pass marks overlaid.

Figure 8. Different static obstacles in the AAI Testbed: cuboidal blocks in L1-L5 (top) and fence-like structures in L6 (bottom). Images of the Animal-AI Environment and Testbed are licensed under the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0).
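The Bonferroni confidence intervals in Figure 7 control the family-wise error rate when constructing intervals for many tasks simultaneously: the overall alpha is split evenly across the intervals. The paper's exact interval construction is not specified here, so the following is only a minimal sketch using normal-approximation intervals for per-task pass rates, with hypothetical pass counts:

```python
from statistics import NormalDist
from math import sqrt

def bonferroni_proportion_cis(passes, n, family_alpha=0.05):
    """Simultaneous normal-approximation CIs for per-task pass rates.

    The family-wise alpha is divided by the number of intervals
    (Bonferroni correction), widening each interval accordingly.
    """
    m = len(passes)                       # number of simultaneous intervals
    per_alpha = family_alpha / m          # corrected per-interval alpha
    z = NormalDist().inv_cdf(1 - per_alpha / 2)
    cis = []
    for k in passes:
        p = k / n                         # observed pass rate on this task
        half = z * sqrt(p * (1 - p) / n)  # normal-approximation half-width
        cis.append((max(0.0, p - half), min(1.0, p + half)))
    return per_alpha, z, cis

# Hypothetical pass counts on three tasks for a sample of n = 52 children.
per_alpha, z, cis = bonferroni_proportion_cis([48, 30, 12], 52)
```

With 40 tasks, as in the paper, each interval would be built at alpha = 0.05/40 = 0.00125, giving noticeably wider intervals than uncorrected ones.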
