Proc Natl Acad Sci U S A. 2023 Feb 7;120(6):e2218523120. doi: 10.1073/pnas.2218523120. Epub 2023 Feb 2.

Using cognitive psychology to understand GPT-3


Marcel Binz et al. Proc Natl Acad Sci U S A. 2023.

Abstract

We study GPT-3, a recent large language model, using tools from cognitive psychology. More specifically, we assess GPT-3's decision-making, information search, deliberation, and causal reasoning abilities on a battery of canonical experiments from the literature. We find that much of GPT-3's behavior is impressive: It solves vignette-based tasks as well as or better than human subjects, is able to make decent decisions from descriptions, outperforms humans in a multiarmed bandit task, and shows signatures of model-based reinforcement learning. Yet, we also find that small perturbations to vignette-based tasks can lead GPT-3 vastly astray, that it shows no signatures of directed exploration, and that it fails miserably in a causal reasoning task. Taken together, these results enrich our understanding of current large language models and pave the way for future investigations using tools from cognitive psychology to study increasingly capable and opaque artificial agents.
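The vignette-based evaluation described above amounts to presenting a canonical scenario as a plain-text prompt and recording the model's completion. A minimal sketch of that setup follows; the prompt wording is a paraphrase of the classic Linda problem, and `query_model` is a hypothetical stand-in for a language-model API call, not the authors' actual code:

```python
def query_model(prompt: str) -> str:
    # Hypothetical stand-in for a call to a large language model's
    # completion API; it returns a fixed answer here so the example
    # runs without network access.
    return "Option 1"

# The Linda problem: picking Option 2 is the classic conjunction fallacy,
# since a conjunction can never be more probable than one of its conjuncts.
prompt = (
    "Linda is 31 years old, single, outspoken, and very bright. She majored "
    "in philosophy. As a student, she was deeply concerned with issues of "
    "discrimination and social justice.\n"
    "Which option is more probable?\n"
    "Option 1: Linda is a bank teller.\n"
    "Option 2: Linda is a bank teller and is active in the feminist movement.\n"
    "Answer:"
)

answer = query_model(prompt)
committed_fallacy = "Option 2" in answer  # True would mirror the human bias
print(answer, committed_fallacy)
```

Scoring a whole battery then reduces to looping such prompts over the vignettes and comparing the model's answer distribution with human response data.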

Keywords: artificial intelligence; cognitive psychology; decision-making; language models; reasoning.


Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Vignette-based tasks. (A) Example prompt of a hypothetical scenario, in this case, the famous Linda problem, as submitted to GPT-3. (B) Results. While in 12 out of 12 standard vignettes GPT-3 either answers correctly or makes human-like mistakes, it makes mistakes that are not human-like when given the adversarial vignettes.
Fig. 2.
Decisions from descriptions. (A) Example prompt of a problem provided to GPT-3. (B) Example prompt of another problem provided to GPT-3. (C) Mean regret averaged over all 13,000 problems taken from Peterson et al. (23). Lower regret means better performance. Error bars indicate the SE of the mean. (D) Log-odds ratios of different contrasts used to test for cognitive biases. Positive values indicate that the given bias is present in humans (circle) or GPT-3 (triangle). Human data adapted from Ruggeri et al. (24).
Fig. 3.
Horizon task. (A) Visual overview of the horizon task paradigm. Each column pair corresponds to one example task. (B) Example prompt for one trial as submitted to GPT-3. (C) Mean regret for GPT-3 and human subjects by horizon condition. Lower regret means better performance. Error bars indicate the SE of the mean. Human data taken from Zaller et al. (29).
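The mean regret plotted in Fig. 3C is, per trial, the gap between the best arm's expected reward and the expected reward of the arm actually chosen, averaged over trials. A minimal sketch of that computation (the payoffs and choices below are illustrative, not data from the paper):

```python
def mean_regret(expected_rewards, choices):
    """Average regret over trials.

    expected_rewards: per-trial list of each arm's true mean payoff
    choices:          per-trial index of the arm that was chosen
    """
    regrets = [max(arms) - arms[pick]
               for arms, pick in zip(expected_rewards, choices)]
    return sum(regrets) / len(regrets)

# Illustrative two-armed bandit over three trials.
means = [[40.0, 60.0],
         [40.0, 60.0],
         [40.0, 60.0]]
picks = [1, 0, 1]  # the better arm is chosen on trials 1 and 3
print(mean_regret(means, picks))  # average of per-trial regrets 0, 20, 0
```

Zero regret means the agent always picked the arm with the highest expected payoff; larger values indicate worse choices.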
Fig. 4.
Two-step task. (A) Visual overview of the two-step task paradigm. (B) Example prompt of one trial in the canonical two-step task as submitted to GPT-3. (C) Model-free learning as a function of reward (rewarded vs. unrewarded) and transition type (common vs. rare). (D) Model-based learning as a function of reward and transition type. (E) Human behavior as a function of reward and transition type. Human data adapted from Daw et al. (30). (F) GPT-3's behavior as a function of reward and transition type. Error bars indicate the SE of the mean.
Fig. 5.
Causal reasoning. (A) Example prompt for the causal reasoning task adapted from Waldmann and Hagmayer (31). (B) GPT-3’s responses alongside responses of people and an ideal agent in the common-cause condition. (C) GPT-3’s responses alongside responses of people and an ideal agent in the causal-chain condition.
Fig. 6.
Prompt variations. (A) Performance for different prompt variations in the decisions-from-descriptions paradigm. (B) Performance for different prompt variations in the horizon task. (C) Effect of random exploration for different prompt variations in the horizon task. (D) Effect of directed exploration for different prompt variations in the horizon task. (E) GPT-3's behavior as a function of reward (rewarded vs. unrewarded) and transition type (common vs. rare) for the alien cover story (reproduced from Fig. 4F). (F) GPT-3's behavior as a function of reward and transition type for the magical carpet cover story. Error bars indicate the SE of the mean.

Comment in

  • Shiffrin R., Mitchell M., Probing the psychology of AI models. Proc Natl Acad Sci U S A. 2023 Mar 7;120(10):e2300963120. doi: 10.1073/pnas.2300963120. Epub 2023 Mar 1. PMID: 36857344.

References

    1. Gunning D., et al., XAI–explainable artificial intelligence. Sci. Rob. 4, eaay7120 (2019).
    2. Brown T., et al., Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
    3. Chen M., et al., Evaluating large language models trained on code. arXiv [Preprint] (2021). http://arxiv.org/abs/2107.03374 (Accessed 20 January 2023).
    4. Lin Z., et al., Caire: An end-to-end empathetic chatbot. Proceedings of the AAAI Conference on Artificial Intelligence 34, 13622–13623 (2020).
    5. Noever D., Ciolino M., Kalin J., The chess transformer: Mastering play using generative language models. arXiv [Preprint] (2020). http://arxiv.org/abs/2008.04057 (Accessed 20 January 2023).
