PNAS Nexus. 2024 Sep 19;3(10):pgae418.
doi: 10.1093/pnasnexus/pgae418. eCollection 2024 Oct.

Large language models and humans converge in judging public figures' personalities

Xubo Cao et al. PNAS Nexus.

Abstract

ChatGPT-4 and 600 human raters evaluated the personalities of 226 public figures using the Ten-Item Personality Inventory (TIPI). The correlations between ChatGPT-4's ratings and aggregate human ratings ranged from r = 0.76 to 0.87, outperforming models specifically trained to make such predictions. Notably, ChatGPT-4 was given no training data and no feedback on its performance. We discuss potential explanations and practical implications of its ability to accurately mimic human responses.

Keywords: AI; large language models; personality perception; zero-shot predictions.


Figures

Fig. 1.
Accuracy of ChatGPT-4's (blue bars) TIPI ratings of public figures’ personalities. Accuracy was averaged across 10 runs, applying Fisher's Z-transformation. The accuracy of embedding-based regression models observed in previous research (8) is provided for context (orange bars). In both cases, aggregate human ratings served as the ground truth. Error bars represent 95% CIs. All correlations are significant at P < 0.001.
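The caption above describes pooling accuracy across 10 runs via Fisher's Z-transformation, the standard way to average Pearson correlations. A minimal sketch of that procedure (the run-level correlations below are illustrative values, not the paper's data):

```python
import numpy as np

def average_correlations(rs):
    """Average Pearson correlations via Fisher's Z-transformation.

    Each r is mapped to z = arctanh(r), the z values are averaged,
    and the mean z is mapped back to r with tanh.
    """
    zs = np.arctanh(np.asarray(rs, dtype=float))
    return float(np.tanh(zs.mean()))

# Hypothetical correlations from 10 runs (illustrative only):
runs = [0.84, 0.86, 0.85, 0.87, 0.83, 0.85, 0.86, 0.84, 0.85, 0.86]
print(round(average_correlations(runs), 3))
```

Averaging in z-space rather than averaging the raw r values avoids the downward bias that arises because the sampling distribution of r is skewed near ±1.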
Fig. 2.
The profile similarity (Pearson correlation) between human and ChatGPT-4 ratings of each public figure as a function of Wikipedia page views (log-transformed), a proxy for public figures’ popularity. One outlier with extreme negative similarity was omitted. These two variables correlated at r = 0.15 (P < 0.05).
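The profile similarity described above is the Pearson correlation between the two 10-item TIPI rating vectors for a given public figure. A minimal sketch, using made-up profiles (the item scores below are hypothetical, not from the study):

```python
import numpy as np

def profile_similarity(human, model):
    """Pearson correlation between a figure's human-rated and
    model-rated TIPI profiles (one score per TIPI item)."""
    return float(np.corrcoef(human, model)[0, 1])

# Hypothetical 10-item TIPI profiles on a 1-7 scale (illustrative only):
human = [5.1, 3.2, 4.0, 6.1, 2.5, 4.4, 5.0, 3.8, 4.9, 2.9]
model = [5.0, 3.5, 4.2, 6.0, 2.8, 4.1, 5.2, 3.6, 4.7, 3.1]
print(round(profile_similarity(human, model), 2))
```

Computing one such similarity per figure, then correlating those values with log-transformed Wikipedia page views, yields the r = 0.15 relationship reported in the caption.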

References

    1. Digutsch J, Kosinski M. 2023. Overlap in meaning is a stronger predictor of semantic activation in GPT-3 than in humans. Sci Rep. 13:5035.
    2. Brown TB, et al. 2020. Language models are few-shot learners. arXiv, arXiv:2005.14165v4, preprint: not peer reviewed.
    3. Hagendorff T, Fabi S, Kosinski M. 2023. Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT. Nat Comput Sci. 3:833–838.
    4. Kosinski M. in press. Evaluating large language models in theory of mind tasks. Proc Natl Acad Sci U S A.
    5. Wei J, et al. 2022. Emergent abilities of large language models. arXiv, arXiv:2206.07682v2, preprint: not peer reviewed.
