PNAS Nexus. 2024 Sep 19;3(10):pgae418.
doi: 10.1093/pnasnexus/pgae418. eCollection 2024 Oct.

Large language models and humans converge in judging public figures' personalities

Xubo Cao et al. PNAS Nexus.

Abstract

ChatGPT-4 and 600 human raters evaluated the personalities of 226 public figures using the Ten-Item Personality Inventory (TIPI). The correlations between ChatGPT-4's ratings and aggregate human ratings ranged from r = 0.76 to 0.87, outperforming models specifically trained to make such predictions. Notably, ChatGPT-4 was given no training data and no feedback on its performance. We discuss potential explanations and practical implications of its ability to accurately mimic human responses.

Keywords: AI; large language models; personality perception; zero-shot predictions.


Figures

Fig. 1.
Accuracy of ChatGPT-4's (blue bars) TIPI ratings of public figures’ personalities. Accuracy was averaged across 10 runs, applying Fisher's Z-transformation. The accuracy of embedding-based regression models observed in previous research (8) is provided for context (orange bars). In both cases, aggregate human ratings served as the ground truth. Error bars represent 95% CIs. All correlations are significant at P < 0.001.
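The caption above describes pooling accuracy across 10 runs via Fisher's Z-transformation, the standard way to average Pearson correlations. A minimal sketch of that procedure (the run-level correlations below are illustrative values, not the paper's data):

```python
import numpy as np

def average_correlations(rs):
    """Average Pearson correlations via Fisher's Z-transformation.

    Each r is mapped to z = arctanh(r), the z values are averaged,
    and the mean z is mapped back to r with tanh.
    """
    zs = np.arctanh(np.asarray(rs, dtype=float))
    return float(np.tanh(zs.mean()))

# Hypothetical correlations from 10 runs (illustrative only):
runs = [0.84, 0.86, 0.85, 0.87, 0.83, 0.85, 0.86, 0.84, 0.85, 0.86]
print(round(average_correlations(runs), 3))
```

Averaging in z-space rather than averaging the raw r values avoids the downward bias that arises because the sampling distribution of r is skewed near ±1.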
Fig. 2.
The profile similarity (Pearson correlation) between human and ChatGPT-4 ratings of each public figure as a function of Wikipedia page views (log-transformed), a proxy for public figures’ popularity. One outlier with extreme negative similarity was omitted. These two variables correlated at r = 0.15 (P < 0.05).
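The profile similarity described above is the Pearson correlation between the two 10-item TIPI rating vectors for a given public figure. A minimal sketch, using made-up profiles (the item scores below are hypothetical, not from the study):

```python
import numpy as np

def profile_similarity(human, model):
    """Pearson correlation between a figure's human-rated and
    model-rated TIPI profiles (one score per TIPI item)."""
    return float(np.corrcoef(human, model)[0, 1])

# Hypothetical 10-item TIPI profiles on a 1-7 scale (illustrative only):
human = [5.1, 3.2, 4.0, 6.1, 2.5, 4.4, 5.0, 3.8, 4.9, 2.9]
model = [5.0, 3.5, 4.2, 6.0, 2.8, 4.1, 5.2, 3.6, 4.7, 3.1]
print(round(profile_similarity(human, model), 2))
```

Computing one such similarity per figure, then correlating those values with log-transformed Wikipedia page views, yields the r = 0.15 relationship reported in the caption.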

References

    1. Digutsch J, Kosinski M. 2023. Overlap in meaning is a stronger predictor of semantic activation in GPT-3 than in humans. Sci Rep. 13:5035.
    2. Brown TB, et al. 2020. Language models are few-shot learners. arXiv, arXiv:2005.14165v4, preprint: not peer reviewed.
    3. Hagendorff T, Fabi S, Kosinski M. 2023. Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT. Nat Comput Sci. 3:833–838.
    4. Kosinski M. in press. Evaluating large language models in theory of mind tasks. Proc Natl Acad Sci U S A.
    5. Wei J, et al. 2022. Emergent abilities of large language models. arXiv, arXiv:2206.07682v2, preprint: not peer reviewed.
