JMIR Ment Health. 2024 Feb 6;11:e54369. doi: 10.2196/54369.

Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study


Zohar Elyoseph et al. JMIR Ment Health.

Abstract

Background: Mentalization, which is integral to human cognitive processes, pertains to the interpretation of one's own and others' mental states, including emotions, beliefs, and intentions. With the advent of artificial intelligence (AI) and the prominence of large language models in mental health applications, questions persist about their aptitude in emotional comprehension. The prior iteration of the large language model from OpenAI, ChatGPT-3.5, demonstrated an advanced capacity to interpret emotions from textual data, surpassing human benchmarks. Given the introduction of ChatGPT-4, with its enhanced visual processing capabilities, and considering Google Bard's existing visual functionalities, a rigorous assessment of their proficiency in visual mentalizing is warranted.

Objective: The aim of the research was to critically evaluate the capabilities of ChatGPT-4 and Google Bard with regard to their competence in discerning visual mentalizing indicators as contrasted with their textual-based mentalizing abilities.

Methods: The Reading the Mind in the Eyes Test developed by Baron-Cohen and colleagues was used to assess the models' proficiency in interpreting visual emotional indicators. Simultaneously, the Levels of Emotional Awareness Scale was used to evaluate the large language models' aptitude in textual mentalizing. Collating data from both tests provided a holistic view of the mentalizing capabilities of ChatGPT-4 and Bard.

Results: ChatGPT-4, displaying a pronounced ability in emotion recognition, secured scores of 26 and 27 in 2 distinct evaluations, significantly deviating from a random response paradigm (P<.001). These scores align with established benchmarks from the broader human demographic. Notably, ChatGPT-4 exhibited consistent responses, with no discernible biases pertaining to the sex of the model or the nature of the emotion. In contrast, Google Bard's performance aligned with random response patterns, securing scores of 10 and 12 and rendering further detailed analysis uninformative. In the domain of textual analysis, both ChatGPT and Bard surpassed established benchmarks from the general population, with their performances being remarkably congruent.
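To illustrate what "significantly deviating from a random response paradigm" means here, the chance comparison can be sketched as a one-sided binomial test. This is an assumption-laden sketch, not the paper's reported analysis: it assumes the standard revised RMET with 36 items and 4 answer options per item (chance accuracy 0.25, expected score 9), and a simple exact binomial tail probability.

```python
from math import comb

def p_at_least(score: int, n_items: int = 36, p_chance: float = 0.25) -> float:
    """One-sided exact binomial tail: P(X >= score) under random guessing,
    where X ~ Binomial(n_items, p_chance)."""
    return sum(
        comb(n_items, k) * p_chance**k * (1 - p_chance) ** (n_items - k)
        for k in range(score, n_items + 1)
    )

# ChatGPT-4's scores of 26 and 27 against chance (expected score 9/36):
p26 = p_at_least(26)  # far below .001
p27 = p_at_least(27)  # smaller still

# Bard's scores of 10 and 12 sit near the chance expectation of 9,
# so the corresponding tail probabilities are not small:
p10 = p_at_least(10)
```

Under these assumptions, scores of 26-27 are essentially impossible by guessing, while 10-12 is consistent with chance, matching the pattern the abstract reports.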

Conclusions: ChatGPT-4 proved its efficacy in the domain of visual mentalizing, aligning closely with human performance standards. Although both models displayed commendable acumen in textual emotion interpretation, Bard's capabilities in visual emotion interpretation necessitate further scrutiny and potential refinement. This study stresses the criticality of ethical AI development for emotional recognition, highlighting the need for inclusive data, collaboration with patients and mental health experts, and stringent governmental oversight to ensure transparency and protect patient privacy.

Keywords: AI; ChatGPT; LLM; LLMs; RMET; Reading the Mind in the Eyes Test; algorithm; algorithms; artificial intelligence; early detection; early warning; emotional awareness; emotional comprehension; emotional cue; emotional cues; empathy; large language model; large language models; machine learning; mental disease; mental diseases; mental health; mental illness; mental illnesses; mentalization; mentalizing; practical model; practical models; predictive analytics; predictive model; predictive models; predictive system.


Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
An example of ChatGPT-4 and Google Bard responses for Reading the Mind in the Eyes Test items. (A and B) ChatGPT-4 generates correct responses for both items; (C) Google Bard generates an incorrect response (the correct response was “regretful”).

References

    1. Freeman C. What is mentalizing? An overview. Brit J Psychotherapy. 2016;32(2):189–201. doi: 10.1111/bjp.12220. - DOI
    2. Aival-Naveh E, Rothschild-Yakar L, Kurman J. Keeping culture in mind: a systematic review and initial conceptualization of mentalizing from a cross-cultural perspective. Clin Psychol (New York). 2019;26(4):25. doi: 10.1037/h0101757. - DOI
    3. Schwarzer NH, Nolte T, Fonagy P, Gingelmaier S. Mentalizing and emotion regulation: evidence from a nonclinical sample. Int Forum Psychoanal. 2021;30(1):34–45. doi: 10.1080/0803706x.2021.1873418. - DOI
    4. Lane RD, Quinlan DM, Schwartz GE, Walker PA, Zeitlin SB. The levels of emotional awareness scale: a cognitive-developmental measure of emotion. J Pers Assess. 1990;55(1-2):124–134. doi: 10.1080/00223891.1990.9674052. - DOI - PubMed
    5. Baron-Cohen S, Wheelwright S, Hill J, Raste Y, Plumb I. The "Reading the Mind in the Eyes" test revised version: a study with normal adults, and adults with Asperger syndrome or high-functioning autism. J Child Psychol Psychiatry. 2001;42(2):241–251. - PubMed
