Correspondence of high dimensional emotion structures elicited from video clips between humans and multimodal LLMs
- PMID: 40890212
- PMCID: PMC12402258
- DOI: 10.1038/s41598-025-14961-6
Abstract
Recent studies have revealed that human emotions exhibit a high-dimensional, complex structure. Fully capturing this complexity requires new approaches, as conventional models that disregard high dimensionality risk overlooking key nuances of human emotion. Here, we examined the extent to which the latest generation of rapidly evolving Multimodal Large Language Models (MLLMs) captures these high-dimensional, intricate emotion structures, assessing both their capabilities and their limitations. Specifically, we compared self-reported emotion ratings from participants watching videos with model-generated estimates (e.g., from Gemini or GPT). We evaluated performance not only at the level of individual videos but also at the level of emotion structures that account for inter-video relationships. At the level of simple correlation between emotion structures, our results demonstrated strong similarity between human and model-inferred emotion structures. To further examine whether this similarity holds at the single-item level or only at the coarse-category level, we applied Gromov-Wasserstein Optimal Transport. We found that although performance was not necessarily high at the strict, single-item level, performance across video categories that elicit similar emotions was substantial, indicating that the models can infer human emotional experiences at the coarse-category level. Our results suggest that current state-of-the-art MLLMs broadly capture complex, high-dimensional emotion structures at the coarse-category level, while also revealing their apparent limitations in accurately capturing the entire structure at the single-item level.
Keywords: Emotion; Emotion structure; Gromov-Wasserstein Optimal Transport; Multimodal Large Language Model; Representational Similarity Analysis; Unsupervised alignment.
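For illustration only, the following is a minimal sketch (not the authors' code) of the structure-alignment idea described in the abstract: representational dissimilarity matrices over videos are built from human and model emotion ratings and aligned in an unsupervised way with Gromov-Wasserstein optimal transport, here via the POT library. The number of videos, the random ratings, and the five category labels are hypothetical assumptions introduced for this example.

```python
# Sketch: comparing a human and a model emotion-dissimilarity structure with
# Gromov-Wasserstein optimal transport (POT library). All sizes and labels
# below are illustrative assumptions, not values from the paper.
import numpy as np
import ot  # pip install pot

rng = np.random.default_rng(0)

n_videos = 30   # hypothetical number of video clips
n_dims = 10     # hypothetical number of emotion rating dimensions

# Stand-ins for emotion ratings: videos x emotion dimensions
human_ratings = rng.random((n_videos, n_dims))
model_ratings = human_ratings + 0.1 * rng.standard_normal((n_videos, n_dims))

# Representational dissimilarity matrices (RDMs) over videos
C_human = ot.dist(human_ratings, human_ratings, metric="euclidean")
C_model = ot.dist(model_ratings, model_ratings, metric="euclidean")

# Uniform marginal distributions over the videos
p = ot.unif(n_videos)
q = ot.unif(n_videos)

# Gromov-Wasserstein transport plan aligning the two structures
T = ot.gromov.gromov_wasserstein(C_human, C_model, p, q, loss_fun="square_loss")

# Single-item matching: fraction of videos mapped back to themselves
item_match = np.mean(T.argmax(axis=1) == np.arange(n_videos))

# Coarse-category matching under hypothetical category labels (5 categories)
categories = np.repeat(np.arange(5), n_videos // 5)
cat_match = np.mean(categories[T.argmax(axis=1)] == categories)

print(f"single-item matching rate:     {item_match:.2f}")
print(f"coarse-category matching rate: {cat_match:.2f}")
```

In this sketch, a high coarse-category matching rate alongside a lower single-item rate would correspond to the pattern reported in the abstract: alignment of emotion structures at the category level without exact item-level correspondence.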
© 2025. The Author(s).
Conflict of interest statement
Declarations. Competing interests: The authors declare no competing interests.