2025 Jul 3;13(7):80.
doi: 10.3390/jintelligence13070080.

The Effects of (Dis)similarities Between the Creator and the Assessor on Assessing Creativity: A Comparison of Humans and LLMs


Martin Op 't Hof et al. J Intell.

Abstract

Current research predominantly relies on human subjects to evaluate AI creativity. In this exploratory study, we questioned the validity of this practice and examined how creator–assessor (dis)similarity, that is, the extent to which the creator and the assessor are alike, along two dimensions, culture (Western and English-speaking vs. Eastern and Chinese-speaking) and agency (human vs. AI), influences the assessment of creativity. We first asked four types of subjects to create stories: Eastern participants (university students from China), Eastern AI (Kimi, from China), Western participants (university students from The Netherlands), and Western AI (ChatGPT 3.5, from the US). Eastern participants and the Eastern AI created stories in Chinese, which were then translated into English, while Western participants and the Western AI created stories in English, which were then translated into Chinese. A subset of these stories (2 creative and 2 uncreative per creator type, 16 stories in total) was then randomly selected as assessment material. Adopting a within-subject design, we then asked new subjects of the same four types (n = 120, 30 per type) to assess these stories on creativity, originality, and appropriateness. The results confirmed that similarity along both the cultural and the agentic dimension influences the assessment of originality and appropriateness. On the agency dimension, human assessors preferred human-created stories for originality, while AI assessors showed no preference; conversely, AI assessors rated AI-generated stories higher in appropriateness, whereas human assessors showed no preference. On the cultural dimension, both Eastern and Western assessors favored Eastern-created stories in originality, whereas for appropriateness, assessors consistently preferred stories from creators with the same cultural background.
The present study is significant in raising an often-overlooked question and provides the first empirical evidence underscoring the need for further discussion of whether humans should judge AI agents' creativity, or vice versa.

Keywords: creativity assessment; cross-cultural comparison; large language models.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Subject types match each other by the agentic and cultural dimensions.

Figure 2. Mean with 95% confidence interval (left panel) and distribution (right panel) of creativity, originality, and appropriateness scores.

Figure 3. Visualization of the interaction effects between creator–assessor agentic similarity and assessor agency type, separately for originality (left) and appropriateness (right).

Figure 4. Visualization of the interaction effects between creator–assessor cultural similarity and assessor culture type for originality.
