Randomized Controlled Trial

PLoS One. 2025 Feb 28;20(2):e0316317. doi: 10.1371/journal.pone.0316317. eCollection 2025.

Comparing the performance of a large language model and naive human interviewers in interviewing children about a witnessed mock-event


Yongjie Sun et al. PLoS One. 2025.

Abstract

Purpose: The present study compared the performance of a Large Language Model (LLM; ChatGPT) and human interviewers in interviewing children about a mock-event they witnessed.

Methods: Children aged 6-8 (N = 78) were randomly assigned to the LLM condition (n = 40) or the human interviewer condition (n = 38). In the experiment, the children watched a researcher-filmed video depicting behavior with elements that, in other contexts, could be misinterpreted as abusive, and then answered questions posed by either an LLM (presented by a human researcher) or a human interviewer.

Results: Irrespective of condition, recommended (vs. not recommended) questions elicited more correct information. The LLM posed fewer questions overall, but there was no difference between conditions in the proportion of questions recommended by the literature. There were no differences between the LLM and human interviewers in the total amount of unique correct information elicited, but questions posed by the LLM (vs. humans) elicited more unique correct information per question. The LLM (vs. humans) also elicited less false information overall, though there was no difference in false information elicited per question.
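The distinction the results draw between overall totals and per-question rates can be illustrated with a small calculation. This is a sketch with invented numbers, not data from the study:

```python
# Hypothetical illustration: two interviewer conditions can elicit a
# similar total of unique correct details while differing in per-question
# yield, if one condition asks fewer questions. All numbers are invented.

def per_question_rate(total_details: int, n_questions: int) -> float:
    """Unique correct details elicited per question asked."""
    return total_details / n_questions

# Invented example counts: similar totals, different question counts.
llm_details, llm_questions = 30, 20
human_details, human_questions = 32, 40

llm_rate = per_question_rate(llm_details, llm_questions)        # 1.5
human_rate = per_question_rate(human_details, human_questions)  # 0.8

# Similar totals, but a higher per-question yield in the condition that
# asked fewer questions -- the pattern reported for correct information.
print(llm_rate, human_rate)
```

The same totals-versus-rates logic applies in reverse to false information, where the LLM's lower total disappears once normalized by the number of questions asked.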

Conclusions: The findings show that the LLM was competent in formulating questions that adhere to best-practice guidelines, while human interviewers asked more questions following up on the children's responses in trying to establish what the children had witnessed. The results indicate that LLMs could be used to support child investigative interviewers. However, substantial further investigation is warranted to ascertain the utility of LLMs in more realistic investigative interview settings.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Group Comparison in Question Categories.
Fig 2. Number and Proportion of Information Elicited from Different Types of Questions.

