Clinical Performance and Communication Skills of ChatGPT Versus Physicians in Emergency Medicine: Simulated Patient Study
- PMID: 40674718
- PMCID: PMC12289221
- DOI: 10.2196/68409
Abstract
Background: Emergency medicine stands to benefit from artificial intelligence (AI) given its unique challenges, such as high patient volumes and the need for urgent interventions. However, it remains difficult to assess how well AI systems apply to real-world emergency medicine practice, which requires not only medical knowledge but also adaptable problem-solving and effective communication skills.
Objective: We aimed to evaluate the performance of ChatGPT (OpenAI) against that of human doctors in simulated emergency medicine settings, using the frameworks of a clinical performance examination and written examinations.
Methods: In total, 12 human doctors were recruited to represent medical professionals. Both ChatGPT and the human doctors were instructed to manage each of 12 simulated-patient cases as they would in a real clinical setting. After the clinical performance examination sessions, an emergency medicine professor rated the conversation records for history taking, clinical accuracy, and empathy on a 5-point Likert scale. For each case, the simulated patients completed a 5-point scale survey covering overall comprehensibility, credibility, and concern reduction; they also rated how similar the doctor they interacted with was to a human doctor. An additional evaluation used vignette-based written examinations to assess diagnosis, investigation, and treatment planning. The mean scores of ChatGPT were then compared with those of the human doctors.
Results: ChatGPT scored significantly higher than the physicians in both history taking (mean score 3.91, SD 0.67 vs mean score 2.67, SD 0.78; P<.001) and empathy (mean score 4.50, SD 0.67 vs mean score 1.75, SD 0.62; P<.001), whereas there was no significant difference in clinical accuracy. In the survey of simulated patients, ChatGPT scored higher for concern reduction (mean score 4.33, SD 0.78 vs mean score 3.58, SD 0.90; P=.04); for comprehensibility and credibility, ChatGPT also performed better, but the differences were not significant. In the rating of similarity to a human doctor, no significant difference was observed (mean score 3.50, SD 1.78 vs mean score 3.25, SD 1.86; P=.71).
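The abstract does not name the statistical test used to compare the mean Likert scores; as a minimal sketch, assuming an independent two-sample comparison (Welch's t-test) of the per-case 5-point ratings, the calculation could look like the following Python snippet. The scores below are illustrative placeholders, not the study data.

    # Minimal sketch (assumption): comparing per-case 5-point Likert ratings for
    # ChatGPT vs physicians with Welch's two-sample t-test. The abstract does not
    # state which test the authors used, and these ratings are illustrative only.
    import numpy as np
    from scipy import stats

    # Hypothetical history-taking ratings for the 12 simulated cases (not study data)
    chatgpt_scores = np.array([4, 4, 3, 5, 4, 4, 3, 4, 4, 5, 3, 4])
    physician_scores = np.array([3, 2, 3, 2, 3, 3, 2, 3, 2, 3, 3, 3])

    t_stat, p_value = stats.ttest_ind(chatgpt_scores, physician_scores, equal_var=False)

    print(f"ChatGPT: mean {chatgpt_scores.mean():.2f}, SD {chatgpt_scores.std(ddof=1):.2f}")
    print(f"Physicians: mean {physician_scores.mean():.2f}, SD {physician_scores.std(ddof=1):.2f}")
    print(f"Welch t = {t_stat:.2f}, P = {p_value:.3f}")

With only 12 ordinal ratings per group, a nonparametric alternative such as scipy.stats.mannwhitneyu may be more appropriate; Welch's t-test is used here purely for illustration.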
Conclusions: ChatGPT's performance highlights its potential as a valuable adjunct in emergency medicine, demonstrating comparable proficiency in knowledge application, efficiency, and empathetic patient interaction. These results suggest that a collaborative health care model, integrating AI with human expertise, could enhance patient care and outcomes.
Keywords: ChatGPT; artificial intelligence; clinical performance examination; clinical reasoning; emergency medicine; empathy; history taking; large language model; patient experience.
© ChulHyoung Park, Min Ho An, Gyubeom Hwang, Rae Woong Park, Juho An. Originally published in JMIR Medical Informatics (https://medinform.jmir.org).
Similar articles
- Utility of Generative Artificial Intelligence for Japanese Medical Interview Training: Randomized Crossover Pilot Study. JMIR Med Educ. 2025 Aug 1;11:e77332. doi: 10.2196/77332. PMID: 40749190.
- Quality and Dependability of ChatGPT and DingXiangYuan Forums for Remote Orthopedic Consultations: Comparative Analysis. J Med Internet Res. 2024 Mar 14;26:e50882. doi: 10.2196/50882. PMID: 38483451.
- Large Language Models and Empathy: Systematic Review. J Med Internet Res. 2024 Dec 11;26:e52597. doi: 10.2196/52597. PMID: 39661968.
- Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial. JMIR Form Res. 2025 May 20;9:e63857. doi: 10.2196/63857. PMID: 40393042.
- Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis. J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807. PMID: 39052324.