Evaluating LLM-based generative AI tools in emergency triage: A comparative study of ChatGPT Plus, Copilot Pro, and triage nurses

B Arslan¹, C Nuhoglu², M O Satici², E Altinbilek²

Affiliations

¹ Department of Emergency Medicine, Sisli Hamidiye Etfal Training and Research Hospital, Istanbul, Turkey. Electronic address: dr.banuarslan@gmail.com.
² Department of Emergency Medicine, Sisli Hamidiye Etfal Training and Research Hospital, Istanbul, Turkey.

PMID: 39731895
DOI: 10.1016/j.ajem.2024.12.024

Observational Study

B Arslan et al. Am J Emerg Med. 2025 Mar:89:174-181. doi: 10.1016/j.ajem.2024.12.024. Epub 2024 Dec 19.

Abstract

Background: The number of emergency department (ED) visits has been steadily increasing worldwide. Artificial intelligence (AI) technologies, including generative AI tools based on large language models (LLMs), have shown promise in improving triage accuracy. This study evaluates the triage performance of ChatGPT and Copilot at a high-volume urban hospital, hypothesizing that these tools can match trained physicians' accuracy and reduce human bias amid ED crowding.

Methods: This single-center, prospective observational study was conducted in an urban ED over one week. Adult patients were enrolled during randomly selected 24-h intervals. Minors, trauma cases, and patients with incomplete data were excluded. Triage nurses assessed patients while an emergency medicine (EM) physician documented clinical vignettes and assigned Emergency Severity Index (ESI) levels. These vignettes were then presented to ChatGPT and Copilot, and their ESI assignments were compared with the triage nurses' decisions.

Results: The overall triage accuracy was 65.2% for nurses, 66.5% for ChatGPT, and 61.8% for Copilot, with no significant difference (p = 0.000). Moderate agreement was observed between the EM physician and ChatGPT, triage nurses, and Copilot (Cohen's kappa = 0.537, 0.477, and 0.472, respectively). In recognizing high-acuity patients, ChatGPT and Copilot outperformed triage nurses (87.8% and 85.7% versus 32.7%, respectively). Compared to ChatGPT and Copilot, nurses significantly under-triaged patients (p < 0.05). The analysis of predictive performance for ChatGPT, Copilot, and triage nurses demonstrated varying discrimination abilities across ESI levels, all of which were statistically significant (p < 0.05). ChatGPT and Copilot exhibited consistent accuracy across age, gender, and admission time, whereas triage nurses were more likely to mistriage patients under 45 years old.
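
For readers unfamiliar with the agreement metrics reported above, the following is a minimal sketch of how accuracy, Cohen's kappa, and an under-triage rate could be computed from paired ESI levels. It is not the authors' analysis code: the function names and the ten-patient toy data are hypothetical, and it assumes unweighted kappa, since the abstract does not state whether a weighted variant was used.

```python
from collections import Counter

def accuracy(reference, rated):
    """Share of patients whose assigned ESI level equals the reference level."""
    return sum(r == p for r, p in zip(reference, rated)) / len(reference)

def cohen_kappa(reference, rated):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(reference)
    p_observed = accuracy(reference, rated)
    ref_counts = Counter(reference)
    rated_counts = Counter(rated)
    # Chance agreement: product of the two raters' marginal proportions per level.
    p_expected = sum(ref_counts[k] * rated_counts[k] for k in ref_counts) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

def under_triage_rate(reference, rated):
    """ESI 1 is most urgent, so a higher number than the reference is under-triage."""
    return sum(p > r for r, p in zip(reference, rated)) / len(reference)

# Hypothetical toy data: reference (physician) ESI levels vs. one rater's levels.
physician = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
rater = [2, 2, 3, 3, 3, 4, 4, 4, 5, 4]

print(f"accuracy     = {accuracy(physician, rater):.3f}")
print(f"kappa        = {cohen_kappa(physician, rater):.3f}")
print(f"under-triage = {under_triage_rate(physician, rater):.3f}")
```

Kappa values between 0.41 and 0.60 are conventionally read as moderate agreement, which is the band all three reported values (0.537, 0.477, and 0.472) fall into.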

Conclusion: ChatGPT and Copilot outperform traditional nurse triage in identifying high-acuity patients, but real-time ED capacity data is crucial to prevent overcrowding and ensure high-quality emergency care.

Keywords: ChatGPT; Copilot; Emergency medicine; Emergency severity index; Generative artificial intelligence; Large language models; Triage.

Conflict of interest statement

Declaration of competing interest: The authors declare that there is no conflict of interest.
