JAMA Netw Open. 2024 May 1;7(5):e248895. doi: 10.1001/jamanetworkopen.2024.8895.

Use of a Large Language Model to Assess Clinical Acuity of Adults in the Emergency Department

Christopher Y K Williams et al. JAMA Netw Open. 2024.

Abstract

Importance: The introduction of large language models (LLMs), such as Generative Pre-trained Transformer 4 (GPT-4; OpenAI), has generated significant interest in health care, yet studies evaluating their performance in a clinical setting are lacking. Determination of clinical acuity, a measure of a patient's illness severity and level of required medical attention, is one of the foundational elements of medical reasoning in emergency medicine.

Objective: To determine whether an LLM can accurately assess clinical acuity in the emergency department (ED).

Design, setting, and participants: This cross-sectional study identified all adult ED visits from January 1, 2012, to January 17, 2023, at the University of California, San Francisco, with a documented Emergency Severity Index (ESI) acuity level (immediate, emergent, urgent, less urgent, or nonurgent) and with a corresponding ED physician note. A sample of 10 000 pairs of ED visits with nonequivalent ESI scores, balanced for each of the 10 possible pairs of 5 ESI scores, was selected at random.
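To make the sampling design concrete, below is a minimal sketch of the balanced pair-sampling step, assuming visits are available as (visit_id, esi_level) records. The record schema, the `sample_balanced_pairs` helper, and the sampling-with-replacement choice are illustrative assumptions; the abstract specifies only the balanced 10 000-pair total across the 10 combinations of nonequivalent ESI levels.

```python
import random
from itertools import combinations

ESI_LEVELS = ["immediate", "emergent", "urgent", "less urgent", "nonurgent"]

def sample_balanced_pairs(visits, pairs_per_combo=1000, seed=0):
    """Draw an equal number of visit pairs for each of the 10
    unordered combinations of distinct ESI acuity levels.

    `visits` is assumed to be an iterable of (visit_id, esi_level)
    tuples; the study's actual data schema is not reported in the
    abstract. Sampling with replacement is a simplifying assumption.
    """
    rng = random.Random(seed)
    by_level = {level: [] for level in ESI_LEVELS}
    for visit_id, level in visits:
        by_level[level].append(visit_id)

    sampled = []
    for level_a, level_b in combinations(ESI_LEVELS, 2):  # 10 combinations
        for _ in range(pairs_per_combo):
            sampled.append((rng.choice(by_level[level_a]),
                            rng.choice(by_level[level_b])))
    rng.shuffle(sampled)
    return sampled  # 10 combinations x 1000 = 10 000 pairs
```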

Exposure: Classification by the LLM of patients' acuity levels in the ED, based on the ESI, across 10 000 patient pairs. Using deidentified clinical text, the LLM was queried to identify the patient with the higher-acuity presentation within each pair based on the patients' clinical histories. An earlier LLM was queried on the same pairs to allow comparison with this model.
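A sketch of the pairwise query described above follows. The prompt wording, model name, and response handling are hypothetical: the abstract does not report the study's actual prompt or model parameters. The example uses the OpenAI Python client's chat-completions call as an assumed interface.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical prompt; the study's actual wording is not reported here.
PROMPT_TEMPLATE = (
    "You are an emergency medicine triage assistant. Below are the "
    "deidentified presenting histories of two emergency department "
    "patients.\n\nPatient A:\n{history_a}\n\nPatient B:\n{history_b}\n\n"
    "Which patient has the higher-acuity presentation? "
    "Answer with exactly 'A' or 'B'."
)

def compare_acuity(history_a: str, history_b: str, model: str = "gpt-4") -> str:
    """Ask the LLM which of two presenting histories is higher acuity."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output for evaluation
        messages=[{"role": "user",
                   "content": PROMPT_TEMPLATE.format(history_a=history_a,
                                                     history_b=history_b)}],
    )
    return response.choices[0].message.content.strip()
```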

Main outcomes and measures: Accuracy score was calculated to evaluate the performance of both LLMs across the 10 000-pair sample. A 500-pair subsample was manually classified by a physician reviewer to compare performance between the LLMs and human classification.

Results: From a total of 251 401 adult ED visits, a balanced sample of 10 000 patient pairs was created wherein each pair comprised patients with disparate ESI acuity scores. Across this sample, the LLM correctly inferred the patient with higher acuity for 8940 of 10 000 pairs (accuracy, 0.89 [95% CI, 0.89-0.90]). Performance of the comparator LLM (accuracy, 0.84 [95% CI, 0.83-0.84]) was below that of its successor. Among the 500-pair subsample that was also manually classified, LLM performance (accuracy, 0.88 [95% CI, 0.86-0.91]) was comparable with that of the physician reviewer (accuracy, 0.86 [95% CI, 0.83-0.89]).
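As a sanity check on the reported intervals: accuracy is simply the fraction of pairs classified correctly (8940/10 000 = 0.894), and the reported 0.89-0.90 interval is consistent with a standard binomial confidence interval. A minimal sketch using the Wilson score interval, which is an assumption, since the abstract does not state which interval method was used:

```python
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96):
    """Wilson score 95% CI for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

acc = 8940 / 10_000                  # 0.894
lo, hi = wilson_ci(8940, 10_000)     # approximately (0.888, 0.900)
print(f"accuracy = {acc:.3f}, 95% CI = ({lo:.2f}, {hi:.2f})")  # 0.89-0.90
```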

Conclusions and relevance: In this cross-sectional study of 10 000 pairs of ED visits, the LLM accurately identified the patient with higher acuity when given pairs of presenting histories extracted from patients' first ED documentation. These findings suggest that the integration of an LLM into ED workflows could enhance triage processes while maintaining triage quality and warrants further investigation.


Conflict of interest statement

Conflict of Interest Disclosures: Ms Miao reported receiving personal fees from SandboxAQ outside the submitted work. Dr Kornblith reported being a cofounder of Capture Diagnostics LLC outside the submitted work. Dr Butte reported being a cofounder of and consulting for Personalis Inc and NuMedii Inc; consulting for Mango Tree Corp, Samsung Electronics Co Ltd, 10x Genomics Inc, Helix Inc, Pathway Genomics, and Verinata Health Inc (Illumina Inc); serving on paid advisory panels or boards for Geisinger Health, Regenstrief Institute, Gerson Lehman Group, AlphaSights, Covance, Novartis AG, Genentech Inc, Merck & Co Inc, and Roche; being a shareholder of Personalis Inc and NuMedii Inc; being a minor shareholder of Apple Inc, Meta (Facebook), Alphabet Inc (Google), Microsoft Corp, Amazon, Snap Inc, 10x Genomics Inc, Illumina Inc, Regeneron Pharmaceuticals Inc, Sanofi SA, Pfizer Inc, Royalty Pharma PLC, Moderna Inc, Sutro Biopharma Inc, Doximity, BioNTech SA, Invitae Corp, Pacific Biosciences of California Inc, Editas Medicine Inc, Nuna Inc, Assay Depot, Vet24seven Inc, Sophia Genetics, Allbirds Inc, Coursera Plus, DigitalOcean Holdings Inc, Rivian Automotive Inc, Snowflake Inc, Netflix Inc, Starbucks Corp, Advanced Micro Devices Inc, Tesla Inc, Personalis Inc, and Eli Lilly and Co; receiving honoraria and travel reimbursement for invited talks from Johnson & Johnson, Roche, Genentech Inc, Pfizer Inc, Merck & Co Inc, Eli Lilly and Co Inc, Takeda Pharmaceutical Co, Varian Medical Systems, Mars Therapeutics Private Limited, Siemens AG, Optum Inc, Abbott Laboratories, Celgene Corp, AstraZeneca, AbbVie Inc, Westat, Boston Children’s Hospital, The Johns Hopkins University, Endocrine Society, Alliance for Academic Internal Medicine, Children’s Hospital of Philadelphia, University of Pittsburgh Medical Center, Cleveland Clinic, University of Utah, Society of Toxicology, Mayo Clinic, Oracle Cerner, and the Transplantation Society; receiving royalty payments through Stanford University for several patents and other disclosures licensed to NuMedii Inc and Personalis Inc; and receiving research funding from the National Institutes of Health, Peraton Inc, Genentech Inc, Johnson & Johnson, the US Food and Drug Administration, the Robert Wood Johnson Foundation, the Leon Lowenstein Foundation, the Intervalien Foundation, Priscilla Chan and Mark Zuckerberg, the Barbara and Gerson Bakar Foundation, the March of Dimes, the Juvenile Diabetes Research Foundation, the California Governor’s Office of Planning and Research, the California Institute for Regenerative Medicine, L’Oréal SA, and Progenity. No other disclosures were reported.

Figures

Figure 1. Flowchart of Included Emergency Department (ED) Visits
ESI indicates Emergency Severity Index (immediate, emergent, urgent, less urgent, and nonurgent). A balanced sample of 10 000 patient pairs was created from the full sample wherein each pair comprised patients with the following disparate ESI acuity scores: immediate/emergent (n = 1000); immediate/urgent (n = 1000); immediate/less urgent (n = 1000); immediate/nonurgent (n = 1000); emergent/urgent (n = 1000); emergent/less urgent (n = 1000); emergent/nonurgent (n = 1000); urgent/less urgent (n = 1000); urgent/nonurgent (n = 1000); less urgent/nonurgent (n = 1000).
Figure 2. Comparison of Large Language Model (LLM) and Physician Performance
Evaluated for each type of Emergency Severity Index (ESI) acuity level pairing in the 500-pair subsample (immediate, emergent, urgent, less urgent, and nonurgent). Overall LLM accuracy was 0.88 (95% CI, 0.86-0.91); overall physician accuracy, 0.86 (95% CI, 0.83-0.89). Error bars indicate 95% CIs.
Figure 3. Comparison of Comparator Large Language Model (LLM) and Physician Performance
Evaluated for each type of Emergency Severity Index (ESI) acuity level pairing in the 500-pair subsample (immediate, emergent, urgent, less urgent, and nonurgent). Overall comparator LLM accuracy was 0.84 (95% CI, 0.81-0.88); overall physician accuracy, 0.86 (95% CI, 0.83-0.89). Error bars indicate 95% CIs.
