Hepatol Commun. 2025 Feb 19;9(3):e0638.
doi: 10.1097/HC9.0000000000000638. eCollection 2025 Mar 1.

Automated identification of incidental hepatic steatosis on Emergency Department imaging using large language models

Tyrus Vong et al. Hepatol Commun.

Abstract

Background: Hepatic steatosis is a precursor to more severe liver disease, increasing morbidity and mortality risks. In the Emergency Department, routine abdominal imaging often reveals incidental hepatic steatosis that goes undiagnosed due to the acute nature of encounters. Imaging reports in the electronic health record contain valuable information not easily accessible as discrete data elements. We hypothesized that large language models could reliably detect hepatic steatosis from reports without extensive natural language processing training.

Methods: We identified 200 adults who had CT abdominal imaging in the Emergency Department between August 1, 2016, and December 31, 2023. Using text from imaging reports and structured prompts, 3 Azure OpenAI models (ChatGPT 3.5, 4, 4o) identified patients with hepatic steatosis. We evaluated model performance regarding accuracy, inter-rater reliability, sensitivity, and specificity compared to physician reviews.
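The comparison against physician review described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `evaluate`, the `Metrics` container, and the toy labels are all hypothetical, and it assumes each report carries one binary label from the model and one from the physician.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float
    specificity: float


def evaluate(model_labels: Sequence[bool], physician_labels: Sequence[bool]) -> Metrics:
    """Score model labels against physician labels (True = steatosis present)."""
    if len(model_labels) != len(physician_labels):
        raise ValueError("label lists must have the same length")
    if len(physician_labels) == 0:
        raise ValueError("at least one labelled report is required")

    tp = fp = tn = fn = 0
    for predicted, reference in zip(model_labels, physician_labels):
        if predicted and reference:
            tp += 1  # model and physician both flag steatosis
        elif predicted and not reference:
            fp += 1  # model flags steatosis, physician does not
        elif not predicted and reference:
            fn += 1  # model misses steatosis the physician found
        else:
            tn += 1  # both agree steatosis is absent

    total = tp + fp + tn + fn
    return Metrics(
        accuracy=(tp + tn) / total,
        # Sensitivity: share of physician-positive reports the model caught.
        sensitivity=tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of physician-negative reports the model cleared.
        specificity=tn / (tn + fp) if (tn + fp) else float("nan"),
    )


# Toy labels, invented for illustration only (not study data).
toy_model = [True, True, False, False, True]
toy_physician = [True, False, False, False, True]
print(evaluate(toy_model, toy_physician))
```

Treating the physician review as the reference standard means a false positive here is a disagreement with the physician, not necessarily a wrong reading of the image.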

Results: The accuracy for the models was 96.2% for v3.5, 98.3% for v4, and 98.8% for v4o. Inter-rater reliability ranged from 0.99 to 1.00 across 10 iterations. Mean model confidence scores were 2.9 (SD 0.8) for v3.5, 3.9 (SD 0.3) for v4, and 4.0 (SD 0.07) for v4o. Incorrect evaluations were 76 (3.8%) for v3.5, 34 (1.7%) for v4, and 25 (1.3%) for v4o. All models showed sensitivity and specificity above 0.9.
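The reported percentages are consistent with a denominator of 2000 evaluations per model, that is, 200 reports scored in each of 10 iterations. The abstract does not state this denominator explicitly, so the short check below treats it as an assumption; the function name `error_and_accuracy` is illustrative.

```python
# Assumption: each model scored all 200 reports in each of 10 iterations.
REPORTS = 200
ITERATIONS = 10
TOTAL_EVALUATIONS = REPORTS * ITERATIONS

# Incorrect evaluation counts as reported in the abstract.
INCORRECT = {"v3.5": 76, "v4": 34, "v4o": 25}


def error_and_accuracy(n_incorrect: int, total: int) -> tuple[float, float]:
    """Return (error rate, accuracy) as percentages of all evaluations."""
    error_pct = 100 * n_incorrect / total
    return error_pct, 100 - error_pct


for version, n_incorrect in INCORRECT.items():
    error_pct, accuracy_pct = error_and_accuracy(n_incorrect, TOTAL_EVALUATIONS)
    print(f"{version}: error {error_pct:.2f}%, accuracy {accuracy_pct:.2f}%")
```

Under this assumption the unrounded v4o figures are 1.25% incorrect and 98.75% accurate, which match the reported 1.3% and 98.8% after rounding to one decimal place.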

Conclusions: Large language models can assist in identifying incidental conditions from imaging reports, findings that might otherwise be missed opportunities for early disease intervention. Large language models democratize natural language processing by allowing user-friendly, expansive analysis of electronic medical records without requiring the development of complex natural language processing models.

Conflict of interest statement

Mark Dredze consults for Bloomberg LP and Good Analytics. Jeremiah S. Hinson is employed by Beckman Coulter. The remaining authors have no conflicts to report.

Figures

FIGURE 1
Prompt ChatGPT to evaluate whether a radiology report was suggestive of hepatic steatosis.
FIGURE 2
Evaluation of the accuracy of GPT models.
FIGURE 3
Inter-rater reliability per ChatGPT version. The inter-rater reliability between 10 different iterations was assessed for each ChatGPT version. (A) Heatmap of the Cohen's kappa values for inter-rater reliability between each ChatGPT v3.5 iteration. (B) Heatmap of the Cohen's kappa values for inter-rater reliability between each ChatGPT v4 iteration. (C) Heatmap of the Cohen's kappa values for inter-rater reliability between each ChatGPT v4o iteration. The color scale ranges from blue to red, where blue indicates lower agreement and red indicates higher agreement. A Cohen's kappa value closer to 1 indicates high agreement, while a value close to 0 indicates agreement no better than chance.
FIGURE 4
Overall confidence of the rater in its evaluation of reports for hepatic steatosis.
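
The pairwise Cohen's kappa behind the Figure 3 heatmaps can be sketched as follows. This is a plain-Python illustration under the assumption that each iteration yields one categorical label per report; the names `cohen_kappa` and `pairwise_kappa` and the toy runs are hypothetical, not taken from the paper.

```python
from itertools import combinations
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    categories = set(a) | set(b)
    expected = sum((list(a).count(c) / n) * (list(b).count(c) / n) for c in categories)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


def pairwise_kappa(runs: Sequence[Sequence[Hashable]]) -> dict[tuple[int, int], float]:
    """Kappa for every pair of iterations, keyed by (i, j) with i < j."""
    return {
        (i, j): cohen_kappa(runs[i], runs[j])
        for i, j in combinations(range(len(runs)), 2)
    }


# Toy iterations, invented for illustration only (1 = steatosis, 0 = none).
toy_runs = [
    [1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
]
for pair, kappa in pairwise_kappa(toy_runs).items():
    print(pair, kappa)
```

Because kappa subtracts the agreement expected by chance, it is a stricter measure than raw percent agreement when one label dominates the reports.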
