JAMA Netw Open. 2023 Oct 2;6(10):e2336100. doi: 10.1001/jamanetworkopen.2023.36100.

Generative Artificial Intelligence for Chest Radiograph Interpretation in the Emergency Department


Jonathan Huang et al. JAMA Netw Open.

Abstract

Importance: Multimodal generative artificial intelligence (AI) methodologies have the potential to optimize emergency department care by producing draft radiology reports from input images.

Objective: To evaluate the accuracy and quality of AI-generated chest radiograph interpretations in the emergency department setting.

Design, setting, and participants: This was a retrospective diagnostic study of 500 randomly sampled emergency department encounters at a tertiary care institution including chest radiographs interpreted by both a teleradiology service and on-site attending radiologist from January 2022 to January 2023. An AI interpretation was generated for each radiograph. The 3 radiograph interpretations were each rated in duplicate by 6 emergency department physicians using a 5-point Likert scale.

Main outcomes and measures: The primary outcome was any difference in Likert scores between radiologist, AI, and teleradiology reports, using a cumulative link mixed model. Secondary analyses compared the probability of each report type containing no clinically significant discrepancy with further stratification by finding presence, using a logistic mixed-effects model. Physician comments on discrepancies were recorded.
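
As a rough illustration of the secondary analysis, the sketch below fits a mixed-effects logistic model of the probability that a rating indicates no clinically significant discrepancy (Likert score ≥3) as a function of report type, with variance components for study and rater. This is a minimal sketch, not the authors' code: the column names (rating, report_type, study_id, rater_id) are hypothetical, and the variational Bayes mixed GLM in statsmodels is used as a stand-in for whatever mixed-model software the study actually employed.

```python
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# Hypothetical long-format ratings table: one row per (study, rater, report type).
ratings = pd.read_csv("ratings.csv")  # columns: study_id, rater_id, report_type, rating

# Outcome: no clinically significant discrepancy, defined as Likert score >= 3.
ratings["no_discrepancy"] = (ratings["rating"] >= 3).astype(int)

# Mixed-effects logistic model: fixed effect of report type (radiologist, AI,
# teleradiology), with random effects for study and rater.
model = BinomialBayesMixedGLM.from_formula(
    "no_discrepancy ~ C(report_type)",
    vc_formulas={"study": "0 + C(study_id)", "rater": "0 + C(rater_id)"},
    data=ratings,
)
result = model.fit_vb()  # variational Bayes fit
print(result.summary())
```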

Results: A total of 500 ED studies were included from 500 unique patients with a mean (SD) age of 53.3 (21.6) years; 282 patients (56.4%) were female. There was a significant association of report type with ratings, with post hoc tests revealing significantly greater scores for AI (mean [SE] score, 3.22 [0.34]; P < .001) and radiologist (mean [SE] score, 3.34 [0.34]; P < .001) reports compared with teleradiology (mean [SE] score, 2.74 [0.34]) reports. AI and radiologist reports were not significantly different. On secondary analysis, there was no difference in the probability of no clinically significant discrepancy between the 3 report types. Further stratification of reports by presence of cardiomegaly, pulmonary edema, pleural effusion, infiltrate, pneumothorax, and support devices also yielded no difference in the probability of containing no clinically significant discrepancy between the report types.

Conclusions and relevance: In a representative sample of emergency department chest radiographs, results suggest that the generative AI model produced reports of similar clinical accuracy and textual quality to radiologist reports while providing higher textual quality than teleradiologist reports. Implementation of the model in the clinical workflow could enable timely alerts to life-threatening pathology while aiding imaging interpretation and documentation.


Conflict of interest statement

Conflict of Interest Disclosures: Drs Neill and Etemadi and Mr Heller reported having equity ownership in Cardiosense Inc during the conduct of the study. Dr Etemadi reported having a patent for this work pending during the conduct of the study that was applied for and licensed to Northwestern Medicine. No other disclosures were reported.

Figures

Figure 1. Artificial Intelligence (AI) Model Architecture
The AI model is an encoder-decoder model trained to generate a text report given a chest radiograph (CXR) and most recent comparison (anterior-posterior or posterior-anterior view only). The vision encoder weights were initialized from Vision Transformer (ViT) base and the text decoder weights were initialized from Robustly Optimized BERT Pretraining Approach (RoBERTa) base before training for 30 epochs on a data set of 900 000 CXRs. cp indicates chest pain; sob, shortness of breath.
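
A minimal sketch of how such a vision-encoder/text-decoder pairing could be assembled is shown below. It assumes the Hugging Face transformers library and its public ViT-base and RoBERTa-base checkpoints; the paper does not state which implementation or checkpoints were used, so these names are placeholders rather than the authors' setup, and the sketch handles a single input image rather than the study's radiograph-plus-comparison input.

```python
from transformers import (
    VisionEncoderDecoderModel,
    ViTImageProcessor,
    RobertaTokenizerFast,
)

# Pair a ViT-base vision encoder with a RoBERTa-base text decoder
# (cross-attention layers are added to the decoder automatically).
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k",  # assumed ViT-base checkpoint
    "roberta-base",                       # assumed RoBERTa-base checkpoint
)

image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

# Token IDs the generation loop needs.
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.eos_token_id

# Training would then minimize cross-entropy on (radiograph, report) pairs, e.g.:
# outputs = model(pixel_values=pixel_values, labels=report_token_ids)
# loss = outputs.loss
```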
Figure 2. Artificial Intelligence (AI) Evaluation Study Design
A total of 500 emergency department (ED) encounters with associated overnight chest radiographs interpreted by a teleradiology service, then overread by an in-house radiologist, were randomly selected. The teleradiology reports as well as the finalized in-house radiologist reports were retrospectively identified, and an AI report was generated as well. Six ED physicians served as raters; each report was rated for accuracy and quality by 2 physicians blinded to the report type using a 5-point Likert scale such that each physician rated each chest radiograph once. The primary and secondary analyses were also performed as shown.
Figure 3. Overall Rating Distribution
The distribution of Likert scale ratings for radiologist, artificial intelligence (AI), and teleradiology reports is shown. Each report was rated in duplicate, resulting in 1000 ratings of 500 radiographs for each of the 3 report types.
Figure 4. Probability of Non–Clinically Discrepant Report
The probability of producing a non–clinically discrepant report (ie, Likert score ≥3) for studies with and without an abnormality across each report type. Error bars designate the upper and lower confidence limits of the probability estimate. AI indicates artificial intelligence.
Figure 5. Probability of Non–Clinically Discrepant Report Across Pathologies
The probability of producing a non–clinically discrepant report (ie, Likert score ≥3) for each read type across subsets of studies with a given abnormality. Error bars designate the upper and lower confidence limits of the probability estimate. The number below each label indicates the study count for that subset. AI indicates artificial intelligence.

