JMIR Med Inform. 2025 Apr 9:13:e67706.
doi: 10.2196/67706.

Extracting Pulmonary Embolism Diagnoses From Radiology Impressions Using GPT-4o: Large Language Model Evaluation Study

Mohammed Mahyoub et al. JMIR Med Inform. 2025.

Abstract

Background: Pulmonary embolism (PE) is a critical condition requiring rapid diagnosis to reduce mortality. Extracting PE diagnoses from radiology reports manually is time-consuming, highlighting the need for automated solutions. Advances in natural language processing, especially transformer models like GPT-4o, offer promising tools to improve diagnostic accuracy and workflow efficiency in clinical settings.

Objective: This study aimed to develop an automatic extraction system using GPT-4o to extract PE diagnoses from radiology report impressions, enhancing clinical decision-making and workflow efficiency.

Methods: In total, 2 approaches were developed and evaluated: a fine-tuned Clinical Longformer as a baseline model and a GPT-4o-based extractor. Clinical Longformer, an encoder-only model, was chosen for its robustness in text classification tasks, particularly on smaller scales. GPT-4o, a decoder-only instruction-following LLM, was selected for its advanced language understanding capabilities. The study aimed to evaluate GPT-4o's ability to perform text classification compared to the baseline Clinical Longformer. The Clinical Longformer was trained on a dataset of 1000 radiology report impressions and validated on a separate set of 200 samples, while the GPT-4o extractor was validated using the same 200-sample set. Postdeployment performance was further assessed on an additional 200 operational records to evaluate model efficacy in a real-world setting.
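
A minimal sketch of how a prompt-based extractor of this kind could be wired up; it is not the authors' implementation. The prompt wording, the POSITIVE/NEGATIVE labels, and the names `build_prompt`, `parse_label`, `classify`, and `fake_model` are all hypothetical (the study's actual template is shown in its Figure 2 and is not reproduced here). The model call is passed in as a plain function so the sketch runs without any API access.

```python
from typing import Callable

# Hypothetical prompt wording; the study's real template is not given in the abstract.
PROMPT_TEMPLATE = (
    "You are reviewing the impression section of a radiology report.\n"
    "Answer with exactly one word, POSITIVE or NEGATIVE, indicating whether "
    "the impression reports a pulmonary embolism.\n\n"
    "Impression:\n{impression}\n"
)


def build_prompt(impression: str) -> str:
    """Fill the template with one radiology impression."""
    return PROMPT_TEMPLATE.format(impression=impression.strip())


def parse_label(reply: str) -> int:
    """Map the model's free-text reply to 1 (PE positive) or 0 (PE negative)."""
    text = reply.strip().upper()
    if text.startswith("POSITIVE"):
        return 1
    if text.startswith("NEGATIVE"):
        return 0
    # Refuse to guess: unexpected replies should go to manual review.
    raise ValueError(f"Unexpected model reply: {reply!r}")


def classify(impression: str, ask_model: Callable[[str], str]) -> int:
    """Classify one impression using any callable that maps prompt -> reply."""
    return parse_label(ask_model(build_prompt(impression)))


def fake_model(prompt: str) -> str:
    """Keyword stand-in for an LLM call, used only to make the sketch runnable."""
    impression = prompt.split("Impression:\n", 1)[1].lower()
    if "no evidence of pulmonary embolism" in impression:
        return "Negative."
    return "positive" if "pulmonary embol" in impression else "negative"
```

In practice `ask_model` would wrap a call to the deployed LLM; keeping it injectable makes it easy to run the same 200-sample validation set through either the extractor or a baseline classifier.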

Results: GPT-4o outperformed the Clinical Longformer in 2 of the metrics, achieving a sensitivity of 1.0 (95% CI 1.0-1.0; Wilcoxon test, P<.001) and an F1-score of 0.975 (95% CI 0.9495-0.9947; Wilcoxon test, P<.001) across the validation dataset. Postdeployment evaluations also showed strong performance of the deployed GPT-4o model with a sensitivity of 1.0 (95% CI 1.0-1.0), a specificity of 0.94 (95% CI 0.8913-0.9804), and an F1-score of 0.97 (95% CI 0.9479-0.9908). This high level of accuracy supports a reduction in manual review, streamlining clinical workflows and improving diagnostic precision.
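
For readers who want to see how metrics of this kind are computed, the following sketch derives sensitivity, specificity, and F1-score from binary labels and attaches a percentile bootstrap interval. The abstract does not state how the 95% CIs were obtained, so the bootstrap is an assumption here, and the toy labels below are hypothetical counts (100 positives, 100 negatives, 6 false positives), not the study's data. The function names are illustrative.

```python
import random


def metrics(y_true, y_pred):
    """Sensitivity, specificity, and F1 from paired binary labels (1 = PE positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else nan,
    }


def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for one metric (assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_resamples):
        # Resample record indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        value = metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])[metric]
        if value == value:  # skip NaN from degenerate resamples
            values.append(value)
    values.sort()
    lower = values[int((alpha / 2) * len(values))]
    upper = values[min(len(values) - 1, int((1 - alpha / 2) * len(values)))]
    return lower, upper


# Hypothetical toy labels: every positive is caught, 6 of 100 negatives are flagged.
toy_true = [1] * 100 + [0] * 100
toy_pred = [1] * 100 + [1] * 6 + [0] * 94

toy_metrics = metrics(toy_true, toy_pred)
toy_spec_ci = bootstrap_ci(toy_true, toy_pred, "specificity")
toy_sens_ci = bootstrap_ci(toy_true, toy_pred, "sensitivity")
```

When a model misses no positive cases, every bootstrap resample also has a sensitivity of 1.0, which is why an interval can collapse to a single point as in the reported 1.0-1.0.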

Conclusions: The GPT-4o model provides an effective solution for the automatic extraction of PE diagnoses from radiology reports, offering a reliable tool that aids timely and accurate clinical decision-making. This approach has the potential to significantly improve patient outcomes by expediting diagnosis and treatment pathways for critical conditions like PE.

Keywords: Clinical Longformer; GPT-4o; LLMs; large language models; natural language processing; pulmonary embolism; radiology reports; text classification.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. Radiology impressions text classification. PE: pulmonary embolism.
Figure 2. Extraction of pulmonary embolism diagnosis prompt template.
Figure 3. Deployment pipeline.
Figure 4. Web App. Only positive cases are displayed. This Web App has been operationalized at Virtua Health, New Jersey.
Figure 5. Evaluation metrics comparison of Clinical Longformer (baseline model) and GPT-4o.
