Extracting Pulmonary Embolism Diagnoses From Radiology Impressions Using GPT-4o: Large Language Model Evaluation Study
- PMID: 40203306
- PMCID: PMC12018862
- DOI: 10.2196/67706
Extracting Pulmonary Embolism Diagnoses From Radiology Impressions Using GPT-4o: Large Language Model Evaluation Study
Abstract
Background: Pulmonary embolism (PE) is a critical condition requiring rapid diagnosis to reduce mortality. Extracting PE diagnoses from radiology reports manually is time-consuming, highlighting the need for automated solutions. Advances in natural language processing, especially transformer models like GPT-4o, offer promising tools to improve diagnostic accuracy and workflow efficiency in clinical settings.
Objective: This study aimed to develop an automatic extraction system using GPT-4o to extract PE diagnoses from radiology report impressions, enhancing clinical decision-making and workflow efficiency.
Methods: In total, 2 approaches were developed and evaluated: a fine-tuned Clinical Longformer as a baseline model and a GPT-4o-based extractor. Clinical Longformer, an encoder-only model, was chosen for its robustness in text classification tasks, particularly on smaller scales. GPT-4o, a decoder-only instruction-following LLM, was selected for its advanced language understanding capabilities. The study aimed to evaluate GPT-4o's ability to perform text classification compared to the baseline Clinical Longformer. The Clinical Longformer was trained on a dataset of 1000 radiology report impressions and validated on a separate set of 200 samples, while the GPT-4o extractor was validated using the same 200-sample set. Postdeployment performance was further assessed on an additional 200 operational records to evaluate model efficacy in a real-world setting.
Results: GPT-4o outperformed the Clinical Longformer in 2 of the metrics, achieving a sensitivity of 1.0 (95% CI 1.0-1.0; Wilcoxon test, P<.001) and an F1-score of 0.975 (95% CI 0.9495-0.9947; Wilcoxon test, P<.001) across the validation dataset. Postdeployment evaluations also showed strong performance of the deployed GPT-4o model with a sensitivity of 1.0 (95% CI 1.0-1.0), a specificity of 0.94 (95% CI 0.8913-0.9804), and an F1-score of 0.97 (95% CI 0.9479-0.9908). This high level of accuracy supports a reduction in manual review, streamlining clinical workflows and improving diagnostic precision.
Conclusions: The GPT-4o model provides an effective solution for the automatic extraction of PE diagnoses from radiology reports, offering a reliable tool that aids timely and accurate clinical decision-making. This approach has the potential to significantly improve patient outcomes by expediting diagnosis and treatment pathways for critical conditions like PE.
Keywords: Clinical Longformer; GPT-4o; LLMs; large language models; natural language processing; pulmonary embolism; radiology reports; text classification.
©Mohammed Mahyoub, Kacie Dougherty, Ajit Shukla. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 09.04.2025.
Conflict of interest statement
Conflicts of Interest: None declared.
Figures





Similar articles
-
Privacy-ensuring Open-weights Large Language Models Are Competitive with Closed-weights GPT-4o in Extracting Chest Radiography Findings from Free-Text Reports.Radiology. 2025 Jan;314(1):e240895. doi: 10.1148/radiol.240895. Radiology. 2025. PMID: 39807977
-
Patient Triage and Guidance in Emergency Departments Using Large Language Models: Multimetric Study.J Med Internet Res. 2025 May 15;27:e71613. doi: 10.2196/71613. J Med Internet Res. 2025. PMID: 40374171 Free PMC article.
-
Automated Radiology Report Labeling in Chest X-Ray Pathologies: Development and Evaluation of a Large Language Model Framework.JMIR Med Inform. 2025 Mar 28;13:e68618. doi: 10.2196/68618. JMIR Med Inform. 2025. PMID: 40153539 Free PMC article.
-
Large Language Model Applications for Health Information Extraction in Oncology: Scoping Review.JMIR Cancer. 2025 Mar 28;11:e65984. doi: 10.2196/65984. JMIR Cancer. 2025. PMID: 40153782 Free PMC article.
-
[Integration of large language models into the clinic : Revolution in analysing and processing patient data to increase efficiency and quality in radiology].Radiologie (Heidelb). 2025 Apr;65(4):243-248. doi: 10.1007/s00117-025-01431-3. Epub 2025 Mar 12. Radiologie (Heidelb). 2025. PMID: 40072530 Review. German.
Cited by
-
Assessing the Accuracy of Diagnostic Capabilities of Large Language Models.Diagnostics (Basel). 2025 Jun 29;15(13):1657. doi: 10.3390/diagnostics15131657. Diagnostics (Basel). 2025. PMID: 40647657 Free PMC article.
-
Evaluating prompt and data perturbation sensitivity in large language models for radiology reports classification.JAMIA Open. 2025 Aug 12;8(4):ooaf073. doi: 10.1093/jamiaopen/ooaf073. eCollection 2025 Aug. JAMIA Open. 2025. PMID: 40799928 Free PMC article.
References
-
- Tanra AH, AT L, T E, DE R. Diagnostic value of platelet indices in patients with pulmonary embolism. Indonesian J. Clin. Pathol. Med. Lab. 2020;27(1):22–26. doi: 10.24293/ijcpml.v27i1.1625. - DOI
-
- Deng W, Gao W. Cathepsin Causal Association with Pulmonary Embolism: A Mendelian Randomization Analysis. 2024. [2024-08-14]. https://www.researchsquare.com/article/rs-4191858/latest .
-
- Lyhne MD, Kline JA, Nielsen-Kudsk JE, Andersen A. Pulmonary vasodilation in acute pulmonary embolism - a systematic review. Pulm Circ. 2020;10(1):2045894019899775. doi: 10.1177/2045894019899775. https://journals.sagepub.com/doi/abs/10.1177/2045894019899775?url_ver=Z3... 10.1177_2045894019899775 - DOI - DOI - PMC - PubMed
-
- Zhang SL, Zhang QF, Li G, Guo M, Qi X, Xing XH, Wang Z. Case Report: resuscitation of patient with tumor-induced acute pulmonary embolism by venoarterial extracorporeal membrane oxygenation. Front Cardiovasc Med. 2024;11:1322387. doi: 10.3389/fcvm.2024.1322387. https://europepmc.org/abstract/MED/38426120 - DOI - PMC - PubMed
-
- Grusova G, Lambert L, Zeman J, Lambertova A, Benes J. The additional value of esophageal wall evaluation and secondary findings in emergency patients undergoing CT pulmonary angiography. Iran J Radiol Brieflands. 2018;15(1):e63466. doi: 10.5812/iranjradiol.63466. https://brieflands.com/articles/iranjradiol-63466.html - DOI
MeSH terms
LinkOut - more resources
Full Text Sources
Medical
Research Materials