A Large Language Model to Detect Negated Expressions in Radiology Reports
- PMID: 39322813
- PMCID: PMC12092861
- DOI: 10.1007/s10278-024-01274-9
Abstract
Natural language processing (NLP) is crucial for extracting information accurately from unstructured text to provide insights for clinical decision-making, quality improvement, and medical research. This study compared the performance of a rule-based NLP system and a medical-domain transformer-based model in detecting negated concepts in radiology reports. Using a corpus of 984 de-identified radiology reports from a large U.S.-based academic health system (1000 consecutive reports, excluding 16 duplicates), the investigators compared the rule-based medspaCy system and the Clinical Assertion and Negation Classification Bidirectional Encoder Representations from Transformers (CAN-BERT) system in detecting negated expressions of terms from RadLex, the Unified Medical Language System Metathesaurus, and the Radiology Gamuts Ontology. Power analysis determined a sample size of 382 terms to achieve α = 0.05 and a power (1 − β) of 0.8 for McNemar's test; based on an estimate of 15% negated terms, 2800 randomly selected terms were annotated manually as negated or not negated. Precision, recall, and F1 of the two models were compared using McNemar's test. Of the 2800 terms, 387 (13.8%) were negated. For negation detection, medspaCy attained a recall of 0.795, precision of 0.356, and F1 of 0.492. CAN-BERT achieved a recall of 0.785, precision of 0.768, and F1 of 0.777. Although recall was not significantly different, CAN-BERT had significantly better precision (χ² = 304.64; p < 0.001). The transformer-based CAN-BERT model detected negated terms in radiology reports with high precision and recall; its precision significantly exceeded that of the rule-based medspaCy system. Use of this system will improve data extraction from textual reports to support information retrieval, AI model training, and discovery of causal relationships.
Keywords: Large language models; Named entity recognition; Natural language processing; Negated expression (negex) detection; Radiology reports.
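The evaluation described in the abstract rests on two standard calculations: precision, recall, and F1 for each system, and McNemar's test on the paired predictions. The following is a minimal Python sketch of both, not the authors' code. The function names are hypothetical, the use of a continuity correction is an assumption (the abstract does not say which variant of McNemar's test was used), and the discordant-pair counts in the usage example are invented for illustration only.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: negated terms correctly flagged as negated
    fp: non-negated terms wrongly flagged as negated
    fn: negated terms the system missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_score(precision, recall)


def mcnemar(b, c, continuity=True):
    """McNemar's chi-square test on paired binary outcomes.

    b: items system A got right and system B got wrong
    c: items system A got wrong and system B got right
    Returns (chi2, p) with 1 degree of freedom.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    diff = abs(b - c)
    if continuity:
        # Edwards' continuity correction (an assumption, see lead-in).
        diff = max(diff - 1, 0)
    chi2 = diff ** 2 / (b + c)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


if __name__ == "__main__":
    # Reported medspaCy precision and recall from the abstract.
    print(round(f1_score(0.356, 0.795), 3))  # 0.492

    # Hypothetical discordant counts, for illustration only.
    chi2, p = mcnemar(b=30, c=10, continuity=False)
    print(chi2, p)
```

McNemar's test uses only the discordant pairs (terms on which exactly one system was correct), which is why it suits a comparison of two systems run on the same annotated terms.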
© 2024. The Author(s).
Conflict of interest statement
Declarations. Ethics Approval: Study protocol approved by the University of Pennsylvania IRB. Informed consent from patients was waived. Competing Interests: The authors declare no competing interests.