RADEX: a rule-based clinical and radiology data extraction tool demonstrated on thyroid ultrasound reports
- PMID: 39945809
- PMCID: PMC12226629
- DOI: 10.1007/s00330-025-11416-4
Abstract
Objectives: Radiology reports contain valuable information for research and audits, but relevant details are often buried within free-text fields. This makes them challenging and time-consuming to extract for secondary analyses, including training artificial intelligence (AI) models.
Materials and methods: This study presents a rule-based RAdiology Data EXtraction tool (RADEX) to enable biomedical researchers and healthcare professionals to automate information extraction from clinical documents. RADEX simplifies the translation of domain expertise into regular-expression models, enabling context-dependent searching without specialist expertise in Natural Language Processing. Its utility was demonstrated in the multi-label classification of fourteen clinical features in a large retrospective dataset (n = 16,246) of thyroid ultrasound reports from five hospitals in the United Kingdom (UK). A tuning subset (n = 200) was used to iteratively develop the search strategy, and a holdout test subset (n = 202) was used to evaluate the performance against reference-standard labels.
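The idea of context-dependent, rule-based searching can be illustrated with a minimal Python sketch. This is not RADEX's own syntax or implementation: the feature names, regular expressions, negation cues, and the `classify` function are all hypothetical, chosen only to show how a keyword match can be accepted or rejected depending on the words around it.

```python
import re

# Hypothetical rules (feature name -> pattern); NOT RADEX's actual syntax.
RULES = {
    "nodule": re.compile(r"\bnodules?\b", re.IGNORECASE),
    "calcification": re.compile(r"\b(?:micro)?calcifi\w*", re.IGNORECASE),
    "lymphadenopathy": re.compile(r"\blymphadenopathy\b", re.IGNORECASE),
}

# Assumed negation cues that must appear earlier in the same sentence.
NEGATION = re.compile(r"\b(?:no|without|absent)\b", re.IGNORECASE)


def classify(report: str) -> dict[str, bool]:
    """Return one boolean label per feature for a free-text report.

    A feature is labelled True if its pattern matches in some sentence
    and no negation cue precedes the first match in that sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", report)
    labels = {name: False for name in RULES}
    for sentence in sentences:
        for name, pattern in RULES.items():
            match = pattern.search(sentence)
            if match is None:
                continue
            # Context check: look for a negation cue before the match.
            if NEGATION.search(sentence, 0, match.start()):
                continue
            labels[name] = True
    return labels


example = (
    "There is a 12 mm nodule in the right lobe. "
    "No calcification. No cervical lymphadenopathy."
)
result = classify(example)
```

Because each report yields one boolean per feature, the output is a multi-label classification. A real search strategy would need far richer rules (synonyms, laterality, scope of negation), which is what the iterative tuning step is for.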
Results: The dataset cardinality was 3.06, and the label density was 0.34. Cohen's Kappa was 0.94 for rater 1 and 0.95 for rater 2. For RADEX, micro-average sensitivity, specificity, and F1-score were 0.97, 0.96, and 0.94, respectively. The processing time was 12.3 milliseconds per report, enabling fast and reliable information extraction.
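Micro-averaged metrics are conventionally obtained by pooling the confusion-matrix counts over every report and every label before computing a single score. The sketch below follows that standard definition on made-up toy labels; the `micro_average` function and the data are illustrative assumptions, not the authors' evaluation code or results.

```python
def micro_average(y_true, y_pred):
    """Pool TP/FP/FN/TN over all reports and labels, then score once.

    y_true and y_pred are lists of equal-length 0/1 rows, one row per
    report and one column per label. Assumes at least one positive and
    one negative reference label, so the denominators are non-zero.
    """
    tp = fp = fn = tn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return sensitivity, specificity, f1


# Toy data: 3 reports x 3 labels (invented for illustration only).
y_true = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
sensitivity, specificity, f1 = micro_average(y_true, y_pred)
```

Pooling counts in this way weights every individual label decision equally, so frequent labels influence the micro-average more than rare ones.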
Conclusion: RADEX is a versatile tool for bespoke research and audit applications, where access to labelled data or computing infrastructure is limited, or explainability and reproducibility are priorities. It offers a time-saving, freely available option to accelerate structured data collection, enabling new insights and improved patient care.
Key points: Question: Radiology reports contain vital information that is buried in unstructured free-text fields. Can we extract this information effectively for research and audit applications? Findings: A rule-based RAdiology Data EXtraction tool (RADEX) is described and used to classify fourteen key findings from thyroid ultrasound reports with sensitivity and specificity > 0.95. Clinical relevance: RADEX offers clinicians and researchers a time-saving tool to accelerate structured data collection. This practical approach prioritises transparency, repeatability, and usability, enabling new insights into improved patient care.
Keywords: Data annotation; Information extraction; Natural Language Processing; Thyroid; Ultrasound.
© 2025. The Author(s).
Conflict of interest statement
Compliance with ethical standards. Guarantor: The scientific guarantor of this publication is James McLaughlan. Conflict of interest: The authors of this manuscript declare no relationships with any companies, whose products or services may be related to the subject matter of the article. Statistics and biometry: No complex statistical methods were necessary for this paper. Informed consent: Written informed consent was waived by the Institutional Review Board. Ethical approval: Institutional Review Board approval was not required because formal ethical approval was waived for this retrospective study of collated clinical and imaging reports in accordance with the Institutional Health Research Authority Framework. Study subjects or cohorts overlap: Not applicable. Methodology: Retrospective Observational Multicenter study