2025 Aug;35(8):4506-4517.
doi: 10.1007/s00330-025-11416-4. Epub 2025 Feb 13.

RADEX: a rule-based clinical and radiology data extraction tool demonstrated on thyroid ultrasound reports

Lewis Howell et al. Eur Radiol. 2025 Aug.

Abstract

Objectives: Radiology reports contain valuable information for research and audits, but relevant details are often buried within free-text fields. This makes them challenging and time-consuming to extract for secondary analyses, including training artificial intelligence (AI) models.

Materials and methods: This study presents a rule-based RAdiology Data EXtraction tool (RADEX) to enable biomedical researchers and healthcare professionals to automate information extraction from clinical documents. RADEX simplifies the translation of domain expertise into regular-expression models, enabling context-dependent searching without specialist expertise in Natural Language Processing. Its utility was demonstrated in the multi-label classification of fourteen clinical features in a large retrospective dataset (n = 16,246) of thyroid ultrasound reports from five hospitals in the United Kingdom (UK). A tuning subset (n = 200) was used to iteratively develop the search strategy, and a holdout test subset (n = 202) was used to evaluate the performance against reference-standard labels.
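To illustrate how domain terms can be turned into regular-expression rules for multi-label classification, the sketch below assigns a label to a report whenever any of that label's patterns matches. The label names echo the features highlighted in Fig. 2. The patterns, the `RULES` dictionary and the `classify` function are hypothetical. They are not RADEX's actual search strategy or interface.

```python
import re
from typing import Dict, List

# Hypothetical rules for illustration only; these are NOT the search terms used by RADEX.
RULES: Dict[str, List[str]] = {
    "goitre": [r"\bgoit(?:re|er)s?\b", r"\benlarged thyroid\b"],
    "heterogeneous_echotexture": [r"\bheterogene(?:ous|ity)\b", r"\bcoarse(?:ned)? echotexture\b"],
    "multiple_nodules": [r"\bmulti-?nodular\b", r"\bmultiple (?:\w+ )?nodules\b"],
    "lymph_node_examination": [r"\blymph nodes?\b", r"\bcervical nodes?\b"],
}

# Compile every pattern once, case-insensitively, so each report is scanned with pre-built patterns.
COMPILED: Dict[str, List[re.Pattern]] = {
    label: [re.compile(p, re.IGNORECASE) for p in patterns]
    for label, patterns in RULES.items()
}


def classify(report: str) -> Dict[str, int]:
    """Return a binary multi-label vector: 1 if any pattern for a label matches the report."""
    return {
        label: int(any(p.search(report) for p in patterns))
        for label, patterns in COMPILED.items()
    }


synthetic = (
    "The thyroid is diffusely enlarged in keeping with a goitre, with a "
    "heterogeneous echotexture. Multiple solid nodules are seen in both lobes. "
    "No abnormal cervical lymph nodes."
)
print(classify(synthetic))
```

In this sketch `lymph_node_examination` is positive even though the sentence is negated, consistent with Fig. 2, where a normal lymph node examination still counts as an examination. Deciding whether a negation should change a label requires context-dependent rules rather than bare keyword matching.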

Results: The dataset cardinality was 3.06, and the label density was 0.34. Cohen's Kappa was 0.94 for rater 1 and 0.95 for rater 2. For RADEX, micro-average sensitivity, specificity, and F1-score were 0.97, 0.96, and 0.94, respectively. The processing time was 12.3 milliseconds per report, enabling fast and reliable information extraction.
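The results above use common multi-label summary statistics. The sketch below computes them from binary label matrices, assuming the standard definitions: label cardinality is the mean number of positive labels per report, label density is cardinality divided by the number of labels, micro-averaging pools true and false positives and negatives over every report-label cell, and Cohen's kappa is computed on those pooled cells. The toy matrices are invented. The study's exact computation is not stated here and may differ, in particular how kappa was pooled across the fourteen features.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[int]]  # rows = reports, columns = labels, values 0/1


def label_cardinality(y: Matrix) -> float:
    """Mean number of positive labels per report."""
    return sum(sum(row) for row in y) / len(y)


def label_density(y: Matrix) -> float:
    """Label cardinality normalised by the number of labels."""
    return label_cardinality(y) / len(y[0])


def pooled_counts(y_true: Matrix, y_pred: Matrix) -> Dict[str, int]:
    """Count TP/FP/FN/TN over every (report, label) cell."""
    c = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            key = ("tp" if p else "fn") if t else ("fp" if p else "tn")
            c[key] += 1
    return c


def micro_metrics(y_true: Matrix, y_pred: Matrix) -> Dict[str, float]:
    """Micro-averaged sensitivity, specificity, precision and F1-score."""
    c = pooled_counts(y_true, y_pred)
    sens = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    spec = c["tn"] / (c["tn"] + c["fp"]) if c["tn"] + c["fp"] else 0.0
    prec = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"sensitivity": sens, "specificity": spec, "precision": prec, "f1": f1}


def cohens_kappa(a: List[int], b: List[int]) -> float:
    """Cohen's kappa for two equal-length binary label sequences."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    p_exp = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance
    # Degenerate case: both sequences constant and identical, so kappa is set to 1.
    return (p_obs - p_exp) / (1 - p_exp) if p_exp != 1 else 1.0


def flatten(y: Matrix) -> List[int]:
    """Flatten a report-by-label matrix into one list of cells."""
    return [v for row in y for v in row]


# Invented toy data: 4 reports x 3 labels.
y_true = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]]
y_pred = [[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]

print(label_cardinality(y_true), label_density(y_true))  # 1.5 0.5
print(micro_metrics(y_true, y_pred))
print(cohens_kappa(flatten(y_true), flatten(y_pred)))
```

For this toy data TP = 5, FP = 1, FN = 1 and TN = 5, so all four micro-averaged metrics equal 5/6. Because the F1-score combines sensitivity with precision rather than specificity, a micro F1-score can sit below both sensitivity and specificity, as it does in the reported results.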

Conclusion: RADEX is a versatile tool for bespoke research and audit applications, where access to labelled data or computing infrastructure is limited, or explainability and reproducibility are priorities. This offers a time-saving and freely available option to accelerate structured data collection, enabling new insights and improved patient care.

Key points:
Question: Radiology reports contain vital information that is buried in unstructured free-text fields. Can we extract this information effectively for research and audit applications?
Findings: A rule-based RAdiology Data EXtraction tool (RADEX) is described and used to classify fourteen key findings from thyroid ultrasound reports with sensitivity and specificity > 0.95.
Clinical relevance: RADEX offers clinicians and researchers a time-saving tool to accelerate structured data collection. This practical approach prioritises transparency, repeatability, and usability, enabling new insights and improved patient care.

Keywords: Data annotation; Information extraction; Natural Language Processing; Thyroid; Ultrasound.


Conflict of interest statement

Compliance with ethical standards.
Guarantor: The scientific guarantor of this publication is James McLaughlan.
Conflict of interest: The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Statistics and biometry: No complex statistical methods were necessary for this paper.
Informed consent: Written informed consent was waived by the Institutional Review Board.
Ethical approval: Institutional Review Board approval was not required because formal ethical approval was waived for this retrospective study of collated clinical and imaging reports in accordance with the Institutional Health Research Authority Framework.
Study subjects or cohorts overlap: Not applicable.
Methodology: Retrospective, observational, multicenter study.

Figures

Fig. 1
Typical workflow for the Radiology Data Extraction Tool (RADEX), including iterative development of searches
Fig. 2
A synthetic neck and thyroid ultrasound report, highlighted using a rule-based radiology data extraction tool. This example was correctly classified as positive for goitre (blue), altered (heterogeneous) thyroid echotexture (green), multiple thyroid nodules (purple), and lymph node examination (brown). The phrase ‘no abnormal’ was matched as a possible negation modifier to the term ‘lymph nodes’
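Fig. 2 shows a negation cue being flagged as a possible modifier of a nearby term rather than automatically overriding it. A minimal sketch of that kind of context-dependent check, loosely in the spirit of NegEx-style trigger windows, looks for cue phrases among the few words that precede a term within the same sentence. The cue list, the four-word window, the naive sentence splitter and the function names are assumptions for illustration, not RADEX's implementation.

```python
import re
from typing import List, Tuple

# Hypothetical negation cues; longer phrases come first so they win over bare "no".
NEGATION_CUES = re.compile(
    r"\b(?:no abnormal|no evidence of|absence of|without|not|no)\b", re.IGNORECASE
)
# Naive sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def preceding_cues(sentence: str, term: str, max_words: int = 4) -> List[str]:
    """Return negation cues found among the `max_words` words before the first match of `term`."""
    match = re.search(term, sentence, re.IGNORECASE)
    if match is None:
        return []
    window = " ".join(sentence[: match.start()].split()[-max_words:])
    return [cue.group(0).lower() for cue in NEGATION_CUES.finditer(window)]


def flag_modifiers(report: str, term: str, max_words: int = 4) -> List[Tuple[str, List[str]]]:
    """For each sentence mentioning `term`, list any possible negation modifiers."""
    return [
        (sentence, preceding_cues(sentence, term, max_words))
        for sentence in SENTENCE_SPLIT.split(report)
        if re.search(term, sentence, re.IGNORECASE)
    ]


report = "The thyroid is enlarged. No abnormal cervical lymph nodes."
for sentence, cues in flag_modifiers(report, r"\blymph nodes?\b"):
    print(sentence, "->", cues)
```

Flagging the cue rather than discarding the match reflects the figure: the report remains positive for lymph node examination, and the rule author decides whether a given modifier should change a label.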
Fig. 3
F1-score for rule-based predictions on the holdout test clinical dataset, before and after refining the searches
Fig. 4
Class-wise confusion matrices for multi-label data. True positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts and normalised scores for each class
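Class-wise confusion matrices like those in Fig. 4 can be derived by treating each label as its own binary problem. The sketch below assumes the layout [[TN, FP], [FN, TP]], which is the same per-label layout returned by scikit-learn's `multilabel_confusion_matrix`. It also assumes that the normalised scores are row-normalised by the reference-standard class; the figure's exact normalisation is not stated here. The label names and toy matrices are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[int]]  # rows = reports, columns = labels, values 0/1


def classwise_confusion(
    y_true: Matrix, y_pred: Matrix, labels: List[str]
) -> Dict[str, Dict[str, List[List[float]]]]:
    """Per-label 2x2 matrices [[TN, FP], [FN, TP]] as counts and row-normalised rates."""
    result = {}
    for j, label in enumerate(labels):
        tn = fp = fn = tp = 0
        for t_row, p_row in zip(y_true, y_pred):
            t, p = t_row[j], p_row[j]
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
        counts = [[tn, fp], [fn, tp]]
        # Divide each row by its total: row 0 by actual negatives, row 1 by actual positives.
        normalised = [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]
        result[label] = {"counts": counts, "normalised": normalised}
    return result


labels = ["goitre", "multiple_nodules", "lymph_node_examination"]
y_true = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]]
y_pred = [[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
for label, mats in classwise_confusion(y_true, y_pred, labels).items():
    print(label, mats["counts"], mats["normalised"])
```

Under row normalisation the bottom-right cell is the class sensitivity and the top-left cell is the class specificity. Summing each cell across labels gives the pooled counts that underlie micro-averaged metrics.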


