Clin Oncol (R Coll Radiol). 2025 Nov:47:103935.
doi: 10.1016/j.clon.2025.103935. Epub 2025 Sep 11.

RT-HaND_C: A Multi-Source, Validated Real-world Head and Neck Cancer Dataset for Research

Free article

T Young et al. Clin Oncol (R Coll Radiol). 2025 Nov.

Abstract

Aims: Real-world data (RWD) are a valuable resource for head and neck cancer (HNC) research, offering insights into outcomes among diverse, comorbid patients who are often underrepresented in clinical trials. However, RWD pose challenges, including variable data quality, and require rigorous evaluation before being used to generate real-world evidence. We aimed to develop a large HNC oncology dataset containing comprehensive clinical data.

Methods: We developed RT-HaND_C, a multi-source clinical dataset integrating structured Electronic Health Record (EHR) data, unstructured EHR data extracted using a previously validated AI-driven Natural Language Processing tool, and manually curated datasets. RT-HaND_C incorporates extensive demographic, disease, laboratory, treatment, outcome (disease and toxicity) and radiotherapy dosimetry data for all HNC oncology patients seen at our centre (2010-2023). The dataset underwent rigorous evaluation for accuracy, completeness and consistency. We evaluated usability by addressing the unanswered question of long-term weight trends post-radical HNC radiotherapy.
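The abstract does not state how accuracy, completeness and consistency were measured. As a minimal sketch of what such checks can look like, assuming completeness is the share of non-missing values, accuracy is agreement with a manually verified reference sample, and consistency is a date-ordering rule, the following uses hypothetical field names (`patient_id`, `t_stage`, `diagnosis_date`, `rt_start_date`) and made-up records, not the RT-HaND_C schema or data:

```python
from datetime import date


def completeness(records, field):
    """Fraction of records with a non-missing value for `field`."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) not in (None, ""))
    return present / len(records)


def accuracy(records, reference, field, key="patient_id"):
    """Fraction of reference values that the dataset reproduces exactly.

    `reference` stands in for a manually verified sample (an assumption).
    """
    by_id = {r[key]: r for r in records}
    checked = [ref for ref in reference if ref[key] in by_id]
    if not checked:
        return 0.0
    correct = sum(
        1 for ref in checked if by_id[ref[key]].get(field) == ref[field]
    )
    return correct / len(checked)


def inconsistent(records, earlier, later, key="patient_id"):
    """IDs of records where the `earlier` date falls after the `later` date."""
    return [
        r[key]
        for r in records
        if r.get(earlier) and r.get(later) and r[earlier] > r[later]
    ]


# Illustrative records only; values are invented for the example.
records = [
    {"patient_id": 1, "t_stage": "T2",
     "diagnosis_date": date(2015, 3, 1), "rt_start_date": date(2015, 5, 1)},
    {"patient_id": 2, "t_stage": None,
     "diagnosis_date": date(2016, 7, 1), "rt_start_date": date(2016, 6, 1)},
    {"patient_id": 3, "t_stage": "T4",
     "diagnosis_date": date(2017, 1, 9), "rt_start_date": date(2017, 2, 20)},
    {"patient_id": 4, "t_stage": "T1",
     "diagnosis_date": date(2018, 4, 2), "rt_start_date": date(2018, 5, 14)},
]
reference = [
    {"patient_id": 1, "t_stage": "T2"},
    {"patient_id": 3, "t_stage": "T3"},
]

t_stage_completeness = completeness(records, "t_stage")
t_stage_accuracy = accuracy(records, reference, "t_stage")
date_order_violations = inconsistent(records, "diagnosis_date", "rt_start_date")
print(t_stage_completeness, t_stage_accuracy, date_order_violations)
```

In practice each metric would be reported per variable, and the reference sample would need to be large enough to support a figure such as the accuracy values reported in the Results.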

Results: The retrospective cohort comprises 2,895 HNC patients with over 1.9 million data points across over 2,000 data categories. Accuracy assessments exceeded 98% for most variables. High data completeness and consistency were observed for all key data categories. Usability testing showed that data could be rapidly extracted and analysed, and demonstrated that HNC patients experienced statistically significant weight loss persisting at 5 years post-radical radiotherapy (even when accounting for disease recurrence), with peak weight loss observed at 6 months post-radiotherapy.

Conclusions: RT-HaND_C represents a novel, high-quality RWD resource and evaluation framework. RT-HaND_C is virtually linked to corresponding diagnostic and radiotherapy imaging data to facilitate multi-modal research. The dataset is available for research and collaboration, with ongoing work focused on enhancing completeness and incorporating prospective updates.

Keywords: Artificial intelligence; Data mining; Natural language processing; Real-world data.

Conflict of interest statement

The authors declare no conflict of interest.
