Leveraging Synthetic Data to Facilitate Research: A Collaborative Model for Analyzing Sensitive National Cancer Registry Data in England
- PMID: 40474047
- DOI: 10.1007/s43441-025-00820-z
Abstract
Real-world data (RWD) are increasingly recognized, including by regulatory bodies, as critical to advancing drug development and health care delivery. However, stringent governance requirements, while essential for protecting patient privacy, create significant challenges for conducting research. The Cancer Analysis System (CAS), managed by National Health Service (NHS) England, includes a national cancer registry and linked health care datasets. To address data access challenges, Simulacrum, a set of publicly available synthetic datasets generated from the CAS, can be used for preliminary data analysis, hypothesis generation, and development of programming code that can then be executed on CAS data. This paper presents a collaborative operating model that leverages Simulacrum to enable efficient, privacy-compliant analytics. Analysis of 18 projects conducted using this model demonstrated an average duration of 2.3 months from the start of Code Development to Data Release (CDDR). By enabling researchers to conduct privacy-compliant analysis on synthetic data, this approach increases transparency by providing insights into patient-level data while reducing reliance on custodians of sensitive data. Our findings highlight how synthetic data can be leveraged to facilitate efficient research on restricted patient-level RWD while safeguarding patient privacy. This framework offers a scalable solution that other data custodians can adopt to enable broader use of RWD, accelerating health care innovation.
Keywords: Cancer analysis system; Cancer research; Real-world data; Simulacrum; Synthetic data.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Target Journal: Therapeutic Innovation & Regulatory Science. Type of Publication: Analytical Report. Competing Interests: GK and OA are Amgen Ltd employees and own Amgen Inc shares. JL and PH are employees of IQVIA. SJ, SV and LF have no conflicts to declare.