Ther Innov Regul Sci. 2025 Jun 5.
doi: 10.1007/s43441-025-00820-z. Online ahead of print.

Leveraging Synthetic Data to Facilitate Research: A Collaborative Model for Analyzing Sensitive National Cancer Registry Data in England

George Kafatos et al. Ther Innov Regul Sci.

Abstract

Real-world data (RWD) are increasingly recognized, including by regulatory bodies, as critical to advancing drug development and health care delivery. However, stringent governance requirements, while essential for protecting patient privacy, create significant challenges for conducting research. The Cancer Analysis System (CAS), managed by National Health Service (NHS) England, includes a national cancer registry and linked health care datasets. To address data access challenges, Simulacrum, a set of publicly available synthetic datasets generated from the CAS, can be used for preliminary data analysis, hypothesis generation, and development of programming code that can then be executed against CAS data. This paper presents a collaborative operating model that leverages Simulacrum to enable efficient, privacy-compliant analytics. Analysis of 18 projects conducted using this model showed an average duration of 2.3 months from the start of Code Development to Data Release (CDDR). By enabling researchers to conduct privacy-compliant analysis on synthetic data, this approach increases transparency by providing insights into patient-level data while reducing reliance on custodians of sensitive data. Our findings highlight how synthetic data can be leveraged to facilitate efficient research on restricted patient-level RWD while safeguarding patient privacy. This framework offers a scalable solution that other data custodians can adopt to enable broader use of RWD and accelerate health care innovation.

Keywords: Cancer analysis system; Cancer research; Real-world data; Simulacrum; Synthetic data.

Conflict of interest statement

Declarations. Target Journal: Therapeutic Innovation & Regulatory Science. Type of Publication: Analytical Report. Competing Interests: GK and OA are Amgen Ltd employees and own Amgen Inc shares. JL and PH are employees of IQVIA. SJ, SV and LF have no conflicts to declare.
