JCO Clin Cancer Inform. 2023 Sep:7:e2300116.
doi: 10.1200/CCI.23.00116.

Evaluating the Utility and Privacy of Synthetic Breast Cancer Clinical Trial Data Sets

Samer El Kababji et al. JCO Clin Cancer Inform. 2023 Sep.

Abstract

Purpose: There is strong interest from patients, researchers, the pharmaceutical industry, medical journal editors, funders of research, and regulators in sharing clinical trial data for secondary analysis. However, data access remains a challenge because of concerns about patient privacy. It has been argued that synthetic data generation (SDG) is an effective way to address these privacy concerns. However, there is little evidence supporting this claim for oncology clinical trial data sets, or on the utility of privacy-preserving synthetic data. The objective of this study is to evaluate the utility and privacy risks of synthetic clinical trial data sets across multiple SDG techniques.

Methods: We synthesized data sets from eight breast cancer clinical trial data sets using three types of generative models: sequential synthesis, conditional generative adversarial network, and variational autoencoder. Synthetic data utility was evaluated by replicating the published analyses on the synthetic data and assessing concordance of effect estimates and CIs between real and synthetic data. Privacy was evaluated by measuring attribution disclosure risk and membership disclosure risk.
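The abstract does not state which formula was used to measure concordance of CIs. As a minimal sketch, assuming one commonly used definition of confidence interval overlap (the length of the intersection of the two intervals, averaged as a fraction of each interval's length), the comparison could look like the following. The function name `ci_overlap` and the tuple-based interface are hypothetical and not taken from the paper.

```python
def ci_overlap(real_ci, synth_ci):
    """Confidence interval overlap between a real and a synthetic estimate.

    Assumed definition (not confirmed by the abstract): the length of the
    intersection of the two intervals, expressed as a fraction of each
    interval's length, then averaged. Identical intervals give 1.0;
    disjoint intervals give a negative value.
    """
    lower_real, upper_real = real_ci
    lower_synth, upper_synth = synth_ci

    # Length of the intersection (negative when the intervals do not overlap).
    intersection = min(upper_real, upper_synth) - max(lower_real, lower_synth)

    real_fraction = intersection / (upper_real - lower_real)
    synth_fraction = intersection / (upper_synth - lower_synth)
    return 0.5 * (real_fraction + synth_fraction)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the study.
    print(ci_overlap((0.0, 2.0), (1.0, 3.0)))
```

Under this assumed definition, values close to 1 indicate that an analysis on the synthetic data yields nearly the same interval as the analysis on the real data.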

Results: Utility was highest with the sequential synthesis method, for which all results were replicable and the CI overlap was most similar or higher for seven of eight data sets. Both types of privacy risks were low across all three types of generative models.

Discussion: Synthetic data generated using sequential synthesis methods can act as a proxy for real clinical trial data sets while having low privacy risks. This type of generative model can be one way to enable broader sharing of clinical trial data.

Conflict of interest statement

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians.

Greg Pond

Employment: Roche Canada

Stock and Other Ownership Interests: Roche Canada

Honoraria: AstraZeneca

Consulting or Advisory Role: Takeda, Profound Medical

Lucy Mosquera

Employment: Aetion

Stock and Other Ownership Interests: Aetion

Patents, Royalties, Other Intellectual Property: Pending patents through work at Replica Analytics, an Aetion company

Alexander Paterson

Stock and Other Ownership Interests: Roche, Pfizer

William E. Barlow

Research Funding: Merck (Inst), AstraZeneca (Inst)

Marie-France Savard

Honoraria: Novartis Canada Pharmaceuticals Inc, Seagen, Knight Therapeutics, Merck, Roche Canada, AstraZeneca, Pfizer, Gilead Sciences, Lilly

Consulting or Advisory Role: Pfizer, Knight Therapeutics, Seagen

Mark Clemons

Travel, Accommodations, Expenses: Pfizer

Khaled El Emam

Employment: Aetion

Leadership: Canary Medical, DistillerSR

Stock and Other Ownership Interests: Canary Medical, Aetion, DistillerSR

No other potential conflicts of interest were reported.

Figures

FIG 1.
Utility (replicability) is evaluated by comparing the results from the real data to the synthetic data.
FIG 2.
The process for computing valid parameters and making inferences from synthetic data using combining rules applied to multiple generated synthetic data sets.
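
The caption does not say which combining rules were applied. As a rough sketch, assuming the standard rules for partially synthetic data (point estimate equal to the mean across the m synthetic data sets, total variance equal to the mean within-data-set variance plus the between-data-set variance divided by m), the pooling step could be written as follows. The function name `combine_synthetic_estimates` is hypothetical, and the rule for fully synthetic data uses a different variance formula.

```python
from statistics import mean, variance


def combine_synthetic_estimates(estimates, variances):
    """Pool one parameter estimated on m synthetic data sets.

    Assumption: combining rules for partially synthetic data, where the
    total variance is v_bar + b / m. This may not be the rule used in
    the paper.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from at least two data sets")

    q_bar = mean(estimates)   # pooled point estimate
    b = variance(estimates)   # between-data-set variance (m - 1 denominator)
    v_bar = mean(variances)   # average within-data-set variance
    total_variance = v_bar + b / m
    return q_bar, total_variance


if __name__ == "__main__":
    # Illustrative numbers only, not results from the study.
    estimate, total_var = combine_synthetic_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(estimate, total_var)
```

A confidence interval for the pooled estimate would then be built from `q_bar` and the square root of `total_variance`, using a reference distribution appropriate to the chosen rule.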
