Circ Cardiovasc Qual Outcomes. 2019 Jul;12(7):e005122.

doi: 10.1161/CIRCOUTCOMES.118.005122. Epub 2019 Jul 9.

Privacy-Preserving Generative Deep Neural Networks Support Clinical Data Sharing

Brett K Beaulieu-Jones¹, Zhiwei Steven Wu², Chris Williams³, Ran Lee⁴, Sanjeev P Bhavnani⁵, James Brian Byrd⁴, Casey S Greene³

Affiliations

¹ Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia (B.K.B.-J.).
² Computer Science and Electrical Engineering Department, University of Minnesota, Minneapolis (Z.S.W.).
³ Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia (C.W., C.S.G.).
⁴ Division of Cardiovascular Medicine, Department of Medicine, University of Michigan Medical School, Ann Arbor (R.L., J.B.B.).
⁵ Scripps Clinic and Research Foundation, San Diego, CA (S.P.B.).

PMID: 31284738
PMCID: PMC7041894
DOI: 10.1161/CIRCOUTCOMES.118.005122

Brett K Beaulieu-Jones et al. Circ Cardiovasc Qual Outcomes. 2019 Jul.

Abstract

Background: Data sharing accelerates scientific progress, but sharing individual-level data while preserving patient privacy presents a barrier.

Methods and results: Using pairs of deep neural networks, we generated simulated, synthetic participants that closely resemble participants of SPRINT (Systolic Blood Pressure Intervention Trial). We showed that such paired networks can be trained with differential privacy, a formal privacy framework that limits the likelihood that queries of the synthetic participants' data could identify a real participant in the trial. Machine learning predictors built on the synthetic population generalize to the original data set. This finding suggests that the synthetic data can be shared with others, enabling them to perform hypothesis-generating analyses as though they had the original trial data.
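The abstract does not say which mechanism provides the differential privacy guarantee. As a rough illustration only, one widely used approach bounds each participant's influence on a training step by clipping per-example gradients and then adding Gaussian noise before averaging. The sketch below assumes that approach; the function names, `max_norm`, `noise_multiplier`, and the toy gradients are all hypothetical and are not taken from the paper.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale grad down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    Assumption: noise standard deviation is noise_multiplier * max_norm,
    a common convention for this style of training, not the paper's setting.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip_gradient(grad, max_norm)
        for i in range(dim):
            total[i] += clipped[i]
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


# Toy, made-up gradients for three hypothetical participants.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping caps how much any single participant can move the model, and the noise masks whatever influence remains; translating a noise level into a concrete (ε, δ) guarantee requires a separate privacy accounting step that this sketch omits.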

Conclusions: Deep neural networks that generate synthetic participants facilitate secondary analyses and reproducible investigation of clinical data sets by enhancing data sharing while preserving participant privacy.

Keywords: blood pressure; deep learning; machine learning; privacy; propensity score.

Figures

**Figure 1.**
**Median systolic blood pressure trajectories from initial visit to 27 mo.**

**Figure 2.**
**Pairwise Pearson correlation between columns.** A, Original and real data; B, nonprivate and auxiliary classifier generative adversarial network (AC-GAN) simulated data; and C, differentially private and AC-GAN simulated data (RZ, randomization visit; 1M, 1-mo visit; 2M, 2-mo visit; 3M, 3-mo visit; 6M, 6-mo visit; 9M, 9-mo visit; 12M, 12-mo visit; 15M, 15-mo visit; 18M, 18-mo visit; 21M, 21-mo visit; 24M, 24-mo visit; and 27M, 27-mo visit).
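A comparison like the one in Figure 2 can be sketched by computing the column-by-column Pearson correlation matrix for the real and the synthetic tables and then checking how far the two matrices differ. The helper names and the tiny tables below are hypothetical, made-up illustrations and do not come from the trial data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(rows):
    """Pairwise Pearson correlation between the columns of a row-major table."""
    columns = list(zip(*rows))
    return [[pearson(a, b) for b in columns] for a in columns]


def max_abs_difference(m1, m2):
    """Largest absolute element-wise gap between two same-shaped matrices."""
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))


# Made-up rows: one row per participant, one column per visit.
real = [[140.0, 132.0, 128.0], [150.0, 141.0, 135.0], [135.0, 130.0, 122.0], [160.0, 149.0, 140.0]]
synthetic = [[142.0, 133.0, 127.0], [148.0, 140.0, 136.0], [137.0, 129.0, 124.0], [158.0, 150.0, 139.0]]

gap = max_abs_difference(correlation_matrix(real), correlation_matrix(synthetic))
print(f"largest correlation gap: {gap:.3f}")
```

A small gap suggests the synthetic table preserves the between-visit relationships of the real one, although it does not by itself show that the synthetic participants are realistic.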

**Figure 3.**
**Clinician evaluation of synthetic data.** A, Synthetic participant scored a 2 by clinician expert. B, Synthetic participant scored a 4 by clinician expert. C, Synthetic participant scored a 6 by clinician expert. D, Synthetic participant scored an 8 by clinician expert. E, Comparison of scores between real and synthetic participants (dotted red lines indicate means). F, Distribution of scores between real (blue) and synthetic (green) patients. BP indicates blood pressure.

**Figure 4.**
**Accuracy of models trained on synthetic participants vs real data.** Line indicates performance on real data, which on average should provide the best possible performance; bar indicates performance of classifier trained on private synthetic participants; bottom of chart indicates random performance.

**Figure 5.**
**The value of δ as a function of epoch for different ε values.** An ε value of 3.5 allows for 1000 epochs of training with δ<10⁻⁵.

**Figure 6.**
**Machine learning and statistical evaluation of synthetic data.** A–D, Performance on transfer learning task by source of training data for each machine learning method. E, Pairwise Pearson correlation between columns for the original and real data. F, Pairwise Pearson correlation between columns for the private synthetic data. AUROC indicates area under the receiver operator characteristic; LR, logistic regression; RF, random forest.

Associated data

figshare/10.6084/m9.figshare.5165737

Grants and funding

K23 HL128909/HL/NHLBI NIH HHS/United States
