Privacy-first health research with federated learning

Adam Sadilek¹, Luyang Liu², Dung Nguyen³,⁴, Methun Kamruzzaman³, Stylianos Serghiou², Benjamin Rader⁵,⁶, Alex Ingerman², Stefan Mellem², Peter Kairouz², Elaine O Nsoesie⁷, Jamie MacFarlane², Anil Vullikanti³,⁴, Madhav Marathe³,⁴, Paul Eastham², John S Brownstein⁵,⁸, Blaise Aguera Y Arcas², Michael D Howell², John Hernandez⁹

Affiliations

¹ Google, Mountain View, CA, USA. adsa@google.com.
² Google, Mountain View, CA, USA.
³ Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA.
⁴ Department of Computer Science, University of Virginia, Charlottesville, VA, USA.
⁵ Computational Epidemiology Lab, Boston Children's Hospital, Boston, MA, USA.
⁶ Department of Epidemiology, Boston University, Boston, MA, USA.
⁷ Department of Global Health, Boston University, Boston, MA, USA.
⁸ Harvard Medical School, Boston, MA, USA.
⁹ Google, Mountain View, CA, USA. johnbhernandez@google.com.

PMID: 34493770
PMCID: PMC8423792
DOI: 10.1038/s41746-021-00489-2

Adam Sadilek et al. NPJ Digit Med. 2021 Sep 7;4(1):132.

Abstract

Privacy protection is paramount in conducting health research. However, studies often rely on data stored in a centralized repository, where analysis is done with full access to the sensitive underlying content. Recent advances in federated learning enable building complex machine-learned models that are trained in a distributed fashion. These techniques facilitate the calculation of research study endpoints such that private data never leaves a given device or healthcare system. We show, on a diverse set of single- and multi-site health studies, that federated models can achieve similar accuracy, precision, and generalizability, and lead to the same interpretation as standard centralized statistical models, while achieving considerably stronger privacy protections and without significantly raising computational costs. This work is the first to apply modern and general federated learning methods that explicitly incorporate differential privacy to clinical and epidemiological research, across a spectrum of units of federation, model architectures, complexity of learning tasks and diseases. As a result, it enables health research participants to remain in control of their data and still contribute to advancing science, aspects that used to be at odds with each other.
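To make the idea of distributed training concrete, the following is a minimal sketch of federated averaging for a logistic regression model, with per-client update clipping and Gaussian noise added to the aggregate. It is not the authors' implementation: the synthetic data, the function names, and every hyperparameter (`fraction`, `max_norm`, `noise_multiplier`, learning rate) are assumptions chosen for illustration, and the noise step is shown without any formal privacy accounting.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_update(w, X, y, lr=0.5, steps=5):
    """Run a few gradient steps on one client's data; return only the weight delta."""
    w_local = w.copy()
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w_local) - y) / len(y)
        w_local -= lr * grad
    return w_local - w

def clip(delta, max_norm):
    """Scale the update so its L2 norm is at most max_norm."""
    norm = np.linalg.norm(delta)
    return delta * min(1.0, max_norm / (norm + 1e-12))

def federated_round(w, clients, rng, fraction=0.1, max_norm=1.0, noise_multiplier=0.0):
    """One server round: sample clients, sum clipped deltas, add noise, average."""
    k = max(1, int(round(fraction * len(clients))))
    chosen = rng.choice(len(clients), size=k, replace=False)
    total = np.zeros_like(w)
    for i in chosen:
        X, y = clients[i]  # raw data is only read inside the client's update
        total += clip(local_update(w, X, y), max_norm)
    # Gaussian noise scaled to the clipping bound (illustrative, no accounting)
    noise = rng.normal(0.0, noise_multiplier * max_norm, size=w.shape)
    return w + (total + noise) / k

def make_clients(rng, n_clients=100, n_per_client=20, true_w=(2.0, -1.0)):
    """Synthetic stand-in for per-participant data (hypothetical)."""
    true_w = np.asarray(true_w)
    clients = []
    for _ in range(n_clients):
        X = rng.normal(size=(n_per_client, len(true_w)))
        y = (rng.random(n_per_client) < sigmoid(X @ true_w)).astype(float)
        clients.append((X, y))
    return clients

def train(clients, rounds=50, seed=0, **kwargs):
    rng = np.random.default_rng(seed)
    w = np.zeros(clients[0][0].shape[1])
    for _ in range(rounds):
        w = federated_round(w, clients, rng, **kwargs)
    return w
```

Calling `train(make_clients(np.random.default_rng(0)), fraction=0.02, noise_multiplier=0.5)` runs the same loop with only 2% of clients participating per round, the kind of participation rate examined in Fig. 1. The server in this sketch only ever sees clipped, noised weight deltas, never the clients' `X` or `y`.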

Conflict of interest statement

A.S., L.L., A.I., S.M., P.K., J.M., P.E., M.H., B.A., S.S. and J.H. are employees of Google and own Alphabet stock. The remaining authors declare no competing interests.

Figures

**Fig. 1. Area under the ROC curve (AUC) as a function of fraction of participants in each federated (server) round of learning for replicated model of SARS-CoV-2 and Cancer.**
Shown in log scale to highlight details at the low participation levels. We see that even at a 2% participation rate, the model still achieves 99% of the maximum attainable AUC. We observe this pattern across all the datasets studied. 80% of the whole dataset was used to train the model and the remaining 20% was used for validation.

**Fig. 2. Receiver operating characteristic curves for the three learning setups on MIMIC-III data predicting inpatient mortality.**
Shaded areas show 95% confidence intervals.

**Fig. 3. Odds ratios shown in colors other than red were generated using our models.**
The odds ratios generated by our models are consistent with the odds ratios of the original study. The vertical bar through each coefficient shows the 95% confidence interval of the corresponding ratio.

**Fig. 4. The estimated coefficients of Statsmodels (GLM), TF-Centralized (TensorFlow Probability) and TF-Fed-Patient (TensorFlow Probability with Federated Learning, using patient as the unit).**
The plots show the coefficients and their 95% confidence intervals for nine variables from different univariate logistic regression models. The significance of all models and variables is largely consistent with the original study: eight of nine variables lead to the same conclusions and only one (Acquisition status) does not (TF-Centralized and TF-Fed-Patient both show it is significant, while GLM and the original study state otherwise). In the original study, the variable has a p value of 0.06, which lies near the borderline of significance (p ≤ 0.05).
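The link between a logistic regression coefficient, its odds ratio, and the 95% confidence intervals shown in Figs. 3 and 4 can be sketched as below. This assumes a Wald-type interval (coefficient ± 1.96 standard errors), which is a common default but is not confirmed here as the method used in the study; the function names are hypothetical.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a logistic coefficient.

    Assumes a Wald interval: exponentiate beta and beta +/- z * se.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def is_significant(beta, se, z=1.96):
    """A coefficient is significant at this level if the interval excludes OR = 1."""
    _, lower, upper = odds_ratio_ci(beta, se, z)
    return not (lower <= 1.0 <= upper)
```

A variable whose interval barely includes or excludes an odds ratio of 1 can flip between "significant" and "not significant" under small differences in the estimated standard error, which is one plausible reading of the borderline Acquisition status result.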
