Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data

Micah J Sheller¹, Brandon Edwards¹, G Anthony Reina¹, Jason Martin¹, Sarthak Pati^{2,3}, Aikaterini Kotrotsou^{4,5}, Mikhail Milchenko⁶, Weilin Xu¹, Daniel Marcus⁶, Rivka R Colen^{4,5,7,8}, Spyridon Bakas^{9,10,11}

Affiliations

¹ Intel Corporation, 2200 Mission College Blvd., Santa Clara, CA, 95052, USA.
² Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Richards Medical Research Laboratories, Floor 7, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA.
³ Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Richards Medical Research Laboratories, Floor 7, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA.
⁴ Department of Diagnostic Radiology, The University of Texas MD Anderson Cancer Center, 1400 Pressler St., Houston, TX, 77030, USA.
⁵ Department of Cancer Systems Imaging, The University of Texas MD Anderson Cancer Center, 1881 East Rd, 3SCRB4, Houston, TX, 77054, USA.
⁶ Department of Radiology, Washington University School of Medicine, St. Louis, MO, 63110, USA.
⁷ Hillman Cancer Center, University of Pittsburgh Medical Center, Pittsburgh, PA, 15232, USA.
⁸ Department of Radiology, University of Pittsburgh, Pittsburgh, PA, 15213, USA.
⁹ Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Richards Medical Research Laboratories, Floor 7, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA. sbakas@upenn.edu.
¹⁰ Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Richards Medical Research Laboratories, Floor 7, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA. sbakas@upenn.edu.
¹¹ Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Richards Medical Research Laboratories, Floor 7, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA. sbakas@upenn.edu.

PMID: 32724046
PMCID: PMC7387485
DOI: 10.1038/s41598-020-69250-1

Micah J Sheller et al. Sci Rep. 2020 Jul 28;10(1):12598. doi: 10.1038/s41598-020-69250-1.

Abstract

Several studies underscore the potential of deep learning in identifying complex patterns, leading to diagnostic and prognostic biomarkers. Sufficiently large and diverse training datasets are rarely available within a single institution, and assembling them is a significant challenge in medicine. Multi-institutional collaborations based on centrally shared patient data face privacy and ownership challenges. Federated learning is a novel paradigm for data-private multi-institutional collaborations, in which model learning leverages all available data without sharing data between institutions, by distributing the model training to the data owners and aggregating their results. We show that federated learning among 10 institutions results in models reaching 99% of the model quality achieved with centralized data, and evaluate generalizability on data from institutions outside the federation. We further investigate the effects of data distribution across collaborating institutions on model quality and learning patterns, indicating that increased access to data through data-private multi-institutional collaborations can benefit model quality more than the errors introduced by the collaborative method harm it. Finally, we compare with other collaborative-learning approaches, demonstrating the superiority of federated learning, and discuss practical implementation considerations. Clinical adoption of federated learning is expected to lead to models trained on datasets of unprecedented size, and hence to have a catalytic impact on precision/personalized medicine.
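The aggregation step described in the abstract (each data owner trains locally, then the results are combined) can be illustrated with a minimal sketch. The abstract does not state the aggregation rule, so this example assumes a sample-count-weighted average of model weights, as in standard federated averaging. The `Weights` type, the `aggregate` function, and the toy numbers are hypothetical names and values chosen for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical model representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def aggregate(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine per-institution weights into one global model.

    Each update is (locally trained weights, number of local training samples).
    Assumption: institutions are weighted by their sample counts.
    """
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("at least one training sample is required")

    reference = updates[0][0]
    result: Weights = {}
    for name, values in reference.items():
        acc = [0.0] * len(values)
        for weights, n in updates:
            for i, v in enumerate(weights[name]):
                acc[i] += v * n / total
        result[name] = acc
    return result


# Two toy institutions; only weights and sample counts leave each site,
# never the underlying patient data.
updates = [
    ({"w": [1.0, 2.0]}, 100),
    ({"w": [3.0, 4.0]}, 300),
]
global_weights = aggregate(updates)
```

In a full round, the aggregated weights would be sent back to every institution as the starting point for the next local training pass.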

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
*System architectures of collaborative learning approaches for multi-institutional collaborations.* The current paradigm for multi-institutional collaborations, based on Centralized Data Sharing, is shown in (a), whereas in (b) we note the proposed paradigm, based on Federated Learning. Panels (c) and (d) offer schematics for alternative data-private collaborative learning approaches evaluated in this study, namely Institutional Incremental Learning, and Cyclic Institutional Incremental Learning, respectively.

**Figure 2**
*Single Original Institution Validation Results.* Single institution mean final model qualities (based on the *Dice Similarity Coefficient*) for the *Original Institution group* (y-axis) measured against all single institution held-out validation sets (x-axis) using multiple runs of five-fold *collaborative cross validation*. The y-axis represents models trained on a single institutional dataset, and the x-axis represents the validation dataset of each independent institution (Local Validation Dataset). “AVG” indicates the average of each institution’s mean model performance over all institutions in the group other than itself; “W-AVG” denotes the same, but as a weighted average according to each institution’s contribution to the validation set size. The diagonal entries indicate how well each institution’s final models scored against their own validation set, and they are represented as the Single Institutional Model (SIM) results reported in Fig. 3.
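For readers unfamiliar with the quantities named in this caption, the following is a minimal sketch of the Dice Similarity Coefficient on binary masks and of the plain versus size-weighted averages ("AVG" and "W-AVG"). The function names, the flat-list mask representation, and the example numbers are assumptions for illustration and are not taken from the study; returning 1.0 for two empty masks is one common convention, also an assumption.

```python
from typing import Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    denom = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2.0 * overlap / denom


def avg(scores: Sequence[float]) -> float:
    """Unweighted mean over institutions ("AVG")."""
    return sum(scores) / len(scores)


def weighted_avg(scores: Sequence[float], sizes: Sequence[int]) -> float:
    """Mean weighted by each institution's validation set size ("W-AVG")."""
    return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)


# Toy masks flattened to 1-D: one overlapping voxel out of two in each mask.
example_dice = dice([1, 1, 0, 0], [1, 0, 1, 0])

# Toy per-institution scores with validation set sizes of 30 and 10 cases.
example_avg = avg([0.8, 0.6])
example_wavg = weighted_avg([0.8, 0.6], [30, 10])
```

The weighted average moves toward the score of the institution with the larger validation set, which is why the two summary columns can differ.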

**Figure 3**
*Model quality results from single institution training, CDS, FL, IIL, and CIIL.* CDS, FL, CIIL mean model *Dice* against the *Original Institution* group single institution held-out validation data over multiple runs of *collaborative cross validation*, as well as the average of single institutional results under the same scheme (AVG SIM). The AVG 1–10 column provides the average performance of each collaboration method across single institution validation sets. For CIIL, ‘best local’ and ‘random local’ are two methods we introduce for final model selection during CIIL (more details are given in the “Methods: Final Model Selection” section). Note that the color scale here differs from that used in Fig. 2.

**Figure 4**
*Learning curves of collaborative learning methods on Original Institution data.* Mean global validation *Dice* every epoch by collaborative learning method on the *Original Institution* group over multiple runs of *collaborative cross validation*. Confidence intervals span the minimum and maximum values. An epoch for CDS is defined as a single training pass over all of the centralized data. An epoch for FL is defined as a parallel training pass of every institution over its training data, and an epoch during CIIL and IIL is defined as a single institution's training pass over its data.
