Nat Commun. 2022 Dec 5;13(1):7346.
doi: 10.1038/s41467-022-33407-5.

Federated learning enables big data for rare cancer boundary detection

Sarthak Pati #  1   2   3   4 Ujjwal Baid #  1   2   3 Brandon Edwards #  5 Micah Sheller  5 Shih-Han Wang  5 G Anthony Reina  5 Patrick Foley  5 Alexey Gruzdev  5 Deepthi Karkada  5 Christos Davatzikos  1   2 Chiharu Sako  1   2 Satyam Ghodasara  2 Michel Bilello  1   2 Suyash Mohan  1   2 Philipp Vollmuth  6 Gianluca Brugnara  6 Chandrakanth J Preetha  6 Felix Sahm  7   8 Klaus Maier-Hein  9   10 Maximilian Zenk  9 Martin Bendszus  6 Wolfgang Wick  7   11 Evan Calabrese  12 Jeffrey Rudie  12 Javier Villanueva-Meyer  12 Soonmee Cha  12 Madhura Ingalhalikar  13 Manali Jadhav  13 Umang Pandey  13 Jitender Saini  14 John Garrett  15   16 Matthew Larson  15 Robert Jeraj  15   16 Stuart Currie  17 Russell Frood  17 Kavi Fatania  17 Raymond Y Huang  18 Ken Chang  19 Carmen Balaña  20 Jaume Capellades  21 Josep Puig  22 Johannes Trenkler  23 Josef Pichler  24 Georg Necker  23 Andreas Haunschmidt  23 Stephan Meckel  23   25 Gaurav Shukla  1   26 Spencer Liem  27 Gregory S Alexander  28 Joseph Lombardo  27   29 Joshua D Palmer  30 Adam E Flanders  31 Adam P Dicker  29 Haris I Sair  32   33 Craig K Jones  33 Archana Venkataraman  34 Meirui Jiang  35 Tiffany Y So  35 Cheng Chen  35 Pheng Ann Heng  35 Qi Dou  35 Michal Kozubek  36 Filip Lux  36 Jan Michálek  36 Petr Matula  36 Miloš Keřkovský  37 Tereza Kopřivová  37 Marek Dostál  37   38 Václav Vybíhal  39 Michael A Vogelbaum  40 J Ross Mitchell  41   42 Joaquim Farinhas  43 Joseph A Maldjian  44 Chandan Ganesh Bangalore Yogananda  44 Marco C Pinho  44 Divya Reddy  44 James Holcomb  44 Benjamin C Wagner  44 Benjamin M Ellingson  45   46 Timothy F Cloughesy  46 Catalina Raymond  45 Talia Oughourlian  45   47 Akifumi Hagiwara  47 Chencai Wang  47 Minh-Son To  48   49 Sargam Bhardwaj  48 Chee Chong  50 Marc Agzarian  50   51 Alexandre Xavier Falcão  52 Samuel B Martins  53 Bernardo C A Teixeira  54   55 Flávia Sprenger  55 David Menotti  56 Diego R Lucio  56 Pamela LaMontagne  57 Daniel Marcus  57 Benedikt Wiestler  58   59 
Florian Kofler  58   59   60 Ivan Ezhov  4   59   60 Marie Metz  58 Rajan Jain  61   62 Matthew Lee  61 Yvonne W Lui  61 Richard McKinley  63 Johannes Slotboom  63 Piotr Radojewski  63 Raphael Meier  63 Roland Wiest  63 Derrick Murcia  64 Eric Fu  64 Rourke Haas  64 John Thompson  64 David Ryan Ormond  64 Chaitra Badve  65 Andrew E Sloan  66   67   68 Vachan Vadmal  68 Kristin Waite  69 Rivka R Colen  70   71 Linmin Pei  72 Murat Ak  70 Ashok Srinivasan  73 J Rajiv Bapuraj  73 Arvind Rao  74 Nicholas Wang  74 Ota Yoshiaki  73 Toshio Moritani  73 Sevcan Turk  73 Joonsang Lee  74 Snehal Prabhudesai  74 Fanny Morón  75 Jacob Mandel  51 Konstantinos Kamnitsas  76   77 Ben Glocker  76 Luke V M Dixon  78 Matthew Williams  79 Peter Zampakis  80 Vasileios Panagiotopoulos  81 Panagiotis Tsiganos  82 Sotiris Alexiou  83 Ilias Haliassos  84 Evangelia I Zacharaki  83 Konstantinos Moustakas  83 Christina Kalogeropoulou  80 Dimitrios M Kardamakis  85 Yoon Seong Choi  86 Seung-Koo Lee  86 Jong Hee Chang  86 Sung Soo Ahn  86 Bing Luo  87 Laila Poisson  88 Ning Wen  87   89 Pallavi Tiwari  90 Ruchika Verma  42   90 Rohan Bareja  90 Ipsa Yadav  90 Jonathan Chen  90 Neeraj Kumar  41   42 Marion Smits  91 Sebastian R van der Voort  91 Ahmed Alafandi  91 Fatih Incekara  91   92 Maarten M J Wijnenga  93 Georgios Kapsas  91 Renske Gahrmann  91 Joost W Schouten  92 Hendrikus J Dubbink  94 Arnaud J P E Vincent  92 Martin J van den Bent  93 Pim J French  93 Stefan Klein  95 Yading Yuan  96 Sonam Sharma  96 Tzu-Chi Tseng  96 Saba Adabi  96 Simone P Niclou  97 Olivier Keunen  98 Ann-Christin Hau  97   99 Martin Vallières  100   101 David Fortin  101   102 Martin Lepage  101   103 Bennett Landman  104 Karthik Ramadass  104 Kaiwen Xu  105 Silky Chotai  106 Lola B Chambless  106 Akshitkumar Mistry  106 Reid C Thompson  106 Yuriy Gusev  107 Krithika Bhuvaneshwar  107 Anousheh Sayah  108 Camelia Bencheqroun  107 Anas Belouali  107 Subha Madhavan  107 Thomas C Booth  109   110 Alysha Chelliah  109 
Marc Modat  109 Haris Shuaib  111   112 Carmen Dragos  111 Aly Abayazeed  113 Kenneth Kolodziej  113 Michael Hill  113 Ahmed Abbassy  114 Shady Gamal  114 Mahmoud Mekhaimar  114 Mohamed Qayati  114 Mauricio Reyes  115 Ji Eun Park  116 Jihye Yun  116 Ho Sung Kim  116 Abhishek Mahajan  117 Mark Muzi  118 Sean Benson  119 Regina G H Beets-Tan  120   121 Jonas Teuwen  119 Alejandro Herrera-Trujillo  122   123 Maria Trujillo  123 William Escobar  122   123 Ana Abello  123 Jose Bernal  123   124 Jhon Gómez  123 Joseph Choi  125 Stephen Baek  126 Yusung Kim  127 Heba Ismael  127 Bryan Allen  127 John M Buatti  127 Aikaterini Kotrotsou  128 Hongwei Li  129 Tobias Weiss  130 Michael Weller  130 Andrea Bink  131 Bertrand Pouymayou  131 Hassan F Shaykh  132 Joel Saltz  133 Prateek Prasanna  133 Sampurna Shrestha  133 Kartik M Mani  133   134 David Payne  135 Tahsin Kurc  133   136 Enrique Pelaez  137 Heydy Franco-Maldonado  138 Francis Loayza  137 Sebastian Quevedo  139 Pamela Guevara  140 Esteban Torche  140 Cristobal Mendoza  140 Franco Vera  140 Elvis Ríos  140 Eduardo López  140 Sergio A Velastin  141 Godwin Ogbole  142 Mayowa Soneye  142 Dotun Oyekunle  142 Olubunmi Odafe-Oyibotha  143 Babatunde Osobu  142 Mustapha Shu'aibu  144 Adeleye Dorcas  145 Farouk Dako  2   146 Amber L Simpson  112   147 Mohammad Hamghalam  147   148 Jacob J Peoples  147 Ricky Hu  147 Anh Tran  147 Danielle Cutler  149 Fabio Y Moraes  150 Michael A Boss  151 James Gimpel  151 Deepak Kattil Veettil  151 Kendall Schmidt  152 Brian Bialecki  152 Sailaja Marella  151 Cynthia Price  151 Lisa Cimino  151 Charles Apgar  151 Prashant Shah  5 Bjoern Menze  4   129 Jill S Barnholtz-Sloan  69   153 Jason Martin  5 Spyridon Bakas  154   155   156
Sarthak Pati et al. Nat Commun. 2022.

Erratum in

  • Author Correction: Federated learning enables big data for rare cancer boundary detection.
    Pati S, et al. Nat Commun. 2023 Jan 26;14(1):436. doi: 10.1038/s41467-023-36188-7. PMID: 36702828. Free PMC article. No abstract available.

Abstract

Although machine learning (ML) has shown promise across disciplines, out-of-sample generalizability is concerning. This is currently addressed by sharing multi-site data, but such centralization is challenging or infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML, by only sharing numerical model updates. Here we present the largest FL study to date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6,314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate our study to: 1) enable more healthcare studies informed by large diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further analyses for glioblastoma by releasing our consensus model, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
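To make "only sharing numerical model updates" concrete, the following is a minimal sketch of one aggregation round. It assumes a case-weighted federated averaging rule, which is a common choice but is not stated in the abstract as the method used in this study. The function name `federated_average`, the `Weights` type, and the toy site values are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each named layer is a flat list of floats.
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine per-site weights into one consensus set of weights.

    Each entry in `updates` is (site_weights, n_local_cases). Sites are
    weighted by their share of the total cases (an assumed rule). Only
    these numbers leave a site; the scans themselves never do.
    """
    if not updates:
        raise ValueError("need at least one site update")
    total_cases = sum(n_cases for _, n_cases in updates)
    if total_cases <= 0:
        raise ValueError("total number of cases must be positive")

    first_weights, _ = updates[0]
    averaged: Weights = {
        name: [0.0] * len(values) for name, values in first_weights.items()
    }
    for weights, n_cases in updates:
        share = n_cases / total_cases
        for name, values in weights.items():
            for i, value in enumerate(values):
                averaged[name][i] += share * value
    return averaged


# Toy example with two hypothetical sites holding 100 and 300 cases.
site_a = ({"layer": [1.0, 2.0]}, 100)
site_b = ({"layer": [3.0, 6.0]}, 300)
consensus = federated_average([site_a, site_b])
print(consensus)
```

In this toy run the site with more cases pulls the consensus weights toward its own values, which is the intended effect of case weighting.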


Conflict of interest statement

The Intel-affiliated authors (B. Edwards, M. Sheller, S. Wang, G.A. Reina, P. Foley, A. Gruzdev, D. Karkada, P. Shah, J. Martin) would like to disclose the following (potential) competing interests as Intel employees. Intel may develop proprietary software that is related in reputation to the OpenFL open source project highlighted in this work. In addition, the work demonstrates feasibility of federated learning for brain tumor boundary detection models. Intel may benefit by selling products to support an increase in demand for this use-case. The remaining authors declare no competing interests.

Figures

Fig. 1. Representation of the study’s global scale, diversity, and complexity.
a The map of all sites involved in the development of FL consensus model. b Example of a glioblastoma mpMRI scan with corresponding reference annotations of the tumor sub-compartments (ET enhancing tumor, TC tumor core, WT whole tumor). c, d Comparative Dice similarity coefficient (DSC) performance evaluation of the final consensus model with the public initial model on the collaborators' local validation data (in c with n = 1043 biologically independent cases) and on the complete out-of-sample data (in d with n = 518 biologically independent cases), per tumor sub-compartment (ET enhancing tumor, TC tumor core, WT whole tumor). Note the box and whiskers inside each violin plot represent the true min and max values. The top and bottom of each “box” depict the 3rd and 1st quartile of each measure. The white line and the red ‘×’, within each box, indicate the median and mean values, respectively. The fact that these are not necessarily at the center of each box indicates the skewness of the distribution over different cases. The “whiskers” drawn above and below each box depict the extremal observations still within 1.5 times the interquartile range, above the 3rd or below the 1st quartile. Equivalent plots for the Jaccard similarity coefficient (JSC) can be observed in supplementary figures. e Number of contributed cases per collaborating site.
Fig. 2. Generalizable Dice similarity coefficient (DSC) evaluation on ‘centralized’ out-of-sample data (n = 154 biologically independent cases), per tumor sub-compartment (ET enhancing tumor, TC tumor core, WT whole tumor) and averaged across cases.
Comparative performance evaluation across the public initial model, the preliminary consensus model, the final consensus model, and an ensemble of single site models from collaborators holding > 200 cases. Note the box and whiskers inside each violin plot represent the true min and max values. The top and bottom of each “box” depict the 3rd and 1st quartile of each measure. The white line and the red ‘×’, within each box, indicate the median and mean values, respectively. The fact that these are not necessarily at the center of each box indicates the skewness of the distribution over different cases. The “whiskers” drawn above and below each box depict the extremal observations still within 1.5 times the interquartile range, above the 3rd or below the 1st quartile. Equivalent plots for the Jaccard similarity coefficient (JSC) can be observed in supplementary figures.
Fig. 3. Per-tumor region (ET enhancing tumor, TC tumor core, WT whole tumor) mean Dice similarity coefficient (DSC) over validation samples (with shading indicating 95% confidence intervals again over samples).
a At all participating sites across training rounds, showing that the score is greater for sub-compartments with larger volumes. b For a site with problematic annotations (Site 48). The instability in these curves could be caused by errors in annotation for the local validation data (similar to errors that were observed for a small shared sample of data from this site). c An example of a case with erroneous annotations in the data used by Site 48. Equivalent plots for the Jaccard similarity coefficient (JSC) can be observed in supplementary figures.
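The figure captions report the Dice similarity coefficient (DSC) and refer to the Jaccard similarity coefficient (JSC). As a minimal sketch of how these two overlap measures are conventionally defined for binary masks, the snippet below computes both for one sub-compartment. The function name `dice_and_jaccard`, the flattened-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the study.

```python
from typing import Sequence, Tuple


def dice_and_jaccard(pred: Sequence[int], ref: Sequence[int]) -> Tuple[float, float]:
    """Return (DSC, JSC) for two flattened binary masks of equal length.

    DSC = 2 * |A and B| / (|A| + |B|)
    JSC = |A and B| / |A or B|
    """
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")

    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    pred_size = sum(1 for p in pred if p)
    ref_size = sum(1 for r in ref if r)
    union = pred_size + ref_size - intersection

    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0

    dsc = 2.0 * intersection / (pred_size + ref_size)
    jsc = intersection / union
    return dsc, jsc


# Toy masks for a single sub-compartment (hypothetical values).
predicted = [1, 1, 0, 0, 1]
reference = [1, 0, 0, 1, 1]
dsc, jsc = dice_and_jaccard(predicted, reference)
print(f"DSC={dsc:.3f} JSC={jsc:.3f}")
```

The two measures are monotonically related (JSC = DSC / (2 - DSC)), so they rank segmentations identically even though their values differ.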
