Nat Commun. 2025 Jul 8;16(1):6274.
doi: 10.1038/s41467-025-60466-1.

Towards fair decentralized benchmarking of healthcare AI algorithms with the Federated Tumor Segmentation (FeTS) challenge

Maximilian Zenk #  1   2 Ujjwal Baid #  3   4 Sarthak Pati  3   4   5 Akis Linardos  3   4 Brandon Edwards  6 Micah Sheller  5   6 Patrick Foley  6 Alejandro Aristizabal  5   7 David Zimmerer  1 Alexey Gruzdev  6 Jason Martin  6 Russell T Shinohara  8   9   10 Annika Reinke  11   12 Fabian Isensee  1   12 Santhosh Parampottupadam  1 Kaushal Parekh  1 Ralf Floca  1 Hasan Kassem  5 Bhakti Baheti  4 Siddhesh Thakur  4 Verena Chung  13 Kaisar Kushibar  14 Karim Lekadir  15   16 Meirui Jiang  17 Youtan Yin  18 Hongzheng Yang  19 Quande Liu  17 Cheng Chen  17 Qi Dou  17 Pheng-Ann Heng  17 Xiaofan Zhang  20 Shaoting Zhang  21 Muhammad Irfan Khan  22 Mohammad Ayyaz Azeem  23 Mojtaba Jafaritadi  22   24 Esa Alhoniemi  22 Elina Kontio  22 Suleiman A Khan  25 Leon Mächler  26 Ivan Ezhov  27   28 Florian Kofler  27   28   29   30 Suprosanna Shit  27   28   30 Johannes C Paetzold  31   32 Timo Loehr  27   28 Benedikt Wiestler  29 Himashi Peiris  33   34   35 Kamlesh Pawar  33   36 Shenjun Zhong  33   37 Zhaolin Chen  33   35 Munawar Hayat  35 Gary Egan  33   36 Mehrtash Harandi  34 Ece Isik Polat  38 Gorkem Polat  38 Altan Kocyigit  38 Alptekin Temizel  38 Anup Tuladhar  39   40 Lakshay Tyagi  41 Raissa Souza  39   40   42 Nils D Forkert  39   40   43   44 Pauline Mouches  39   40   42 Matthias Wilms  39   40 Vishruth Shambhat  45 Akansh Maurya  46 Shubham Subhas Danannavar  45 Rohit Kalla  45 Vikas Kumar Anand  45 Ganapathy Krishnamurthi  45 Sahil Nalawade  47 Chandan Ganesh  47 Ben Wagner  47 Divya Reddy  47 Yudhajit Das  47 Fang F Yu  47 Baowei Fei  48 Ananth J Madhuranthakam  47 Joseph Maldjian  47 Gaurav Singh  49 Jianxun Ren  50 Wei Zhang  50 Ning An  50 Qingyu Hu  51 Youjia Zhang  50 Ying Zhou  50 Vasilis Siomos  52 Giacomo Tarroni  52   53 Jonathan Passerrat-Palmbach  52   53 Ambrish Rawat  54 Giulio Zizzo  54 Swanand Ravindra Kadhe  54 Jonathan P Epperlein  54 Stefano Braghin  54 Yuan Wang  55 Renuga Kanagavelu  55 Qingsong Wei  55 Yechao Yang  55 Yong Liu  55 
Krzysztof Kotowski  56 Szymon Adamski  56 Bartosz Machura  56 Wojciech Malara  56 Lukasz Zarudzki  57 Jakub Nalepa  56   58 Yaying Shi  59   60 Hongjian Gao  61 Salman Avestimehr  61 Yonghong Yan  59 Agus S Akbar  62 Ekaterina Kondrateva  63 Hua Yang  64 Zhaopei Li  65 Hung-Yu Wu  66 Johannes Roth  67 Camillo Saueressig  68 Alexandre Milesi  69 Quoc D Nguyen  70 Nathan J Gruenhagen  71 Tsung-Ming Huang  72 Jun Ma  73 Har Shwinder H Singh  74 Nai-Yu Pan  75 Dingwen Zhang  76 Ramy A Zeineldin  77 Michal Futrega  69 Yading Yuan  78   79 Gian Marco Conte  80 Xue Feng  81 Quan D Pham  82 Yong Xia  83 Zhifan Jiang  84 Huan Minh Luu  85 Mariia Dobko  86 Alexandre Carré  87 Bair Tuchinov  88 Hassan Mohy-Ud-Din  89 Saruar Alam  90 Anup Singh  91 Nameeta Shah  92 Weichung Wang  93 Chiharu Sako  8   94 Michel Bilello  8   94 Satyam Ghodasara  94 Suyash Mohan  8   94 Christos Davatzikos  8   94 Evan Calabrese  95 Jeffrey Rudie  95 Javier Villanueva-Meyer  95 Soonmee Cha  95 Christopher Hess  95 John Mongan  95 Madhura Ingalhalikar  96 Manali Jadhav  96 Umang Pandey  96 Jitender Saini  97 Raymond Y Huang  98 Ken Chang  99 Minh-Son To  100   101 Sargam Bhardwaj  100 Chee Chong  101 Marc Agzarian  100   101 Michal Kozubek  102 Filip Lux  102 Jan Michálek  102 Petr Matula  102 Miloš Ker Kovský  103 Tereza Kopr Ivová  103 Marek Dostál  103   104 Václav Vybíhal  105 Marco C Pinho  47 James Holcomb  47 Marie Metz  106 Rajan Jain  107   108 Matthew D Lee  107 Yvonne W Lui  107 Pallavi Tiwari  109   110 Ruchika Verma  111   112   113 Rohan Bareja  111 Ipsa Yadav  111 Jonathan Chen  111 Neeraj Kumar  114   115   116 Yuriy Gusev  117 Krithika Bhuvaneshwar  117 Anousheh Sayah  118 Camelia Bencheqroun  117 Anas Belouali  117 Subha Madhavan  117 Rivka R Colen  119   120 Aikaterini Kotrotsou  120 Philipp Vollmuth  1   121   122 Gianluca Brugnara  123 Chandrakanth J Preetha  123 Felix Sahm  124   125 Martin Bendszus  123 Wolfgang Wick  2   126 Abhishek Mahajan  127   128 Carmen Balaña  129 
Jaume Capellades  130 Josep Puig  131 Yoon Seong Choi  132 Seung-Koo Lee  133 Jong Hee Chang  133 Sung Soo Ahn  133 Hassan F Shaykh  134 Alejandro Herrera-Trujillo  135   136 Maria Trujillo  136 William Escobar  135 Ana Abello  136 Jose Bernal  137   138   139 Jhon Gómez  136 Pamela LaMontagne  140 Daniel S Marcus  140 Mikhail Milchenko  140   141 Arash Nazeri  140 Bennett Landman  142 Karthik Ramadass  142 Kaiwen Xu  143 Silky Chotai  144 Lola B Chambless  144 Akshitkumar Mistry  144 Reid C Thompson  144 Ashok Srinivasan  145 J Rajiv Bapuraj  145 Arvind Rao  146 Nicholas Wang  146 Ota Yoshiaki  145 Toshio Moritani  145 Sevcan Turk  145 Joonsang Lee  146 Snehal Prabhudesai  146 John Garrett  147   148 Matthew Larson  147 Robert Jeraj  148 Hongwei Li  30 Tobias Weiss  149 Michael Weller  149 Andrea Bink  150 Bertrand Pouymayou  150 Sonam Sharma  151 Tzu-Chi Tseng  151 Saba Adabi  151 Alexandre Xavier Falcão  152 Samuel B Martins  153 Bernardo C A Teixeira  154   155 Flávia Sprenger  155 David Menotti  156 Diego R Lucio  156 Simone P Niclou  157   158 Olivier Keunen  159 Ann-Christin Hau  157   160 Enrique Pelaez  161 Heydy Franco-Maldonado  162 Francis Loayza  161 Sebastian Quevedo  163 Richard McKinley  164 Johannes Slotboom  164 Piotr Radojewski  164 Raphael Meier  164 Roland Wiest  164   165 Johannes Trenkler  166 Josef Pichler  167 Georg Necker  166 Andreas Haunschmidt  166 Stephan Meckel  166   168 Pamela Guevara  169 Esteban Torche  169 Cristobal Mendoza  169 Franco Vera  169 Elvis Ríos  169 Eduardo López  169 Sergio A Velastin  170   171 Joseph Choi  172 Stephen Baek  173 Yusung Kim  174 Heba Ismael  174 Bryan Allen  174 John M Buatti  174 Peter Zampakis  175 Vasileios Panagiotopoulos  176 Panagiotis Tsiganos  177 Sotiris Alexiou  178 Ilias Haliassos  179 Evangelia I Zacharaki  178 Konstantinos Moustakas  178 Christina Kalogeropoulou  175 Dimitrios M Kardamakis  179 Bing Luo  180 Laila M Poisson  181 Ning Wen  180 Martin Vallières  182   183 Mahdi Ait Lhaj 
Loutfi  182 David Fortin  184 Martin Lepage  185 Fanny Morón  186 Jacob Mandel  187 Gaurav Shukla  8   188   189 Spencer Liem  190 Gregory S Alexandre  190   191 Joseph Lombardo  189   190 Joshua D Palmer  192 Adam E Flanders  193 Adam P Dicker  189 Godwin Ogbole  194 Dotun Oyekunle  194 Olubunmi Odafe-Oyibotha  195 Babatunde Osobu  194 Mustapha Shu'aibu Hikima  196 Mayowa Soneye  194 Farouk Dako  94 Adeleye Dorcas  197 Derrick Murcia  198 Eric Fu  198 Rourke Haas  198 John A Thompson  199 David Ryan Ormond  198 Stuart Currie  200 Kavi Fatania  200 Russell Frood  200 Amber L Simpson  201   202 Jacob J Peoples  201 Ricky Hu  201   202 Danielle Cutler  201   202   203   204 Fabio Y Moraes  205 Anh Tran  201   202 Mohammad Hamghalam  201   206 Michael A Boss  207 James Gimpel  207 Deepak Kattil Veettil  208 Kendall Schmidt  208 Lisa Cimino  208 Cynthia Price  208 Brian Bialecki  208 Sailaja Marella  208 Charles Apgar  207 Andras Jakab  209 Marc-André Weber  210 Errol Colak  211 Jens Kleesiek  212 John B Freymann  213 Justin S Kirby  213 Lena Maier-Hein  11 Jake Albrecht  13 Peter Mattson  5 Alexandros Karargyris  5 Prashant Shah  6 Bjoern Menze  27   28   30 Klaus Maier-Hein  1   2   12   214 Spyridon Bakas  215   216   217   218   219   220   221

Maximilian Zenk et al. Nat Commun. 2025.

Abstract

Computational competitions are the standard for benchmarking medical image analysis algorithms, but they typically use small curated test datasets acquired at a few centers, leaving a gap to the reality of diverse multicentric patient data. To address this gap, the Federated Tumor Segmentation (FeTS) Challenge represents the paradigm for real-world algorithmic performance evaluation. The FeTS challenge is a competition to benchmark (i) federated learning aggregation algorithms and (ii) state-of-the-art segmentation algorithms across multiple international sites. Weight aggregation and client selection techniques were compared using a multicentric brain tumor dataset in realistic federated learning simulations, showing benefits of adaptive weight aggregation and efficiency gains through client sampling. Quantitative performance evaluation of state-of-the-art segmentation algorithms on data distributed internationally across 32 institutions yielded good generalization on average, although the worst-case performance revealed data-specific modes of failure. Similar multi-site setups can help validate the real-world utility of healthcare AI algorithms in the future.
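To make the two ingredients compared in the federated learning task more concrete, the following is a minimal sketch of one aggregation round: a subset of clients is sampled, and their model weights are averaged in proportion to their number of training cases. This is plain sample-size-weighted averaging, not the adaptive aggregation methods evaluated in the challenge, and all names (`sample_clients`, `weighted_average`, the flat list representation of a model) are hypothetical choices for illustration.

```python
import random
from typing import Dict, List, Sequence

Weights = List[float]  # a flattened model; an assumption made for illustration


def sample_clients(client_ids: Sequence[str], fraction: float,
                   rng: random.Random) -> List[str]:
    """Pick a random subset of clients for one round (always at least one)."""
    k = max(1, round(fraction * len(client_ids)))
    return rng.sample(list(client_ids), k)


def weighted_average(updates: Dict[str, Weights],
                     num_cases: Dict[str, int]) -> Weights:
    """Average client weights, each scaled by its share of training cases."""
    total = sum(num_cases[c] for c in updates)
    length = len(next(iter(updates.values())))
    return [
        sum(updates[c][i] * num_cases[c] for c in updates) / total
        for i in range(length)
    ]


# Toy round: client "b" holds three times as many cases as client "a".
updates = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
num_cases = {"a": 1, "b": 3}
aggregated = weighted_average(updates, num_cases)  # [2.5, 3.5]
```

Sampling fewer clients per round reduces the work done in each round, which is the kind of efficiency gain the abstract attributes to client sampling; an adaptive scheme would replace the fixed case-count weights with weights that change during training.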

Conflict of interest statement

Competing interests: The Intel-affiliated authors (B. Edwards, M. Sheller, P. Foley, A. Gruzdev, J. Martin, P. Shah) would like to disclose the following (potential) competing interests as Intel employees. Intel may develop proprietary software that is related in reputation to the OpenFL open source project highlighted in this work. In addition, the work demonstrates feasibility of federated learning for brain tumor boundary detection models. Intel may benefit by selling products to support an increase in demand for this use-case. The remaining authors declare no competing interests.

Figures

Fig. 1. Concept and main findings of the Federated Tumor Segmentation (FeTS) Challenge.
The FeTS challenge is an international competition to benchmark brain tumor segmentation algorithms, involving data contributors, participants, and organizers across the globe. Test data hubs are geographically distributed while training data is centralized. Participants include those from the 2021 and 2022 challenges. Task 1 focused on simulated federated learning, and we consistently saw an increase in performance by teams utilizing variants of selective sampling in their federated aggregation. In Task 2, submissions are distributed among the test data hubs for evaluation. As a representative example, the top-ranked model shows good average segmentation performance (measured by the Dice similarity coefficient, DSC) but also failures for individual cases. Cases with empty tumor regions and data sites with fewer than 40 cases are not shown in the strip plot. Source data are provided as a Source Data file.
Fig. 2. Aggregated results of challenge Task 2 per institution and model.
The figure visualizes test set sizes (left bar plot), mean DSC scores for each institution and submitted model (heatmap; the mean is taken over all test cases and three tumor regions), and mean DSC scores averaged per model (top bar plot). Models are ordered by mean DSC score and official FeTS2022 submissions are marked with ticks. White, crossed out tiles indicate evaluations that could not be completed. The heatmap shows that the performances of the top models are close within each row (i.e., institution) and vary much more between rows. While the drops in mean DSC are moderate, they show that state-of-the-art segmentation algorithms fail to provide the highest segmentation quality for some institutions. Source data are provided as a Source Data file.
Fig. 3. Performance of the top-ranked algorithm for each institution of the test set (Task 2).
Some institutions contributed distinct patients to both the training and testing dataset (marked as seen during training), while others were unseen before testing. Each gray dot represents the mean DSC score over three tumor regions for a single test case, while box plots indicate the median (middle line), the 25th and 75th percentiles (box), and samples within 1.5 × interquartile range (whiskers) of the distribution. The number of samples n per institution is given above each box. Although median DSC scores are mostly higher than 0.9, institutions with reduced performance or outlier cases exist both within the subset seen during training and the unseen subset. Source data are provided as a Source Data file.
Fig. 4. Qualitative examples of common segmentation issues.
Each row shows one case with four MR sequences (T1, T1-Gd, T2, T2-FLAIR) and a segmentation mask overlay in the rightmost column. a, b depict errors in the test set prediction of the top-ranked model (ID: 15), while c, d show training set examples with reference segmentation issues. a False positive edema prediction. The hyperintensity is not due to the tumor but to a different, symmetric pathology, which is distant from the tumor. b A small contrast enhancement is missed by the top-ranked model. It is separate from the larger tumor in the lower right but should be labeled as ET. c Since blood products are bright in T1 and T1-Gd, they can be confused with ET. d The segmentation of non-enhancing tumor core parts is difficult and often differs between annotators. Label abbreviations: ED edema, NC necrotic tumor core, ET enhancing tumor.
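
The per-case scores shown in Figs. 1–3 are Dice similarity coefficients averaged over three tumor regions. A minimal sketch of that computation follows, using the standard definition DSC = 2|P ∩ R| / (|P| + |R|). The region-to-label mapping, the label ids, and the convention that two empty masks score 1.0 are assumptions made for illustration; the captions do not specify them.

```python
from typing import Dict, Sequence, Set


def dice(pred: Sequence[bool], ref: Sequence[bool]) -> float:
    """DSC = 2|P ∩ R| / (|P| + |R|) for two binary masks of equal length."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    size = sum(map(bool, pred)) + sum(map(bool, ref))
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if size == 0 else 2.0 * overlap / size


def mean_region_dice(pred_labels: Sequence[int], ref_labels: Sequence[int],
                     regions: Dict[str, Set[int]]) -> float:
    """Mean DSC over regions, each region being a set of label ids."""
    scores = []
    for labels in regions.values():
        pred_mask = [v in labels for v in pred_labels]
        ref_mask = [v in labels for v in ref_labels]
        scores.append(dice(pred_mask, ref_mask))
    return sum(scores) / len(scores)


# Toy case with four voxels and hypothetical label ids 1, 2, 3.
regions = {"region_a": {1}, "region_b": {2}, "region_c": {3}}
case_score = mean_region_dice([1, 2, 0, 3], [1, 2, 2, 0], regions)
```

Averaging `case_score` over all test cases of one institution would give a per-institution mean comparable in spirit to one heatmap tile in Fig. 2.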
