Privacy-preserving multicenter differential protein abundance analysis with FedProt

Yuliya Burankova^{1,2}, Miriam Abele^{3,4}, Mohammad Bakhtiari⁵, Christine von Toerne⁶, Teresa K Barth⁷, Lisa Schweizer⁸, Pieter Giesbertz⁹, Johannes R Schmidt¹⁰, Stefan Kalkhof^{10,11}, Janina Müller-Deile¹², Peter A van Veelen¹³, Yassene Mohammed¹³, Elke Hammer^{14,15}, Lis Arend^{5,16}, Klaudia Adamowicz⁵, Tanja Laske^{5,17}, Anne Hartebrodt^{18,19}, Tobias Frisch¹⁸, Chen Meng⁴, Julian Matschinske⁵, Julian Späth⁵, Richard Röttger¹⁸, Veit Schwämmle²⁰, Stefanie M Hauck⁶, Stefan F Lichtenthaler^{9,21,22}, Axel Imhof⁷, Matthias Mann⁸, Christina Ludwig⁴, Bernhard Kuster³, Jan Baumbach^{5,18}, Olga Zolotareva^{5,16}

Affiliations

¹ Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany. yuliya.burankova@tum.de.
² Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany. yuliya.burankova@tum.de.
³ Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany.
⁴ Bavarian Center for Biomolecular Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany.
⁵ Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany.
⁶ Metabolomics and Proteomics Core, Helmholtz Center Munich, Munich, Germany.
⁷ Protein Analysis Unit, Biomedical Center, Faculty of Medicine, LMU Munich, Martinsried, Germany.
⁸ Max Planck Institute of Biochemistry, Martinsried, Germany.
⁹ German Center for Neurodegenerative Diseases (DZNE), Munich, Germany.
¹⁰ Department of Preclinical Development and Validation, Fraunhofer Institute for Cell Therapy and Immunology IZI, Leipzig, Germany.
¹¹ Institute for Bioanalysis, University of Applied Science Coburg, Coburg, Germany.
¹² Department of Nephrology, Uniklinikum Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
¹³ Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, the Netherlands.
¹⁴ University Medicine Greifswald, Greifswald, Germany.
¹⁵ German Center for Cardiovascular Diseases (DZHK), Greifswald, Germany.
¹⁶ Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Freising, Germany.
¹⁷ Viral Systems Modeling, Leibniz Institute of Virology, Hamburg, Germany.
¹⁸ Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark.
¹⁹ Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
²⁰ Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark.
²¹ Neuroproteomics, School of Medicine and Health, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany.
²² Munich Cluster for Systems Neurology (SyNergy), Munich, Germany.

PMID: 40646319
PMCID: PMC12374843
DOI: 10.1038/s43588-025-00832-7

Nat Comput Sci. 2025 Aug;5(8):675-688. doi: 10.1038/s43588-025-00832-7. Epub 2025 Jul 11.

Abstract

Quantitative mass spectrometry has revolutionized proteomics by enabling simultaneous quantification of thousands of proteins. Pooling patient-derived data from multiple institutions enhances statistical power but raises serious privacy concerns. Here we introduce FedProt, the first privacy-preserving tool for collaborative differential protein abundance analysis of distributed data, which utilizes federated learning and additive secret sharing. In the absence of a multicenter patient-derived dataset for evaluation, we created two: one at five centers from E. coli experiments and one at three centers from human serum. Evaluations using these datasets confirm that FedProt achieves accuracy equivalent to the DEqMS method applied to pooled data, with negligible absolute differences of at most 4 × 10⁻¹². By contrast, −log₁₀P computed by the most accurate meta-analysis methods diverged from the centralized analysis results by up to 25–26.

Conflict of interest statement

Competing interests: B.K. is a co-founder and shareholder of MSAID; he holds no operational role in the company. The other authors declare no competing interests.

Figures

**Fig. 1. FedProt workflow overview.**
a, Federated workflow overview. (1) Data preparation: data owners collect and preprocess MS data, obtain protein intensity and peptide count matrices, and define design matrices before participating as clients. Personal-level information (proteomics profiles) stays on the clients' servers and is never shared (dashed lines around clients highlight different physical locations). (2) Federated learning: clients communicate with the central server (coordinator) to collaboratively train a global model by exchanging local model parameters, without revealing their individual datasets. The clients protect their local parameters using additive secret sharing (blue arrows). Non-numeric data, such as protein group names, are sent to the coordinator without additive secret sharing (green arrows). The coordinator returns updated global parameters to clients (black arrows). (3) Result: after all federated computations, all clients receive results mathematically equivalent to those of a centralized DEqMS analysis of the pooled dataset, formatted as a table with abundance fold changes (FCs), confidence intervals and adjusted P values. b, Overview of data communication using secure multiparty computation (SMPC; additive secret sharing) inside FedProt. The clients protect their local parameters using additive secret sharing. Each client data point is masked with a noise mask. The noisy data and the noise masks are split into n encrypted parts (n > 2). These parts are exchanged among clients (blue arrows) via a relay server, ensuring that no single party receives more than one piece of the data from each of the other clients. After decrypting the received parts, clients sum the data and send the re-encrypted sums to the coordinator, which decrypts and aggregates the sums to compute the global result. For details, see Methods and Supplementary Methods. Created with BioRender.com.
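The additive secret sharing idea in panel b can be illustrated with a minimal Python sketch. This is a generic textbook version and not FedProt's implementation: the modulus `PRIME`, the function names `make_shares` and `secure_sum`, and the use of plain integers are all assumptions made for illustration. Real-valued parameters would first need a fixed-point encoding, and the encryption and relay-server steps described in the caption are omitted.

```python
import secrets

# Assumed modulus for this sketch; any sufficiently large prime would do.
PRIME = 2**61 - 1


def make_shares(value: int, n_parties: int) -> list[int]:
    """Split an integer into n additive shares that sum to value mod PRIME."""
    # The first n-1 shares are uniformly random; each one alone reveals nothing.
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the secret.
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(local_values: list[int]) -> int:
    """Simulate clients and a coordinator computing a sum of private values."""
    n = len(local_values)
    # Each client splits its private value into n shares, one per client.
    all_shares = [make_shares(v, n) for v in local_values]
    # Client j receives the j-th share from every client and sums them locally.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # The coordinator only sees partial sums, never an individual value.
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    print(secure_sum([12, 30, 7]))
```

In this sketch the coordinator recovers only the aggregate, while each client sees a single random-looking share from every other client.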

**Fig. 2. Comparative analysis of adjusted P values and ranking consistency between centralized and decentralized methods for real and simulated datasets.**
a,c, Comparison of negative log-transformed BH-adjusted P values (−log₁₀(adjusted P value)) computed by FedProt or meta-analysis methods (y axis) with those from the centralized DEqMS analysis (x axis), for bacterial and human serum datasets (a) and for simulated datasets (c), for one out of the 50 runs per scenario. The thin black line is the diagonal. b,d, The dependency of the Jaccard similarity coefficient on the number of top-ranked proteins identified by the centralized DEqMS and decentralized approaches, showing the results for the bacterial and human serum datasets (b) and for the simulated datasets (d). Proteins were ranked by decreasing negative log-transformed BH-adjusted P value and were not filtered by log₂FC. The simulated data generation and the subsequent analysis were repeated 50 times, with aggregated results reported (mean values ± s.d.).
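The ranking-consistency measure in panels b and d can be sketched as follows: rank proteins by decreasing −log₁₀(adjusted P value) in each analysis, take the top k from each, and compute the Jaccard coefficient of the two sets. The function name `top_k_jaccard`, the dictionary input format, and the example scores are hypothetical and are not taken from the paper.

```python
def top_k_jaccard(
    central_scores: dict[str, float],
    other_scores: dict[str, float],
    k: int,
) -> float:
    """Jaccard similarity of the top-k proteins from two analyses.

    Scores are assumed to be -log10(adjusted P value), so larger is better.
    """
    # Rank protein identifiers by decreasing score and keep the first k.
    top_central = set(
        sorted(central_scores, key=central_scores.get, reverse=True)[:k]
    )
    top_other = set(
        sorted(other_scores, key=other_scores.get, reverse=True)[:k]
    )
    union = top_central | top_other
    # Two empty sets are treated as identical.
    return len(top_central & top_other) / len(union) if union else 1.0


if __name__ == "__main__":
    # Made-up scores for four hypothetical protein groups.
    central = {"P1": 9.0, "P2": 7.5, "P3": 3.2, "P4": 0.4}
    decentralized = {"P1": 8.8, "P3": 7.9, "P2": 3.0, "P4": 0.5}
    for k in (1, 2, 3):
        print(k, top_k_jaccard(central, decentralized, k))
```

A value of 1 at a given k means both analyses select the same k proteins, regardless of their order within that set.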

**Fig. 3. Scheme of the FedProt workflow.**
Steps that involve federated computations are shown in green. The corresponding stages of the DEqMS workflow are shown on the right. Median normalization from the PRONE R package was used. The validation, filtering, normalization and design mask creation steps involving three clients (C1, C2 and C3) are shown on the left. PG denotes protein groups, j denotes sample numbers, and A and B are the target classes compared during the analysis. In step 1, the PG3 value for sample j5 is replaced with NA (only one non-missing value in the client data). In step 2, the PG5 value for sample j5 is replaced with NA (one non-missing value for this PG in the target class for this client). After that, PG5 is removed from all clients because it has too few non-missing values (here, less than f = 0.75). In the design mask creation step, client 3 is excluded for PG1 because it has no data, as is client 2 for PG4; client 3 also becomes the new reference client and is excluded from the computation (PG3 is missing in client 1). See Methods and Supplementary Methods for more details. *The normalization step is optional and can be turned off by the coordinator. For TMT data, filtering out decoy, contaminant and reverse protein groups is required. Normalization by median across all centers and IRS inside each center can be performed during a FedProt run. **This step applies to the federated approach only. Created with BioRender.com.
