Patterns (N Y). 2024 Mar 1;5(3):100945. doi: 10.1016/j.patter.2024.100945. eCollection 2024 Mar 8.

Federated learning for multi-omics: A performance evaluation in Parkinson's disease

Benjamin P Danek et al.

Abstract

While machine learning (ML) research has recently grown in popularity, its application in the omics domain is constrained by access to the sufficiently large, high-quality datasets needed to train ML models. Federated learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson's disease prediction. We find that FL model performance tracks that of centrally trained ML models, where the most performant FL model achieves an AUC-PR of 0.876 ± 0.009, 0.014 ± 0.003 less than its centrally trained counterpart. We also determine that the dispersion of samples within a federation plays a meaningful role in model performance. Our study implements several open-source FL frameworks and aims to highlight some of the challenges and opportunities when applying these collaborative methods in multi-omics studies.

Keywords: Parkinson’s disease diagnosis; federated learning; machine learning; omics data analysis.


Conflict of interest statement

B.P.D., A.D., D.V., M.A.N., and F.F. declare the following competing financial interests, as their participation in this project was part of a competitive contract awarded to Data Tecnica LLC by the National Institutes of Health to support open science research. M.A.N. also currently serves on the scientific advisory board for Character Bio and is an advisor to Neuron23 Inc. The study’s funders had no role in the study design, data collection, data analysis, data interpretation, or writing of the report. F.F. takes final responsibility for the decision to submit the paper for publication.

Figures

Figure 1
Experiment workflow diagram and data summary. The harmonized and joint-called PPMI and PDBP cohorts originate from the AMP-PD initiative. The PPMI cohort is split into K folds, where one fold is held out as an internal test set and the remaining folds are used for model fitting. The training folds are further split 80:20 into training and validation sets. The training split is distributed among n clients using one of the split strategies to simulate the cross-silo collaborative training setting. FL methods consist of a local learner and an aggregation method. Similarly, several central algorithms are used to fit the training data. The resulting global FL models and the centrally trained ML models are tested on the PPMI holdout fold (internal test) and the whole PDBP test set (external test).
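To make the partitioning in Figure 1 concrete, the sketch below simulates the nested split with scikit-learn utilities: K outer folds, an 80:20 training/validation split, and a label-stratified partition of the training split across clients. The synthetic feature matrix, n_clients = 2, and the helper name split_for_clients are illustrative assumptions, not the study's code.

    # Illustrative sketch of the Figure 1 splitting workflow (not the authors' code).
    # Synthetic data stand in for the harmonized PPMI features and labels.
    import numpy as np
    from sklearn.model_selection import StratifiedKFold, train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(600, 100))      # stand-in multi-omics feature matrix
    y = rng.integers(0, 2, size=600)     # stand-in case/control labels

    def split_for_clients(X_train, y_train, n_clients=2, seed=0):
        """Label-stratified random partition of the training split across clients."""
        skf = StratifiedKFold(n_splits=n_clients, shuffle=True, random_state=seed)
        return [(X_train[idx], y_train[idx]) for _, idx in skf.split(X_train, y_train)]

    outer = StratifiedKFold(n_splits=6, shuffle=True, random_state=0)
    for fit_idx, test_idx in outer.split(X, y):
        # One fold is the internal holdout; the remaining folds are used for fitting.
        X_fit, y_fit = X[fit_idx], y[fit_idx]
        X_tr, X_val, y_tr, y_val = train_test_split(
            X_fit, y_fit, test_size=0.2, stratify=y_fit, random_state=0)
        client_shards = split_for_clients(X_tr, y_tr, n_clients=2)
        # client_shards plays the role of the siloed client datasets in Figure 2.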
Figure 2
Federated architecture and training summary. The diagram shows the FL architecture used in the study and illustrates one round of FL training for the case of n = 3 clients. The aggregation server aggregates trained local learner parameters from the clients and computes a global model. Client sites hold their own siloed datasets, each with different samples. The trained client parameters are represented by the blue, orange, and green weights; the black weights represent the aggregated global model. Client model aggregation implemented by the FL strategy is denoted by f. Once the global weights are computed, a copy is sent to each client; the global model is used to initialize the local learner model weights in subsequent FL training rounds.
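As a rough illustration of the aggregation step denoted f in Figure 2, the snippet below performs sample-size-weighted federated averaging (a FedAvg-style f) over client parameter arrays; the parameter layout, client sizes, and variable names are assumptions made for illustration rather than the study's implementation.

    # Sketch of the aggregation f from Figure 2: weighted federated averaging.
    # The per-client parameter layout (a list of numpy arrays) is an assumption.
    import numpy as np

    def fedavg(client_weights, client_sizes):
        """Average each parameter array across clients, weighted by sample count."""
        total = float(sum(client_sizes))
        n_arrays = len(client_weights[0])
        return [
            sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
            for k in range(n_arrays)
        ]

    # One FL round with n = 3 clients (the blue, orange, and green weights):
    rng = np.random.default_rng(0)
    clients = [[rng.normal(size=(10, 4)), rng.normal(size=4)] for _ in range(3)]
    sizes = [120, 80, 100]                 # samples held at each silo
    global_model = fedavg(clients, sizes)  # the black (aggregated) weights
    # The global weights would then re-initialize each local learner in the next round.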
Figure 3
Federated learning models trained using publicly available and accessible frameworks follow central model performance. Area under the precision-recall curve (AUC-PR) comparing central algorithms against federated algorithms. We pair FL algorithms with central algorithms by the local learning algorithm applied at client sites. Federated algorithms receive the training dataset split across n = 2 clients using label-stratified random sampling. Presented data are the mean score and standard deviation resulting from cross-validation.
Figure 4
Sample dispersion among client sites negatively impacts global model performance. For a fixed training dataset, the AUC-PR of federated algorithms is shown as the number of client sites increases. Training data are split uniformly among the members of the federation using stratified random sampling. The PDBP and PPMI datasets are used for external and internal validation, respectively. Presented data are the mean score and standard deviation resulting from cross-validation.
Figure 5
Data heterogeneity at client sites does not deeply influence model performance. The AUC-PR for a federation of two clients is shown for several split methods. Uniform stratified sampling represents the most homogeneous data-distribution method, while uniform random and linear random represent increasingly heterogeneous client distributions. Presented data are the mean score and standard deviation resulting from cross-validation.
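For concreteness, the client split strategies compared in Figure 5 can be sketched as follows. Uniform stratified sampling corresponds to the stratified partition sketched after Figure 1; the caption does not specify the proportions behind the linear random split, so the linearly increasing shard sizes below are an assumption.

    # Illustrative sketch of the "uniform random" and "linear random" split methods.
    import numpy as np

    def uniform_random_split(indices, n_clients, rng):
        """Equal-sized shards drawn at random, labels ignored."""
        return np.array_split(rng.permutation(indices), n_clients)

    def linear_random_split(indices, n_clients, rng):
        """Randomly drawn shards whose sizes grow linearly across clients (assumed)."""
        shuffled = rng.permutation(indices)
        weights = np.arange(1, n_clients + 1, dtype=float)
        cut_points = (np.cumsum(weights / weights.sum())[:-1] * len(indices)).astype(int)
        return np.split(shuffled, cut_points)

    rng = np.random.default_rng(0)
    idx = np.arange(480)
    print([len(s) for s in uniform_random_split(idx, 2, rng)])  # [240, 240]
    print([len(s) for s in linear_random_split(idx, 2, rng)])   # [160, 320]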
Figure 6
The mean runtime to train FL models using the FedAvg and FedProx strategies. The mean total runtime in seconds to train FL models. FL models are trained on the PPMI training folds for five communication rounds. Algorithms are grouped by aggregation strategy. Results are presented as the mean and standard deviation over K = 6 folds.
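The FedAvg and FedProx strategies timed in Figure 6 share the aggregation step sketched after Figure 2; FedProx differs by adding a proximal term to each client's local objective so that local updates do not drift far from the current global model. A minimal sketch of that local objective is below; the squared-error data term and mu = 0.1 are arbitrary illustrative choices, not values from the study.

    # Sketch of a FedProx-style local objective: data-fit loss plus a proximal
    # penalty toward the current global weights. Loss choice and mu are assumptions.
    import numpy as np

    def fedprox_local_loss(w_client, w_global, X, y, mu=0.1):
        """Local loss = mean squared error + (mu / 2) * ||w_client - w_global||^2."""
        data_loss = np.mean((X @ w_client - y) ** 2)
        proximal = 0.5 * mu * np.sum((w_client - w_global) ** 2)
        return data_loss + proximal

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(50, 8)), rng.normal(size=50)
    w_global = np.zeros(8)                           # weights broadcast by the server
    w_client = w_global + 0.01 * rng.normal(size=8)  # locally updated client weights
    print(fedprox_local_loss(w_client, w_global, X, y))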

