Ann Appl Stat. 2023 Dec;17(4):2970-2992.
doi: 10.1214/23-AOAS1747. Epub 2023 Oct 30.

TARGETING UNDERREPRESENTED POPULATIONS IN PRECISION MEDICINE: A FEDERATED TRANSFER LEARNING APPROACH

By Sai Li et al. Ann Appl Stat. 2023 Dec.

Abstract

The limited representation of minorities and disadvantaged populations in large-scale clinical and genomics research poses a significant barrier to translating precision medicine research into practice. Prediction models are likely to underperform in underrepresented populations due to heterogeneity across populations, thereby exacerbating known health disparities. To address this issue, we propose FETA, a two-way data integration method that leverages a federated transfer learning approach to integrate heterogeneous data from diverse populations and multiple healthcare institutions, with a focus on a target population of interest having limited sample sizes. We show that FETA achieves performance comparable to the pooled analysis, where individual-level data is shared across institutions, with only a small number of communications across participating sites. Our theoretical analysis and simulation study demonstrate how FETA's estimation accuracy is influenced by communication budgets, privacy restrictions, and heterogeneity across populations. We apply FETA to multisite data from the electronic Medical Records and Genomics (eMERGE) Network to construct genetic risk prediction models for extreme obesity. Compared to models trained using target data only, source data only, and all data without accounting for population-level differences, FETA shows superior predictive performance. FETA has the potential to improve estimation and prediction accuracy in underrepresented populations and reduce the gap in model performance across populations.

Keywords: Federated learning; health equity; precision medicine; risk prediction; transfer learning.
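The general idea in the abstract (sites share summaries rather than individual-level data, and a small target sample corrects for population differences) can be illustrated with a minimal sketch. This is not the FETA algorithm: it assumes a linear model with ridge penalties for simplicity, whereas the paper builds risk prediction models evaluated by AUC, and the function names `local_summary`, `aggregate_source_estimate`, and `correct_on_target`, along with the penalty values, are hypothetical.

```python
import numpy as np

def local_summary(X, y):
    """Run at one source site: only these aggregates leave the site,
    never the individual-level rows of X or y."""
    return X.T @ X, X.T @ y

def aggregate_source_estimate(summaries, ridge=1e-3):
    """Combine per-site summaries into one ridge estimate.
    For least squares this matches the fit on the pooled data."""
    p = summaries[0][0].shape[0]
    gram = sum(s[0] for s in summaries)
    xty = sum(s[1] for s in summaries)
    return np.linalg.solve(gram + ridge * np.eye(p), xty)

def correct_on_target(X_t, y_t, w_source, lam=5.0):
    """Transfer step (generic bias correction, assumed here):
    fit a penalized shift `delta` on the target residuals so the
    final coefficients are w_source + delta."""
    p = X_t.shape[1]
    resid = y_t - X_t @ w_source
    delta = np.linalg.solve(X_t.T @ X_t + lam * np.eye(p), X_t.T @ resid)
    return w_source + delta
```

In this toy setting a single round of communication reproduces the pooled least-squares fit, and the penalty `lam` controls how far the target estimate may move away from the source estimate: a very large `lam` keeps the source coefficients, while a small one lets the limited target data dominate.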

Figures

Fig. 1. A schematic illustration of the federated transfer learning framework and the problem setting.
Fig. 2. Comparison of AUC over 200 replications under simulation settings 1 and 2.
Fig. 3. Comparison of AUC over 200 replications under Case 1 to Case 4, where we set K=3 and M=5. The number of helpful source populations decreases from Case 1 to Case 4.
Fig. 4. Comparisons of correlations and minor allele frequencies (MAFs) calculated from the source and the target populations. Among the 2047 selected SNPs, we show the SNPs on chromosome 1 for illustration. The left panel shows the pairwise correlations between SNPs calculated from the source (the lower triangle) and the target (the upper triangle). The right panel compares the MAFs.
Fig. 5. Comparisons of prediction performance across three testing datasets. Each colored bar denotes the average AUC over 20 replications, and the error bar indicates the highest and lowest performance.
