Adv Neural Inf Process Syst. 2023;36:54020-54031.
Epub 2024 May 30.

Multi-Task Learning with Summary Statistics

Parker Knight et al. Adv Neural Inf Process Syst. 2023.

Abstract

Multi-task learning has emerged as a powerful machine learning paradigm for integrating data from multiple sources, leveraging similarities between tasks to improve overall model performance. However, the application of multi-task learning to real-world settings is hindered by data-sharing constraints, especially in healthcare settings. To address this challenge, we propose a flexible multi-task learning framework utilizing summary statistics from various sources. Additionally, we present an adaptive parameter selection approach based on a variant of Lepski's method, allowing for data-driven tuning parameter selection when only summary statistics are available. Our systematic non-asymptotic analysis characterizes the performance of the proposed methods under various regimes of the sample complexity and overlap. We demonstrate our theoretical findings and the performance of the method through extensive simulations. This work offers a more flexible tool for training related models across various domains, with practical implications in genetic risk prediction and many other fields.
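To make the idea of fitting a model from summary statistics concrete, here is a minimal single-task sketch, not the authors' estimator. It assumes the only shared quantities are the marginal statistic X'y/n and a feature covariance estimated from separate proxy data, and it fits an L1-penalized regression by proximal gradient descent. The function names, the penalty level `lam`, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def summary_stat_lasso(cov, xty, lam, n_iter=500):
    """Minimize 0.5 * b' cov b - b' xty + lam * ||b||_1 by proximal gradient.

    cov : (p, p) covariance estimate of the features (here from proxy data)
    xty : (p,) summary statistic X'y / n shared in place of raw data
    """
    # Step size 1/L, where L is the largest eigenvalue of cov.
    step = 1.0 / max(np.linalg.eigvalsh(cov)[-1], 1e-12)
    b = np.zeros_like(xty)
    for _ in range(n_iter):
        grad = cov @ b - xty
        b = soft_threshold(b - step * grad, step * lam)
    return b


# Simulated illustration (hypothetical sizes and coefficients).
rng = np.random.default_rng(0)
n, n_proxy, p = 400, 400, 20
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -0.5, 0.8]

# Individual-level data stay private; only X'y / n is shared.
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.1 * rng.standard_normal(n)
xty = X.T @ y / n

# Covariance estimated from an independent proxy sample.
X_proxy = rng.standard_normal((n_proxy, p))
cov_proxy = X_proxy.T @ X_proxy / n_proxy

beta_hat = summary_stat_lasso(cov_proxy, xty, lam=0.1)
```

In this sketch the ratio `n_proxy / n` plays the role of τ in Figure 1. Replacing `cov_proxy` with `X.T @ X / n` gives the ordinary lasso objective on the individual-level data, up to a constant.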

Figures

Figure 1: Average prediction MSE per task over 100 repetitions, plotted against τ = ñ/n. The left-hand panel corresponds to the sparse estimator and the right-hand panel to the low-rank estimator. The red boxes indicate the estimator that uses all of the individual-level data (IL_Cov), the blue boxes indicate the estimator that uses the true covariance matrix of the features (true_Cov), and the green boxes indicate the estimator that uses only the proxy data (Proxy_Cov).
Figure 2: Average MSE per task over 100 repetitions, plotted against ρ̃. The orientation and colors are the same as in Figure 1.
Figure 3: Prediction MSE per task over 10 splits of the eMERGE data.

