Multi-Task Learning with Summary Statistics

Parker Knight¹, Rui Duan¹

PMID: 39351341
PMCID: PMC11440483

Parker Knight et al. Adv Neural Inf Process Syst. 2023:36:54020-54031.

Epub 2024 May 30.

Affiliation

¹ Department of Biostatistics, Harvard University, Boston, MA.

Abstract

Multi-task learning has emerged as a powerful machine learning paradigm for integrating data from multiple sources, leveraging similarities between tasks to improve overall model performance. However, the application of multi-task learning to real-world settings is hindered by data-sharing constraints, especially in healthcare settings. To address this challenge, we propose a flexible multi-task learning framework utilizing summary statistics from various sources. Additionally, we present an adaptive parameter selection approach based on a variant of Lepski's method, allowing for data-driven tuning parameter selection when only summary statistics are available. Our systematic non-asymptotic analysis characterizes the performance of the proposed methods under various regimes of the sample complexity and overlap. We demonstrate our theoretical findings and the performance of the method through extensive simulations. This work offers a more flexible tool for training related models across various domains, with practical implications in genetic risk prediction and many other fields.
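To make the idea of fitting from summary statistics concrete, the following is a minimal single-task sketch, not the authors' multi-task estimator. It assumes a site shares only the marginal statistic $s = X^\top y / n$, and that a feature covariance estimate is available from separate proxy data. It then minimizes the lasso-type objective $\tfrac{1}{2} b^\top \hat{\Sigma} b - s^\top b + \lambda \lVert b \rVert_1$ by proximal gradient descent. The function names, the simulated data, and the value of `lam` are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def summary_stat_lasso(sigma_hat, s, lam, n_iter=500):
    """Minimize 0.5 * b' sigma_hat b - s' b + lam * ||b||_1 by proximal gradient.

    sigma_hat: (p, p) symmetric covariance estimate (hypothetical input).
    s:         (p,) marginal statistic X'y / n (hypothetical input).
    """
    # Step size 1 / L, where L is the largest eigenvalue of sigma_hat.
    lipschitz = max(np.linalg.eigvalsh(sigma_hat)[-1], 1e-12)
    step = 1.0 / lipschitz
    b = np.zeros_like(s, dtype=float)
    for _ in range(n_iter):
        grad = sigma_hat @ b - s
        b = soft_threshold(b - step * grad, step * lam)
    return b


# Simulated data, for illustration only.
rng = np.random.default_rng(0)
n, n_proxy, p = 500, 300, 20
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -0.5, 0.75]

X = rng.standard_normal((n, p))
y = X @ beta_true + 0.1 * rng.standard_normal(n)
s = X.T @ y / n  # the only quantity shared by the data-holding site

# Covariance estimated from separate proxy features (no outcomes needed).
X_proxy = rng.standard_normal((n_proxy, p))
sigma_proxy = X_proxy.T @ X_proxy / n_proxy

beta_hat = summary_stat_lasso(sigma_proxy, s, lam=0.05)
```

In this sketch, swapping `sigma_proxy` for `X.T @ X / n` or for the population covariance changes only the first argument. This loosely mirrors the IL_Cov, true_Cov, and Proxy_Cov comparison described in the figure captions, although the estimators evaluated in the paper may differ in their details.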

Figures

**Figure 1:**
Average prediction MSE per task over 100 repetitions, plotted against $\tau = \tilde{n}/n$. The left-hand side corresponds to the sparse estimator, and the right-hand side to the low-rank estimator. The red boxes indicate the estimator that uses all of the individual-level data (IL_Cov), the blue boxes indicate the estimator that uses the true covariance matrix of the features (true_Cov), and the green boxes indicate the estimator that uses only the proxy data (Proxy_Cov).

**Figure 2:**
Average MSE per task over 100 repetitions, plotted against $\tilde{\rho}$. The orientation and colors are the same as in Figure 1.

**Figure 3:**
Prediction MSE per task after 10 splits of the eMERGE data.

Grants and funding

R01 GM148494/GM/NIGMS NIH HHS/United States
