PLoS Genet. 2025 Jan 7;21(1):e1011519.

doi: 10.1371/journal.pgen.1011519. eCollection 2025 Jan.

Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes

Deborah Kunkel¹, Peter Sørensen², Vijay Shankar³, Fabio Morgante³,⁴

Affiliations

¹ School of Mathematical and Statistical Sciences, Clemson University, Clemson, South Carolina, United States of America.
² Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark.
³ Center for Human Genetics, Clemson University, Greenwood, South Carolina, United States of America.
⁴ Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina, United States of America.

PMID: 39775068
PMCID: PMC11741642
DOI: 10.1371/journal.pgen.1011519

Abstract

Polygenic prediction of complex trait phenotypes has become important in human genetics, especially in the context of precision medicine. Recently, mr.mash, a flexible and computationally efficient method that models multiple phenotypes jointly and leverages sharing of effects across such phenotypes to improve prediction accuracy, was introduced. However, a drawback of mr.mash is that it requires individual-level data, which are often not publicly available. In this work, we introduce mr.mash-rss, an extension of the mr.mash model that requires only summary statistics from Genome-Wide Association Studies (GWAS) and linkage disequilibrium (LD) estimates from a reference panel. By using summary data, we achieve the twin goals of increasing the applicability of the mr.mash model to data sets that are not publicly available and making it scalable to biobank-size data. Through simulations, we show that mr.mash-rss is competitive with, and often outperforms, current state-of-the-art methods for single- and multi-phenotype polygenic prediction in a variety of scenarios that differ in the pattern of effect sharing across phenotypes, the number of phenotypes, the number of causal variants, and the genomic heritability. We also present a real data analysis of 16 blood cell phenotypes in the UK Biobank, showing that mr.mash-rss achieves higher prediction accuracy than competing methods for the majority of traits, especially when the data set has a smaller sample size.

Copyright: © 2025 Kunkel et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Prediction accuracy in simulations with different patterns of effect sharing across phenotypes.**
Each panel summarizes the accuracy of the test set predictions in 20 simulations. The thick, black line in each box gives the median R². The dotted and dashed lines give the maximum accuracy achievable, *i.e.*, the simulated $h_{g}^{2}$.

**Fig 2. Prediction accuracy in simulations with different genetic architecture.**
Each panel summarizes the accuracy of the test set predictions in 20 simulations. The thick, black line in each box gives the median R². The dotted lines give the maximum accuracy achievable, *i.e.*, the simulated $h_{g}^{2}$.

**Fig 3. Prediction accuracy for the 16 blood cell traits in the full UK Biobank data.**
The thick, black line in each box gives the median R².

**Fig 4. Relationship between improvement in prediction accuracy and genomic heritability in the full UK Biobank data.**
Phenotypes are plotted along the x-axis by their genomic heritability ($h_{g}^{2}$) and along the y-axis by the change in R² relative to *LDpred2-auto* (Panel A) and *SBayesR* (Panel B); that is, (R²(*mr.mash-rss*) − R²(other method))/R²(other method). The blue line represents the linear regression fit with 95% confidence bands.
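The y-axis quantity in this caption is a simple relative change, and a minimal sketch of it follows. The function name `relative_r2_change` and the numeric inputs are hypothetical, chosen only for illustration; they are not taken from the paper or from any published software.

```python
def relative_r2_change(r2_new: float, r2_ref: float) -> float:
    """Relative change in R² of a new method over a reference method.

    Implements (R²(new) - R²(ref)) / R²(ref), the ratio described in the
    Fig 4 caption. Both names are illustrative assumptions.
    """
    if r2_ref <= 0:
        raise ValueError("reference R² must be positive")
    return (r2_new - r2_ref) / r2_ref


# Made-up values for illustration only, not results from the paper.
change = relative_r2_change(0.3, 0.2)
print(f"relative change in R²: {change:.3f}")
```

A positive value means the new method predicts better than the reference; expressing the gain as a fraction of the reference R² makes phenotypes with very different heritabilities comparable on one axis.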

**Fig 5. Prediction accuracy for the 16 blood cell traits in the sampled UK Biobank data.**
The thick, black line in each box gives the median R².

Update of

Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes.
Kunkel D, Sørensen P, Shankar V, Morgante F. bioRxiv [Preprint]. 2024 May 10:2024.05.06.592745. doi: 10.1101/2024.05.06.592745. Update in: PLoS Genet. 2025 Jan 07;21(1):e1011519. doi: 10.1371/journal.pgen.1011519. PMID: 38766136.

Grants and funding

R35 GM146868/GM/NIGMS NIH HHS/United States
