PLoS Genet. 2023 Dec 28;19(12):e1011104.

doi: 10.1371/journal.pgen.1011104. eCollection 2023 Dec.

SparsePro: An efficient fine-mapping method integrating summary statistics and functional annotations

Wenmin Zhang¹, Hamed Najafabadi¹,²,³, Yue Li¹,⁴

Affiliations

¹ Quantitative Life Sciences, McGill University, Montreal, Quebec, Canada.
² Department of Human Genetics, McGill University, Montreal, Quebec, Canada.
³ Dahdaleh Institute of Genomic Medicine, Montreal, Quebec, Canada.
⁴ School of Computer Science, McGill University, Montreal, Quebec, Canada.

PMID: 38153934
PMCID: PMC10781022
DOI: 10.1371/journal.pgen.1011104

Wenmin Zhang et al. PLoS Genet. 2023.

Abstract

Identifying causal variants from genome-wide association studies (GWAS) is challenging due to widespread linkage disequilibrium (LD) and the possible existence of multiple causal variants in the same genomic locus. Functional annotations of the genome may help to prioritize variants that are biologically relevant and thus improve fine-mapping of GWAS results. Classical fine-mapping methods conducting an exhaustive search of variant-level causal configurations have a high computational cost, especially when the underlying genetic architecture and LD patterns are complex. SuSiE provided an iterative Bayesian stepwise selection algorithm for efficient fine-mapping. In this work, we build connections between SuSiE and a paired mean field variational inference algorithm through the implementation of a sparse projection, and propose effective strategies for estimating hyperparameters and summarizing posterior probabilities. Moreover, we incorporate functional annotations into fine-mapping by jointly estimating enrichment weights to derive functionally-informed priors. We evaluate the performance of SparsePro through extensive simulations using resources from the UK Biobank. Compared to state-of-the-art methods, SparsePro achieved improved power for fine-mapping with reduced computation time. We demonstrate the utility of SparsePro through fine-mapping of five functional biomarkers of clinically relevant phenotypes. In summary, we have developed an efficient fine-mapping method for integrating summary statistics and functional annotations. Our method can have wide utility in understanding the genetics of complex traits and increasing the yield of functional follow-up studies of GWAS. SparsePro software is available on GitHub at https://github.com/zhwm/SparsePro.

Copyright: © 2023 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. SparsePro for integrating summary statistics and functional annotations.**
The data generative process in SparsePro is depicted in this graphical model. Green shaded nodes represent observed variables: functional annotation information A_g for the g-th variant, and genotype X_i and trait y_i for the i-th individual. Orange unshaded nodes represent latent variables. Specifically, $\tilde{\pi}_g$ is the prior inclusion probability for the g-th variant derived from functional annotation information; s_k is a sparse indicator specifying the variant representation of the k-th effect group; and β_k is the effect size of the k-th effect group. Posterior summaries can therefore be obtained from the posterior distribution of s_k. Here, we assume individual-level data are available; the adaptation to GWAS summary statistics is detailed in the S1 Text.
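
One way to picture how annotations A_g could yield a prior inclusion probability for each variant is a softmax over annotation scores. This is only an illustrative sketch: the softmax link, the function names `annotation_prior` and `log_relative_ratio`, and the toy inputs are assumptions and are not taken from this abstract. Under that assumption, a single binary annotation with enrichment weight W gives a log relative ratio of W between the largest and smallest priors.

```python
import numpy as np

def annotation_prior(A, w):
    """Hypothetical softmax prior: one inclusion probability per variant.

    A: (G, M) annotation matrix, w: (M,) enrichment weights (both assumed).
    """
    A = np.asarray(A, dtype=float)
    w = np.asarray(w, dtype=float)
    logits = A @ w
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def log_relative_ratio(prior):
    """Log ratio between the largest and smallest prior inclusion probabilities."""
    prior = np.asarray(prior, dtype=float)
    return float(np.log(prior.max() / prior.min()))

# Toy locus: three variants, one binary annotation, enrichment weight 2.0
prior = annotation_prior([[1.0], [0.0], [0.0]], [2.0])
print(prior, log_relative_ratio(prior))
```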

**Fig 2. Locus simulation in the setting of K = 5 (number of causal variants) and W = 2 (enrichment intensity).**
(A) Comparison of posterior inclusion probabilities (PIP) obtained using different methods. Each dot represents a variant. True causal variants are colored red and non-causal variants are colored black. (B) Precision-recall curves. The area under the precision-recall curve (AUPRC) for each method is indicated. (C) Calibration curves. Variants are grouped into five bins according to their PIP values. Each dot represents one bin. The actual precision (y-axis) is plotted against the expected precision (x-axis) calculated by mean PIP values across all variants in the bin.
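
The calibration check described in panel (C) can be sketched as follows. This is a minimal illustration, not the authors' code: the equal-width bin edges, the function name `calibration_bins`, and the toy data are assumptions, since the caption states only that variants are grouped into five bins by PIP.

```python
import numpy as np

def calibration_bins(pip, is_causal, n_bins=5):
    """Return (expected precision, actual precision, count) for each non-empty PIP bin.

    Expected precision is the mean PIP in the bin; actual precision is the
    fraction of variants in the bin that are truly causal.
    """
    pip = np.asarray(pip, dtype=float)
    is_causal = np.asarray(is_causal, dtype=bool)
    edges = np.linspace(0.0, 1.0, n_bins + 1)  # assumed equal-width bins on [0, 1]
    # Interior edges map each PIP to a bin index 0..n_bins-1; PIP == 1 lands in the last bin
    idx = np.digitize(pip, edges[1:-1])
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue  # skip empty bins
        rows.append((float(pip[mask].mean()),
                     float(is_causal[mask].mean()),
                     int(mask.sum())))
    return rows

# Toy example with five simulated variants
for expected, actual, n in calibration_bins([0.05, 0.15, 0.5, 0.9, 0.95],
                                            [0, 0, 1, 1, 0]):
    print(f"expected={expected:.3f} actual={actual:.3f} n={n}")
```

For well-calibrated PIPs, the points (expected, actual) should fall near the diagonal.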

**Fig 3. Summary of locus simulations.**
(A) Coverage, power and size of 95% credible sets. (B) Area under the precision-recall curve (AUPRC). (C) Computation time in seconds.

**Fig 4. Genome-wide simulations.**
(A) Comparison of posterior inclusion probabilities (PIP) obtained using different methods in the simulation setting of W = 2. True causal variants are colored red and non-causal variants are colored black. (B) The logarithmic relative ratio (logRR) between the largest and smallest prior inclusion probabilities. (C) Coverage, power and size for 95% credible sets.

**Fig 5. Biological relevance of fine-mapping results for functional biomarkers of clinically relevant phenotypes.**
(A) Enrichment fold in tissue-specific annotations. Each row denotes a tissue-specific annotation derived from histone marks (Methods) and each column denotes a functional biomarker. Error bars represent 95% confidence intervals for enrichment estimates. (B) Proportion of top variants from 95% credible sets mapped to tissue-specific annotations. Rows denote relevant tissue-specific annotations and columns denote functional biomarkers.

**Fig 6. Genes harboring causal sets for five functional biomarkers of clinically relevant phenotypes.**
(A) Genome-wide distribution of genes harboring causal sets for at least two functional biomarkers. (B) *GCKR* locus with fine-mapped variant rs1260326. This variant was deemed causal for eGFR, glucose, gamma-GT and pulse rate. P-values from GWAS and posterior inclusion probabilities inferred from SparsePro+ are illustrated. Variants within a ±500 kb window are colored by their linkage disequilibrium r² with rs1260326.

Cited by

Fast and accurate Bayesian polygenic risk modeling with variational inference.
Zabad S, Gravel S, Li Y. Am J Hum Genet. 2023 May 4;110(5):741-761. doi: 10.1016/j.ajhg.2023.03.009. Epub 2023 Apr 7. PMID: 37030289. Free PMC article.

MESuSiE enables scalable and powerful multi-ancestry fine-mapping of causal variants in genome-wide association studies.
Gao B, Zhou X. Nat Genet. 2024 Jan;56(1):170-179. doi: 10.1038/s41588-023-01604-7. Epub 2024 Jan 2. PMID: 38168930. Free PMC article.

Accounting for genetic effect heterogeneity in fine-mapping and improving power to detect gene-environment interactions with SharePro.
Zhang W, Sladek R, Li Y, Najafabadi H, Dupuis J. Nat Commun. 2024 Oct 30;15(1):9374. doi: 10.1038/s41467-024-53818-w. PMID: 39478020. Free PMC article.

Integration of Expression QTLs with fine mapping via SuSiE.
Zhang X, Jiang W, Zhao H. medRxiv [Preprint]. 2023 Oct 6:2023.10.03.23294486. doi: 10.1101/2023.10.03.23294486. Update in: PLoS Genet. 2024 Jan 25;20(1):e1010929. doi: 10.1371/journal.pgen.1010929. PMID: 37873337. Free PMC article.

Integration of expression QTLs with fine mapping via SuSiE.
Zhang X, Jiang W, Zhao H. PLoS Genet. 2024 Jan 25;20(1):e1010929. doi: 10.1371/journal.pgen.1010929. eCollection 2024 Jan. PMID: 38271473. Free PMC article.
