PLoS Genet. 2023 Dec 28;19(12):e1011104.
doi: 10.1371/journal.pgen.1011104. eCollection 2023 Dec.

SparsePro: An efficient fine-mapping method integrating summary statistics and functional annotations


Wenmin Zhang et al. PLoS Genet.

Abstract

Identifying causal variants from genome-wide association studies (GWAS) is challenging due to widespread linkage disequilibrium (LD) and the possible existence of multiple causal variants in the same genomic locus. Functional annotations of the genome may help to prioritize variants that are biologically relevant and thus improve fine-mapping of GWAS results. Classical fine-mapping methods conducting an exhaustive search of variant-level causal configurations have a high computational cost, especially when the underlying genetic architecture and LD patterns are complex. SuSiE provided an iterative Bayesian stepwise selection algorithm for efficient fine-mapping. In this work, we build connections between SuSiE and a paired mean field variational inference algorithm through the implementation of a sparse projection, and propose effective strategies for estimating hyperparameters and summarizing posterior probabilities. Moreover, we incorporate functional annotations into fine-mapping by jointly estimating enrichment weights to derive functionally-informed priors. We evaluate the performance of SparsePro through extensive simulations using resources from the UK Biobank. Compared to state-of-the-art methods, SparsePro achieved improved power for fine-mapping with reduced computation time. We demonstrate the utility of SparsePro through fine-mapping of five functional biomarkers of clinically relevant phenotypes. In summary, we have developed an efficient fine-mapping method for integrating summary statistics and functional annotations. Our method can have wide utility in understanding the genetics of complex traits and increasing the yield of functional follow-up studies of GWAS. SparsePro software is available on GitHub at https://github.com/zhwm/SparsePro.
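The abstract refers to summarizing posterior probabilities across effect groups. The Python sketch below illustrates two standard ingredients of SuSiE-style fine-mapping. It is not SparsePro's variational algorithm. The single-effect posterior uses Wakefield's approximate Bayes factor on marginal z-scores, which is a swapped-in, commonly used approximation. The prior variance ratio `prior_var_ratio` is a hypothetical setting, and LD between variants is ignored. The second function combines per-effect posteriors α_k into variant-level PIPs via PIP_g = 1 − ∏_k (1 − α_kg), the rule used by SuSiE.

```python
import math
from typing import List, Sequence


def single_effect_posterior(z: Sequence[float], prior: Sequence[float],
                            prior_var_ratio: float = 25.0) -> List[float]:
    """Posterior over which variant carries one effect (sketch, LD ignored).

    z: marginal z-scores; prior: strictly positive prior inclusion
    probabilities; prior_var_ratio: hypothetical prior effect variance
    W expressed in units of the sampling variance V (V = 1 assumed).
    """
    # Shrinkage factor r = W / (V + W) with V = 1.
    r = prior_var_ratio / (1.0 + prior_var_ratio)
    # Wakefield approximate log Bayes factor (causal vs. null) per variant.
    log_bf = [0.5 * math.log(1.0 - r) + 0.5 * r * zg * zg for zg in z]
    # Multiply by the prior on the log scale, then normalize (log-sum-exp).
    log_w = [math.log(p) + lb for p, lb in zip(prior, log_bf)]
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


def combine_pips(alphas: Sequence[Sequence[float]]) -> List[float]:
    """PIP_g = 1 - prod_k (1 - alpha_kg): probability that at least one
    effect group selects variant g."""
    n = len(alphas[0])
    pips = []
    for g in range(n):
        miss = 1.0
        for alpha_k in alphas:
            miss *= 1.0 - alpha_k[g]
        pips.append(1.0 - miss)
    return pips


if __name__ == "__main__":
    z = [1.0, 5.2, 4.8, 0.3]
    print([round(a, 3) for a in single_effect_posterior(z, [0.25] * 4)])
    print(combine_pips([[0.9, 0.1, 0.0], [0.0, 0.5, 0.5]]))
```

With a uniform prior, the variant with the largest |z| receives the highest posterior. A sufficiently informative prior, such as one derived from functional annotations, can change that ranking.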


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. SparsePro for integrating summary statistics and functional annotations.
The data-generative process in SparsePro is depicted in this graphical model. Green shaded nodes represent observed variables: functional annotation information A_g for the gth variant, and genotype X_i and trait y_i for the ith individual. Orange unshaded nodes represent latent variables. Specifically, π̃_g is the prior inclusion probability for the gth variant derived from functional annotation information; s_k is a sparse indicator specifying which variant represents the kth effect group; and β_k is the effect size of the kth effect group. Posterior summaries are then obtained from the posterior distribution of s_k. Here, we assume individual-level data are available; adaptation to GWAS summary statistics is detailed in S1 Text.
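The caption states that the prior inclusion probability π̃_g is derived from the annotation vector A_g, but it does not give the link function. The minimal sketch below assumes a softmax (log-linear) link with a hypothetical vector of enrichment weights w, so that π̃_g = exp(A_g·w) / Σ_h exp(A_h·w). The joint estimation of w described in the abstract is not shown here; w is simply taken as input.

```python
import math
from typing import List, Sequence


def annotation_prior(A: Sequence[Sequence[float]],
                     w: Sequence[float]) -> List[float]:
    """Prior inclusion probabilities from annotations (softmax link assumed).

    A[g][j]: value of annotation j for variant g (binary or continuous).
    w[j]: hypothetical enrichment weight for annotation j.
    """
    # Linear annotation score A_g . w for each variant.
    scores = [sum(a * wj for a, wj in zip(row, w)) for row in A]
    # Softmax over variants, shifted by the max score for numerical stability.
    m = max(scores)
    expd = [math.exp(s - m) for s in scores]
    total = sum(expd)
    return [e / total for e in expd]


if __name__ == "__main__":
    A = [[1, 0], [0, 1], [0, 0], [1, 1]]
    # Annotation 0 doubles the prior odds; annotation 1 has no effect.
    print(annotation_prior(A, [math.log(2.0), 0.0]))
```

With all weights at zero the prior is uniform, which corresponds to fine-mapping without annotations.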
Fig 2. Locus simulation in the setting of K = 5 (number of causal variants) and W = 2 (enrichment intensity).
(A) Comparison of posterior inclusion probabilities (PIP) obtained using different methods. Each dot represents a variant. True causal variants are colored red and non-causal variants are colored black. (B) Precision-recall curves. The area under the precision-recall curve (AUPRC) for each method is indicated. (C) Calibration curves. Variants are grouped into five bins according to their PIP values, and each dot represents one bin. The actual precision (y-axis) is plotted against the expected precision (x-axis), calculated as the mean PIP of all variants in the bin.
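The two evaluation summaries in this figure can be sketched as follows. Average precision is used here as one common estimator of the area under the precision-recall curve, since the estimator behind the figure is not stated. Ties in PIP are broken arbitrarily, and the five calibration bins are assumed to be equal-width intervals on [0, 1].

```python
from typing import List, Sequence, Tuple


def average_precision(pip: Sequence[float], is_causal: Sequence[int]) -> float:
    """AUPRC estimated as average precision over the PIP ranking."""
    order = sorted(range(len(pip)), key=lambda g: pip[g], reverse=True)
    n_pos = sum(is_causal)
    tp, ap = 0, 0.0
    for rank, g in enumerate(order, start=1):
        if is_causal[g]:
            tp += 1
            ap += tp / rank  # precision at each recovered causal variant
    return ap / n_pos if n_pos else 0.0


def calibration_bins(pip: Sequence[float], is_causal: Sequence[int],
                     n_bins: int = 5) -> List[Tuple[float, float]]:
    """(expected precision, actual precision) for each non-empty PIP bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, c in zip(pip, is_causal):
        k = min(int(p * n_bins), n_bins - 1)  # PIP = 1.0 goes to the top bin
        bins[k].append((p, c))
    out = []
    for members in bins:
        if not members:
            continue
        expected = sum(p for p, _ in members) / len(members)  # mean PIP
        actual = sum(c for _, c in members) / len(members)    # share causal
        out.append((expected, actual))
    return out


if __name__ == "__main__":
    pip = [0.95, 0.9, 0.65, 0.1, 0.05, 0.02]
    causal = [1, 1, 0, 1, 0, 0]
    print(average_precision(pip, causal))
    print(calibration_bins(pip, causal))
```

A well-calibrated method places each bin's point close to the diagonal, where mean PIP equals the observed fraction of causal variants.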
Fig 3. Summary of locus simulations.
(A) Coverage, power and size of 95% credible sets. (B) Area under the precision-recall curve (AUPRC). (C) Computation time in seconds.
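A 95% credible set for one effect group can be formed by ranking variants by that group's posterior probability and keeping the smallest prefix whose cumulative probability reaches 0.95. The sketch below does this and then computes coverage, power and mean size using common definitions, which are assumptions here. Coverage is the fraction of credible sets containing at least one causal variant. Power is the fraction of causal variants captured by any credible set. Size is the mean number of variants per set. Purity filtering, as applied by SuSiE, is omitted.

```python
from typing import List, Sequence, Tuple


def credible_set(alpha: Sequence[float], level: float = 0.95) -> List[int]:
    """Smallest set of variants whose cumulative posterior reaches `level`."""
    order = sorted(range(len(alpha)), key=lambda g: alpha[g], reverse=True)
    cs: List[int] = []
    cum = 0.0
    for g in order:
        cs.append(g)
        cum += alpha[g]
        if cum >= level:
            break
    return cs


def summarize_credible_sets(sets: Sequence[Sequence[int]],
                            causal: Sequence[int]) -> Tuple[float, float, float]:
    """(coverage, power, mean size) under the assumed definitions."""
    truth = set(causal)
    if not sets:
        return 0.0, 0.0, 0.0
    covered = sum(1 for s in sets if truth & set(s))
    captured = set().union(*(set(s) for s in sets)) & truth
    coverage = covered / len(sets)
    power = len(captured) / len(truth) if truth else 0.0
    mean_size = sum(len(s) for s in sets) / len(sets)
    return coverage, power, mean_size


if __name__ == "__main__":
    print(credible_set([0.6, 0.3, 0.07, 0.03]))
    print(summarize_credible_sets([[0, 1, 2], [5]], causal=[1, 7]))
```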
Fig 4. Genome-wide simulations.
(A) Comparison of posterior inclusion probabilities (PIP) obtained using different methods in the simulation setting of W = 2. True causal variants are colored red and non-causal variants are colored black. (B) The logarithmic relative ratio (logRR) between the largest and smallest prior inclusion probabilities. (C) Coverage, power and size for 95% credible sets.
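The logRR in panel B reduces to a one-line formula, logRR = log(max_g π̃_g / min_g π̃_g). The sketch assumes the natural logarithm and strictly positive priors. If the priors come from a softmax of annotation scores, which is an assumption not stated in this caption, the normalizing constant cancels and logRR equals the range of the scores.

```python
import math
from typing import Sequence


def log_relative_ratio(prior: Sequence[float]) -> float:
    """logRR = log(max prior / min prior); natural log assumed.

    All prior inclusion probabilities must be strictly positive.
    """
    return math.log(max(prior)) - math.log(min(prior))


if __name__ == "__main__":
    scores = [math.log(2.0), 0.0, 0.0, math.log(2.0)]  # hypothetical A_g . w
    norm = sum(math.exp(s) for s in scores)
    prior = [math.exp(s) / norm for s in scores]  # softmax prior (assumption)
    # Normalization cancels, so both numbers equal log(2).
    print(log_relative_ratio(prior), max(scores) - min(scores))
```

A logRR of 0 means the annotations leave the prior uniform; larger values mean the annotations separate variants more strongly.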
Fig 5. Biological relevance of fine-mapping results for functional biomarkers of clinically relevant phenotypes.
(A) Enrichment fold in tissue-specific annotations. Each row denotes a tissue-specific annotation derived from histone marks (Methods) and each column denotes a functional biomarker. Error bars represent 95% confidence intervals for enrichment estimates. (B) Proportion of top variants from 95% credible sets mapped to tissue-specific annotations. Rows denote relevant tissue-specific annotations and columns denote functional biomarkers.
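Panel B can be outlined by taking, for each 95% credible set, its highest-PIP variant and checking whether that variant overlaps a given tissue-specific annotation. In the sketch below, treating the highest-PIP member as the "top variant" is an assumption, and the variant IDs, PIP values and annotation membership are hypothetical inputs.

```python
from typing import Dict, Sequence, Set


def top_variant_annotation_fraction(credible_sets: Sequence[Sequence[str]],
                                    pip: Dict[str, float],
                                    annotated: Set[str]) -> float:
    """Share of credible sets whose top (highest-PIP) variant is annotated."""
    if not credible_sets:
        return 0.0
    # Top variant of each credible set, chosen by PIP.
    tops = [max(cs, key=lambda v: pip[v]) for cs in credible_sets]
    hits = sum(1 for v in tops if v in annotated)
    return hits / len(tops)


if __name__ == "__main__":
    sets = [["a", "b"], ["c"], ["d", "e"]]
    pip = {"a": 0.7, "b": 0.2, "c": 0.99, "d": 0.4, "e": 0.5}
    # Tops are a, c and e; only a lies in the annotation.
    print(top_variant_annotation_fraction(sets, pip, {"a", "d"}))
```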
Fig 6. Genes harboring causal sets for five functional biomarkers of clinically relevant phenotypes.
(A) Genome-wide distribution of genes harboring causal sets for at least two functional biomarkers. (B) GCKR locus with fine-mapped variant rs1260326. This variant was deemed causal for eGFR, glucose, gamma-GT and pulse rate. P-values from GWAS and posterior inclusion probabilities inferred from SparsePro+ are shown. Variants within a ±500 kb window are colored by their linkage disequilibrium r² with rs1260326.
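The LD coloring in panel B uses r² with the lead variant. The minimal sketch below computes r² as the squared Pearson correlation between genotype dosages, restricted to variants within ±500 kb of the lead variant. It assumes dosages coded 0/1/2 are available for the same individuals; an external LD reference panel could be substituted. The positions and dosages are hypothetical.

```python
from typing import Dict, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: r2 undefined, reported as 0
    return sxy * sxy / (sxx * syy)


def ld_with_lead(positions: Sequence[int], dosages: Sequence[Sequence[float]],
                 lead: int, window: int = 500_000) -> Dict[int, float]:
    """r2 with the lead variant for every variant within +/- window bp."""
    lead_pos = positions[lead]
    return {g: r_squared(dosages[lead], dosages[g])
            for g in range(len(positions))
            if abs(positions[g] - lead_pos) <= window}


if __name__ == "__main__":
    pos = [1_000_000, 1_200_000, 1_400_000, 2_000_000]
    dos = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, 1], [0, 0, 1, 2]]
    # The last variant lies 1 Mb away and is excluded from the window.
    print(ld_with_lead(pos, dos, lead=0))
```

Because r² squares the correlation, a variant in perfect negative correlation with the lead variant (the third one above) is colored the same as a perfect proxy.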
