Bioinformatics. 2016 May 15;32(10):1493-501.
doi: 10.1093/bioinformatics/btw018. Epub 2016 Jan 14.

FINEMAP: efficient variable selection using summary data from genome-wide association studies

Christian Benner et al. Bioinformatics.

Abstract

Motivation: The goal of fine-mapping in genomic regions associated with complex diseases and traits is to identify causal variants that point to molecular mechanisms behind the associations. Recent fine-mapping methods using summary data from genome-wide association studies rely on exhaustive search through all possible causal configurations, which is computationally expensive.

Results: We introduce FINEMAP, a software package to efficiently explore a set of the most important causal configurations of the region via a shotgun stochastic search algorithm. We show that FINEMAP produces accurate results in a fraction of processing time of existing approaches and is therefore a promising tool for analyzing growing amounts of data produced in genome-wide association studies and emerging sequencing projects.

Availability and implementation: FINEMAP v1.0 is freely available for Mac OS X and Linux at http://www.christianbenner.com

Contact: christian.benner@helsinki.fi or matti.pirinen@helsinki.fi.

Figures

Fig. 1.
The binary indicator vector γ determines which SNPs have non-zero causal effects. The corresponding causal (linear) model for a quantitative trait assumes that only a few SNPs have a causal effect. The maximum likelihood estimate (MLE) of the causal SNP effects, λ̂, can be computed using only the SNP correlation matrix and single-SNP z-scores. However, the MLE is not ideal because it does not account for the sparsity assumption.
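As a rough illustration of the caption above, the sketch below computes joint effect estimates for one causal configuration from summary data alone. It assumes standardized genotypes and trait with residual variance close to 1, under which the estimate reduces to R_cc^{-1} z_c / sqrt(n). The helper names `solve` and `mle_from_summary`, the sample size `n`, and the toy numbers are illustrative assumptions and are not taken from FINEMAP.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides at once.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correlation submatrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mle_from_summary(r, z, causal, n):
    """Joint effect estimates for the SNPs in `causal` (assumes standardized data)."""
    r_cc = [[r[i][j] for j in causal] for i in causal]  # correlation submatrix
    z_c = [z[i] for i in causal]                        # matching z-scores
    return [v / math.sqrt(n) for v in solve(r_cc, z_c)]

# Toy example: two SNPs with correlation 0.5, hypothetical z-scores, n = 10000.
R = [[1.0, 0.5], [0.5, 1.0]]
z = [6.0, 3.0]
lam = mle_from_summary(R, z, causal=[0, 1], n=10_000)
```

In this toy case the second SNP's z-score of 3 is fully explained by its correlation with the first SNP, so its joint estimate is 0 while the first SNP keeps an estimate of 0.06.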
Fig. 2.
Shotgun stochastic search rapidly identifies configurations of causal SNPs with high posterior probability. In each iteration, the neighborhood of the current causal configuration consists of the configurations obtained by deleting, changing or adding one causal SNP. The next iteration starts by sampling a new causal configuration from the neighborhood, using the scores normalized within the neighborhood. The unnormalized posterior probabilities remain fixed throughout the algorithm and can therefore be memoized to avoid recomputation when already-evaluated configurations appear in another neighborhood.
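The search described in this caption can be sketched as follows. This is a minimal illustration and not FINEMAP's implementation: `toy_score` is a hypothetical stand-in for the unnormalized posterior, and the names `neighborhood` and `shotgun_search`, the starting configuration (empty) and the iteration count are assumptions made for the example.

```python
import random

def neighborhood(config, m, max_causal):
    """All configurations reachable by deleting, changing or adding one SNP."""
    inside = sorted(config)
    outside = [s for s in range(m) if s not in config]
    nbrs = []
    for s in inside:                        # delete one causal SNP
        nbrs.append(config - {s})
    for s in inside:                        # change one causal SNP for another
        for t in outside:
            nbrs.append((config - {s}) | {t})
    if len(config) < max_causal:            # add one causal SNP
        for t in outside:
            nbrs.append(config | {t})
    return nbrs

def shotgun_search(score, m, max_causal, n_iter, seed=0):
    """Return the unnormalized score of every configuration evaluated."""
    rng = random.Random(seed)
    cache = {}                              # memoized unnormalized scores

    def cached(cfg):
        if cfg not in cache:
            cache[cfg] = score(cfg)
        return cache[cfg]

    current = frozenset()
    cached(current)
    for _ in range(n_iter):
        nbrs = neighborhood(current, m, max_causal)
        weights = [cached(c) for c in nbrs]
        # random.choices normalizes the weights within the neighborhood.
        current = rng.choices(nbrs, weights=weights, k=1)[0]
    return cache

def toy_score(cfg):
    # Hypothetical score: favors SNPs 1 and 3, penalizes larger configurations.
    return 100.0 ** len(cfg & {1, 3}) * 0.1 ** len(cfg)

visited = shotgun_search(toy_score, m=5, max_causal=3, n_iter=50)
best = max(visited, key=visited.get)
```

Because the scores are cached by configuration, a configuration that reappears in a later neighborhood costs only a dictionary lookup. The highest-scoring configuration among those evaluated is available as `best`.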
Fig. 3.
Processing time of one locus with FINEMAP and CAVIARBF on a log10 scale. Top panel: Scenario A with an increasing number of SNPs, allowing K = 3 or K = 5 causal SNPs. Bottom panel: Scenario B with 150 SNPs, considering causal configurations with different maximum numbers of SNPs. All processing times are averaged over 500 datasets using one core of an Intel Haswell E5-2690v3 processor running at 2.6 GHz.
Fig. 4.
Single-SNP inclusion probabilities of all SNPs in Scenario B with an absolute difference larger than 0.01 between FINEMAP and CAVIARBF.
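For readers unfamiliar with the quantity compared in this figure, the sketch below shows one common way to obtain single-SNP inclusion probabilities: sum the normalized posterior over all evaluated configurations that contain the SNP. The function names, the toy scores and the second probability vector are hypothetical. Normalizing over the evaluated configurations only is an approximation whenever the search is not exhaustive.

```python
def inclusion_probabilities(scores, m):
    """Marginal inclusion probability per SNP from unnormalized configuration scores."""
    total = sum(scores.values())
    probs = [0.0] * m
    for cfg, s in scores.items():
        for snp in cfg:
            probs[snp] += s / total   # add this configuration's posterior share
    return probs

def differing_snps(p_a, p_b, tol=0.01):
    """Indices of SNPs whose inclusion probabilities differ by more than `tol`."""
    return [j for j in range(len(p_a)) if abs(p_a[j] - p_b[j]) > tol]

# Hypothetical unnormalized scores for four configurations of three SNPs.
scores = {
    frozenset(): 1.0,
    frozenset({0}): 5.0,
    frozenset({1}): 2.0,
    frozenset({0, 1}): 2.0,
}
pip = inclusion_probabilities(scores, m=3)

# Hypothetical probabilities from a second method, for comparison.
other = [0.7, 0.45, 0.005]
flagged = differing_snps(pip, other)
```

With these toy numbers the probabilities are approximately 0.7, 0.4 and 0 for the three SNPs, and only the second SNP exceeds the 0.01 difference threshold.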
Fig. 5.
Fine-mapping accuracy of FINEMAP and CAVIARBF on data with five causal SNPs, allowing either K = 3 or K = 5 causal SNPs. The proportion of causal SNPs included is plotted against the number of top SNPs selected on the basis of ranked single-SNP inclusion probabilities. Proportions are averaged over 500 datasets with 1500 SNPs. The case K = 5 is computationally intractable for CAVIARBF.
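The accuracy measure in this caption can be written down in a few lines. The sketch below ranks SNPs by inclusion probability and reports the share of true causal SNPs among the top `n_top`; the function name and the toy values are illustrative assumptions, and in the paper the proportions are averaged over many simulated datasets rather than computed on a single one.

```python
def proportion_causal_in_top(pip, causal, n_top):
    """Share of true causal SNPs among the `n_top` SNPs with the highest probability."""
    ranked = sorted(range(len(pip)), key=lambda j: pip[j], reverse=True)
    top = set(ranked[:n_top])
    return len(top & set(causal)) / len(causal)

# Hypothetical inclusion probabilities for five SNPs; SNPs 0 and 4 are causal.
pip = [0.9, 0.1, 0.6, 0.05, 0.3]
causal = {0, 4}
curve = [proportion_causal_in_top(pip, causal, k) for k in range(1, 6)]
```

Here `curve` holds one point per value of `n_top`, which corresponds to one line in a plot of this kind for a single dataset.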
Fig. 6.
Fine-mapping of the 4q22/SNCA region associated with Parkinson’s disease. The associated SNPs rs356220 and rs7687945 and their configuration are highlighted. Dashed lines correspond to a single-SNP Bayes factor of 100 and a P-value of 5×10⁻⁸, respectively. Squared correlations are shown with respect to rs356220.
Fig. 7.
Fine-mapping of the 15q21/LIPC region associated with high-density lipoprotein cholesterol. Independent association signals in conditional analysis are highlighted. Dashed lines correspond to a single-SNP Bayes factor of 100 and a P-value of 5×10⁻⁸, respectively. Squared correlations are shown with respect to rs2043085.
