Review
Nat Rev Genet. 2018 Aug;19(8):491-504.
doi: 10.1038/s41576-018-0016-z.

From genome-wide associations to candidate causal variants by statistical fine-mapping

Daniel J Schaid et al. Nat Rev Genet. 2018 Aug.

Abstract

Advancing from statistical associations of complex traits with genetic markers to understanding the functional genetic variants that influence traits is often a complex process. Fine-mapping can select and prioritize genetic variants for further study, yet the multitude of analytical strategies and study designs makes it challenging to choose an optimal approach. We review the strengths and weaknesses of different fine-mapping approaches, emphasizing the main factors that affect performance. Topics include interpreting results from genome-wide association studies (GWAS), the role of linkage disequilibrium, statistical fine-mapping approaches, trans-ethnic studies, genomic annotation and data integration, and other analysis and design issues.

Conflict of interest statement

Competing Interests: None

Figures

Figure 1. Flow of a typical process from initial GWAS to annotation of SNPs selected from fine-mapping analyses
Based on GWAS p-values summarized in a Manhattan plot, a list of single-nucleotide polymorphisms (SNPs) that achieve genome-wide statistical significance (i.e., p-value < 5×10⁻⁸) is used to determine regions of interest for fine-mapping. Each region is typically explored according to the structure of linkage disequilibrium among SNPs using Haploview plots. Statistical associations are viewed with LocusZoom plots that illustrate the patterns of association of each SNP with the lead SNP, as well as annotation of genes in the region. The regions can then be partitioned into independent sub-regions to ease computational burden, based on statistical models that evaluate the simultaneous effects of multiple SNPs on a trait. Statistical fine-mapping is conducted in each region, using one of the methods illustrated in Figure 2. The SNPs selected from fine-mapping are then annotated with genomic features to prioritize follow-up functional studies. Figure is adapted from REF.
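The first step of this flow, turning genome-wide significant SNPs into regions of interest, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `regions_of_interest`, the `(chrom, pos, p)` input layout, and the 500 kb flank are all assumptions introduced here; only the 5×10⁻⁸ threshold comes from the caption.

```python
GENOME_WIDE_P = 5e-8   # significance threshold stated in the caption
FLANK_BP = 500_000     # hypothetical flank around each hit; not given in the source


def regions_of_interest(snps, p_threshold=GENOME_WIDE_P, flank=FLANK_BP):
    """Merge genome-wide significant SNPs into candidate fine-mapping regions.

    snps: iterable of (chrom, pos, p_value) tuples (assumed layout).
    Returns a list of (chrom, start, end) tuples, sorted by chromosome and start.
    """
    # Keep only SNPs that pass the threshold, ordered by chromosome and position.
    hits = sorted((chrom, pos) for chrom, pos, p in snps if p < p_threshold)

    regions = []
    for chrom, pos in hits:
        start, end = max(0, pos - flank), pos + flank
        # Extend the previous region when the flanked windows overlap.
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            prev_chrom, prev_start, prev_end = regions[-1]
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            regions.append((chrom, start, end))
    return regions
```

In practice region boundaries are usually informed by the local LD structure rather than a fixed physical distance, so the flank here is only a placeholder.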
Figure 2. Hypothetical examples of fine-mapping strategies
All subfigures are based on LocusZoom-style illustrations of marginal single-nucleotide polymorphism (SNP) associations. The −log₁₀(p-values) are presented on the left y-axis and variant positions are on the x-axis. The gold diamond for each locus represents the peak SNP. The results for other SNPs are coloured by descending degree of linkage disequilibrium (LD) with the peak SNP (ordered red, orange, green, and blue dots). The purple bars represent additional variant-level statistics produced by fine-mapping (i.e., Beta values for penalized regression; posterior inclusion probabilities (PIPs) for Bayesian methods), and the corresponding scale is on the right y-axis. The light grey boxes represent the regions selected by fine-mapping.
A | The heuristic approach is based on LD patterns with the peak SNP (rs12345). All SNPs that meet the orange LD category threshold have been selected.
B | The penalized regression approach selects all SNPs whose effects (Beta, right axis) are not shrunk to zero.
C | Bayesian fine-mapping produces SNP-level PIPs (right axis), which can be summed to form credible sets based on a specified coverage probability threshold (e.g., 95%). This example illustrates that the peak SNP does not necessarily correspond to the SNP with the highest PIP, which can occur because of the correlation structure among all SNPs in the region.
D | Bayesian trans-ethnic fine-mapping. Results for the same locus are illustrated for two diverse study populations (Pop. 1 (panel Da) and Pop. 2 (panel Db)) with different local LD structures. The peak SNPs for the two analyses differ (rs12345 versus rs23456), and combining the results through meta-analysis yields a narrowed fine-mapping credible region (panel Dc).
E | Joint analysis of multiple loci by integrating annotation in Bayesian fine-mapping can improve fine-mapping by borrowing annotation information across loci. Presented are three example independent loci (panels Ea–c) along with two corresponding regional annotations, indicated by bands below each locus plot. For loci 1 and 2, the peak SNPs overlap with Annotation 1, indicating enrichment. Because of this enrichment, the SNP with the highest PIP in locus 3 differs from the peak SNP in that locus.
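The credible-set construction described for panel C (summing PIPs until a coverage threshold is reached) can be sketched as below. This is an illustrative sketch under the assumption of a single causal variant per region, so that the PIPs sum to at most 1; the function name `credible_set` and the dictionary input are hypothetical, not taken from any fine-mapping package.

```python
def credible_set(pips, coverage=0.95):
    """Return the smallest set of SNPs whose summed PIP reaches `coverage`.

    pips: dict mapping SNP identifier -> posterior inclusion probability
          (assumed to sum to at most 1 within the region).
    Returns (list of SNP ids ranked by PIP, cumulative PIP of the set).
    If the PIPs never reach `coverage`, all SNPs are returned.
    """
    # Rank SNPs from highest to lowest PIP.
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)

    chosen, total = [], 0.0
    for snp, pip in ranked:
        chosen.append(snp)
        total += pip
        if total >= coverage:
            break
    return chosen, total
```

Because SNPs are ranked by PIP rather than by marginal p-value, the peak SNP is not guaranteed to head the list, which matches the situation panel C illustrates.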
Figure 3. Power of conditional analysis
This figure illustrates how conditional analyses have less power to detect secondary associated single-nucleotide polymorphisms (SNPs) than an initial genome-wide association study (GWAS) has to detect the primary SNP. Power of conditional analyses diminishes as the correlation between a primary SNP (indicated by SNP1) and a secondary SNP (indicated by SNP2) increases, and when the effect size of the secondary SNP is weaker than that of the primary SNP. For this figure, the power of an initial GWAS to detect the primary SNP1 is 90% for an effect size of R² = 1% of explained trait variation. The effect size of the secondary SNP2 is varied from 100% to 50% of the effect size of the primary SNP1.
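The qualitative pattern in this figure can be reproduced with a rough calculation. The sketch below is not the authors' computation: it assumes a 1-degree-of-freedom test, a non-centrality parameter (NCP) proportional to explained variance, that conditioning on SNP1 scales the NCP of SNP2 by (1 − r²) where r is the correlation between the two SNPs, and that the conditional test uses the same 5×10⁻⁸ threshold as the GWAS. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_1df(ncp, alpha=5e-8):
    """Power of a two-sided 1-df test with non-centrality parameter `ncp`."""
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2)
    shift = math.sqrt(ncp)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def ncp_for_power(target_power, alpha=5e-8):
    """Find the NCP that gives `target_power` by bisection (power rises with NCP)."""
    low, high = 0.0, 1000.0
    for _ in range(200):
        mid = (low + high) / 2
        if power_1df(mid, alpha) < target_power:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def conditional_power(r, effect_ratio, gwas_power=0.90, alpha=5e-8):
    """Approximate power to detect SNP2 after conditioning on SNP1.

    r:            correlation between SNP1 and SNP2.
    effect_ratio: R² of SNP2 divided by R² of SNP1 (1.0 down to 0.5 in the figure).
    """
    ncp_primary = ncp_for_power(gwas_power, alpha)
    # Assumption: conditioning removes the share of SNP2 explained by SNP1.
    ncp_secondary = ncp_primary * effect_ratio * (1.0 - r ** 2)
    return power_1df(ncp_secondary, alpha)
```

Calling `conditional_power(r, effect_ratio)` over a grid of `r` values for a few effect ratios gives curves that fall as `r` grows and as `effect_ratio` shrinks, which is the behaviour the caption describes.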
Figure 4. Posterior probability for a single causal SNP when 5–40 SNPs are in a region of interest
The prior probability that a SNP is causal is assumed to be equal for all SNPs. Sample size (N) ranges from 500 to 20,000, and the percent of trait variation explained by the causal variant, R², is 1%. SNPs are assumed to be equally correlated with magnitude ρ. The horizontal dotted line is for equal prior probabilities for SNPs, and the posterior probability approaches this line when the data have little information to distinguish causal from non-causal SNPs.
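A back-of-the-envelope version of this setting is sketched below. It is not the calculation behind the figure: it assumes exactly one causal SNP, equal priors, Wakefield-style approximate Bayes factors, and it plugs in expected z-scores (√(N·R²) for the causal SNP, ρ times that for every other SNP) instead of averaging over sampling noise. The function name and the prior effect variance `prior_var` are hypothetical choices.

```python
import math


def posterior_causal(n_snps, n, r2=0.01, rho=0.5, prior_var=0.04):
    """Approximate posterior probability of the causal SNP among `n_snps` SNPs.

    n:         sample size N.
    r2:        fraction of trait variance explained by the causal SNP.
    rho:       common correlation between the causal SNP and each other SNP.
    prior_var: assumed prior variance of the standardized effect (hypothetical).
    """
    sampling_var = 1.0 / n                      # variance of a standardized effect estimate
    shrink = prior_var / (sampling_var + prior_var)

    z_causal = math.sqrt(n * r2)                # expected z-score of the causal SNP
    z_other = rho * z_causal                    # expected z-score of a correlated SNP

    # Log approximate Bayes factors; the shared sqrt(V / (V + W)) term cancels.
    log_bf_causal = 0.5 * z_causal ** 2 * shrink
    log_bf_other = 0.5 * z_other ** 2 * shrink

    # Equal priors: posterior = BF_causal / (BF_causal + (m - 1) * BF_other).
    return 1.0 / (1.0 + (n_snps - 1) * math.exp(log_bf_other - log_bf_causal))
```

With ρ = 1 the Bayes factors are identical and the result equals 1/`n_snps`, the equal-prior line in the figure; larger N or smaller ρ moves the posterior toward 1.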
