PLoS Genet. 2013;9(8):e1003609.
doi: 10.1371/journal.pgen.1003609. Epub 2013 Aug 8.

Re-ranking sequencing variants in the post-GWAS era for accurate causal variant identification

Laura L Faye et al. PLoS Genet. 2013.

Abstract

Next generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counter-intuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Tagging effect decreases localization success rates with or without the selection effect.
The expected values of the association test statistics at a tag SNP (red) and the causal SNP (black), with shading from the 25th to the 75th percentile (A, C), and the localization success rates (B, D) for association studies (1000 cases and 1000 controls) of one causal SNP (MAF = 0.12; OR = 1.25; perfect genotyping accuracy) and one tag SNP (MAF = 0.12; correlation with the causal SNP varying from r = 0.2 to 1; perfect genotyping accuracy), with no selection for significance at the tag SNP (A, B) or with selection at the tag SNP requiring the test statistic TG to be significant at p-value < 0.05 (C, D).
Figure 2. Low genotyping accuracy further reduces localization success rates with or without the selection effect.
Localization success rates for association studies (1000 cases and 1000 controls) of one causal SNP (MAF = 0.12; OR = 1.25; imperfect genotyping accuracy due to sequencing or imputation errors, resulting in correlation between the actual and estimated genotypes from ρC = 0.80 (blue dash-dotted) to 1 (black solid)) and one tag SNP (MAF = 0.12; correlation with the causal SNP varying from rCG = 0.2 to 1 (X-axis); perfect genotyping accuracy with ρG = 1), with no selection for significance at the tag SNP (A) or with selection at the tag SNP requiring the test statistic TG to be significant at p-value < 0.05 (B).
Figure 3. Well-tagged causal SNPs sequenced with low accuracy are unlikely to be correctly identified even as sample size increases.
Localization success rates for association studies (sample size from 50:50 cases:controls to 5000:5000 cases:controls, X-axis) of one causal SNP (MAF = 0.12; OR = 1.25; imperfect genotyping accuracy due to sequencing or imputation errors, resulting in correlation between the actual and estimated genotypes of ρC = 0.95) and one tag SNP (MAF = 0.12; in high correlation with the causal SNP, from rCG = 0.8 (purple solid) to 0.98 (red dashed); 100% genotyping accuracy with ρG = 1), with no selection for significance at the tag SNP.
Figure 4. Naïve test statistics and re-ranking statistics for regions surrounding rs78246868 in the 8q24.21 region for association with prostate cancer risk.
Naïve test statistics (A) and re-ranking statistics adjusting for genotyping accuracy (B) for SNPs in LD (r² > 0.2) with rs78246868. Circles highlight SNPs whose rank changed considerably after re-ranking. Color indicates pair-wise correlation with the most significant SNP in the region selected based on the naïve ranking (purple diamond). Other shapes indicate genotyping accuracy over all 7 studies as measured by ρmeta. rs78246868 is no longer the most significant SNP in the region after re-ranking.
Figure 5. Naïve test statistics and re-ranking statistics for regions surrounding rs8071558 in the 17q24.3 region for association with prostate cancer risk.
Naïve test statistics (A) and re-ranking statistics adjusting for genotyping accuracy (B) for SNPs in LD (r² > 0.2) with rs8071558. Circles highlight SNPs whose rank changed considerably after re-ranking. Color indicates pair-wise correlation with the most significant SNP in the region selected based on the naïve ranking (purple diamond). Other shapes indicate genotyping accuracy over all 7 studies as measured by ρmeta. rs8071558 is no longer the most significant SNP in the region after re-ranking.
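
The two-SNP setting described in the Figure 1 and Figure 2 captions can be illustrated with a small Monte Carlo sketch. This is not the authors' code or their re-ranking procedure. It assumes a simple asymptotic model in which the causal and tag test statistics are bivariate normal, the tag's expected statistic is r times the causal SNP's, and genotyping error at the causal SNP (accuracy ρC) scales both its expected statistic and its correlation with the tag by ρC. The function names `noncentrality` and `localization_success`, the allelic-test approximation, and the tie-handling rule are all illustrative assumptions.

```python
import math
import random


def noncentrality(n_cases, n_controls, maf, odds_ratio):
    """Approximate expected Z statistic of an allelic test at the causal SNP.

    Assumption: `maf` is the control allele frequency and the odds ratio
    acts per allele (multiplicative model).
    """
    p0 = maf
    odds1 = odds_ratio * p0 / (1 - p0)
    p1 = odds1 / (1 + odds1)  # implied allele frequency in cases
    # Standard error of a difference in proportions, 2N alleles per group.
    se = math.sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    return (p1 - p0) / se


def localization_success(lam, r, rho_c=1.0, select_tag=False, n_sim=20000, seed=1):
    """Estimate P(|Z_causal| > |Z_tag|), optionally given a significant tag.

    lam   : expected Z at the perfectly genotyped causal SNP
    r     : correlation between the true causal genotype and the tag
    rho_c : correlation between true and estimated causal genotypes
    """
    rng = random.Random(seed)
    r_obs = rho_c * r      # correlation of observed causal genotype with tag
    mean_c = rho_c * lam   # attenuated signal at the causal SNP
    mean_g = r * lam       # signal carried by the tag
    kept = 0
    wins = 0.0
    for _ in range(n_sim):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        z_g = mean_g + e1
        z_c = mean_c + r_obs * e1 + math.sqrt(1.0 - r_obs ** 2) * e2
        # Selection effect: keep only replicates with two-sided p < 0.05 at the tag.
        if select_tag and abs(z_g) < 1.959964:
            continue
        kept += 1
        if abs(z_c) > abs(z_g):
            wins += 1.0
        elif abs(z_c) == abs(z_g):
            wins += 0.5  # ties are split evenly (illustrative choice)
    return wins / kept if kept else float("nan")


if __name__ == "__main__":
    lam = noncentrality(1000, 1000, maf=0.12, odds_ratio=1.25)
    for r in (0.2, 0.5, 0.8, 0.95):
        print(r,
              round(localization_success(lam, r), 3),
              round(localization_success(lam, r, select_tag=True), 3),
              round(localization_success(lam, r, rho_c=0.8), 3))
```

Under these assumptions the estimated success rate falls as r increases, and it falls further when the causal SNP is genotyped less accurately than the tag, which is the qualitative pattern the captions describe. The numbers it produces are not expected to match the published figures.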
