Re-ranking sequencing variants in the post-GWAS era for accurate causal variant identification
- PMID: 23950724
- PMCID: PMC3738448
- DOI: 10.1371/journal.pgen.1003609
Abstract
Next-generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counterintuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low-coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better-genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together, these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher-density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website.
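The first effect described above (differential genotyping error pulling evidence toward better-genotyped SNPs) can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the authors' re-ranking procedure, whose R scripts are not reproduced here. It assumes the common approximation that the expected 1-df association chi-square at a marker is 1 + N·q·r², where q is the trait variance explained by the causal SNP and r² is the marker's LD with it, and it further assumes that imperfect genotyping (for example an imputation quality score) shrinks the effective sample size proportionally. The class and function names, SNP labels, sample size and effect size are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    name: str
    ld_r2_with_causal: float  # r^2 between this SNP and the (hypothetical) causal SNP
    genotype_quality: float   # assumed quality score in [0, 1]; 1.0 = perfectly genotyped


def expected_chisq(snp: Snp, n: int, h2: float) -> float:
    """Approximate expected 1-df association chi-square at one marker.

    Assumption: non-centrality ~= N_eff * h2 * r2_LD, with the effective
    sample size N_eff ~= N * genotype_quality.
    """
    ncp = n * snp.genotype_quality * h2 * snp.ld_r2_with_causal
    return 1.0 + ncp


def rank_by_expected_evidence(snps: list[Snp], n: int, h2: float) -> list[Snp]:
    """Order SNPs from strongest to weakest expected association evidence."""
    return sorted(snps, key=lambda s: expected_chisq(s, n, h2), reverse=True)


if __name__ == "__main__":
    # Hypothetical region: the causal SNP is imputed with quality 0.7,
    # while a directly genotyped tag SNP has r^2 = 0.9 with it.
    region = [
        Snp("causal", ld_r2_with_causal=1.0, genotype_quality=0.7),
        Snp("tag", ld_r2_with_causal=0.9, genotype_quality=1.0),
        Snp("neighbor", ld_r2_with_causal=0.5, genotype_quality=0.95),
    ]
    for snp in rank_by_expected_evidence(region, n=10_000, h2=0.002):
        print(f"{snp.name}: expected chi-square {expected_chisq(snp, 10_000, 0.002):.1f}")
```

With these made-up numbers the expected statistic is about 19 for the tag and about 15 for the causal SNP, so the better-genotyped tag is expected to rank first even though it is only a proxy. In this toy model the ordering does not depend on N, but the gap between the expected z-scores (roughly the square roots of the non-centrality parameters) grows with the square root of N, which is one way to see why a larger sample alone need not help localize the causal SNP when genotyping quality differs.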
Conflict of interest statement
The authors have declared that no competing interests exist.
Grants and funding
- 84287-2/CAPMC/ CIHR/Canada
- MOP-84287/CAPMC/ CIHR/Canada
- 84287-1/CAPMC/ CIHR/Canada
- U01 CA098216/CA/NCI NIH HHS/United States
- T32 GM074897/GM/NIGMS NIH HHS/United States
- U01 CA098233/CA/NCI NIH HHS/United States
- MDR-88001/CAPMC/ CIHR/Canada
- WT_/Wellcome Trust/United Kingdom
- T32-GM074897/GM/NIGMS NIH HHS/United States
- U01 CA098710/CA/NCI NIH HHS/United States
- U01-CA98233/CA/NCI NIH HHS/United States
- U01-CA98710/CA/NCI NIH HHS/United States
- U01-CA98216/CA/NCI NIH HHS/United States
- 076113/WT_/Wellcome Trust/United Kingdom
- GET-101831/CAPMC/ CIHR/Canada