Genet Epidemiol. 2013 Nov;37(7):666-74.
doi: 10.1002/gepi.21747. Epub 2013 Jul 8.

The value of statistical or bioinformatics annotation for rare variant association with quantitative trait

Andrea E Byrnes et al. Genet Epidemiol. 2013 Nov.

Abstract

In the past few years, a plethora of methods for rare variant association with phenotype have been proposed. These methods aggregate information from multiple rare variants across genomic region(s), but there is little consensus as to which method is most effective. The weighting scheme adopted when aggregating information across variants is one of the primary determinants of effectiveness. Here we present a systematic evaluation of multiple weighting schemes through a series of simulations intended to mimic large sequencing studies of a quantitative trait. We evaluate existing phenotype-independent and phenotype-dependent methods, as well as weights estimated by penalized regression approaches including Lasso, Elastic Net, and SCAD. We find that the difference in power between phenotype-dependent schemes is negligible when high-quality functional annotations are available. When functional annotations are unavailable or incomplete, all methods suffer from power loss; however, the variable selection methods outperform the others at the cost of increased computational time. Therefore, in the absence of good annotation, we recommend variable selection methods (which can be viewed as "statistical annotation") on top of regions implicated by a phenotype-independent weighting scheme. Further, once a region is implicated, variable selection can help to identify potential causal single nucleotide polymorphisms for biological validation. These findings are supported by an analysis of a high coverage targeted sequencing study of 1,898 individuals.
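As a rough illustration of what a phenotype-independent weighting scheme looks like when information is aggregated across rare variants, the sketch below collapses each individual's genotypes into one weighted burden score and correlates it with a quantitative trait. The weight 1/sqrt(p(1-p)), the toy genotypes, the trait values, and all function names are assumptions made for this example; the abstract does not say which weights the authors evaluated.

```python
import math

def allele_frequency(column):
    """Allele frequency from 0/1/2 allele counts for one variant."""
    return sum(column) / (2 * len(column))

def frequency_weights(genotypes):
    """Phenotype-independent weights: rarer variants get larger weights.

    Assumption for illustration: weight = 1 / sqrt(p * (1 - p)),
    where p is the allele frequency of the variant.
    """
    weights = []
    for j in range(len(genotypes[0])):
        p = allele_frequency([row[j] for row in genotypes])
        weights.append(1 / math.sqrt(p * (1 - p)) if 0 < p < 1 else 0.0)
    return weights

def burden_scores(genotypes, weights):
    """Weighted sum of allele counts per individual."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy data (hypothetical): 6 individuals (rows) x 3 rare variants (columns).
genotypes = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
trait = [0.1, 1.9, 1.2, 1.1, 2.4, -0.2]

weights = frequency_weights(genotypes)
scores = burden_scores(genotypes, weights)
association = pearson(scores, trait)
```

A phenotype-dependent scheme would instead derive the weights from the trait itself, and a variable selection method such as the Lasso would estimate one coefficient per variant, shrinking most of them to zero. Neither is shown here.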

Keywords: association; rare variants; variable selection; variant annotation; weighting.

Figures

Figure 1. Power Comparison in the Absence of a Bioinformatics Tool
Figure 1 shows the power (Y-axis) of the different methods across a wide spectrum of m (the number of true causal variants) and r (the proportion of variants that contribute to our quantitative trait in a positive direction) in the absence of a bioinformatics tool. In Figure 1a, we fix m at 10 and show power comparisons across the entire spectrum of r (X-axis). Figure 1b shows how power changes as a function of m (X-axis) with r fixed at 0.8. Here we use the logit link function.
Figure 2. Power Comparison in the Presence of the Good Bioinformatics Tool
Figure 2 shows the power (Y-axis) of the different methods across a wide spectrum of m (the number of true causal variants) and r (the proportion of variants that contribute to our quantitative trait in a positive direction) in the presence of the good bioinformatics tool described in the Method section. As in Figure 1a, we fix m at 10 and show power comparisons across the entire spectrum of r (X-axis) in Figure 2a. Similarly, Figure 2b shows how the power of the methods changes as a function of m (X-axis) with r fixed at 0.8. Again, the logit link function is used.
Figure 3. How Far Down the Ranked List are the Truly Causal Variants when All Variants are Included?
Figure 3a shows the number of variants that must be considered (Y-axis) in order to catch the top 10%, 20%, …, 100% of truly causal variants (X-axis) in simulation when all variants are considered. We assume that the variants are ranked in order of significance. These plots aggregate true and estimated weights from all 10,000 replicates of the experiment; once again, we fix r at 0.8 and m at 10 and use the logit link function. Figure 3b takes LD buddies (variants with r² > 0.8 with a causal variant) into consideration. Figure 3c restricts the results from Figure 3a to functional variants only, using a good bioinformatics tool. Figure 3d is restricted to functional variants only and takes LD buddies into account.
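The quantity plotted in Figure 3 can be sketched as follows: given variants ranked by significance, find how far down the list one must go to capture a given percentage of the causal variants. This is a minimal sketch under assumptions; the function name, the variant IDs, and the optional `ld_buddies` mapping (proxy variant to the causal variant it tags) are hypothetical, and the paper's own aggregation over replicates is not reproduced.

```python
def depth_to_capture(ranked, causal, percents, ld_buddies=None):
    """Number of top-ranked variants needed to capture each percentage
    of the causal variants.

    ranked     : variant IDs ordered from most to least significant
    causal     : set of truly causal variant IDs
    percents   : integer percentages, e.g. [10, 20, ..., 100]
    ld_buddies : optional dict {proxy_id: causal_id}; a proxy counts as a
                 hit for the causal variant it tags (assumed convention)
    """
    ld_buddies = ld_buddies or {}
    first_hit = {}
    for rank, variant in enumerate(ranked, start=1):
        target = variant if variant in causal else ld_buddies.get(variant)
        if target in causal and target not in first_hit:
            first_hit[target] = rank  # earliest rank that reveals this causal variant
    hits = sorted(first_hit.values())
    n = len(causal)
    depths = {}
    for pct in percents:
        needed = -(-pct * n // 100)  # integer ceiling of pct% of n
        depths[pct] = hits[needed - 1] if 1 <= needed <= len(hits) else None
    return depths

# Toy ranking (hypothetical IDs): 10 variants, 4 of them causal.
ranked = ["v7", "v2", "v9", "v1", "v5", "v3", "v8", "v4", "v6", "v10"]
causal = {"v7", "v9", "v5", "v4"}

depths = depth_to_capture(ranked, causal, [25, 50, 75, 100])
depths_ld = depth_to_capture(ranked, causal, [25, 50, 75, 100],
                             ld_buddies={"v2": "v9"})
```

In this toy ranking, counting the proxy `v2` as a hit for `v9` reduces the depth needed to reach 50% of the causal variants from 3 to 2, which is the kind of shift the LD-buddy panels are meant to show.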
