The value of statistical or bioinformatics annotation for rare variant association with quantitative trait

Andrea E Byrnes¹, Michael C Wu, Fred A Wright, Mingyao Li, Yun Li

PMID: 23836599
PMCID: PMC4083762
DOI: 10.1002/gepi.21747

Andrea E Byrnes et al. Genet Epidemiol. 2013 Nov;37(7):666-74.

doi: 10.1002/gepi.21747. Epub 2013 Jul 8.

Affiliation

¹ Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina.

Abstract

In the past few years, a plethora of methods for rare variant association with phenotype have been proposed. These methods aggregate information from multiple rare variants across genomic region(s), but there is little consensus as to which method is most effective. The weighting scheme adopted when aggregating information across variants is one of the primary determinants of effectiveness. Here we present a systematic evaluation of multiple weighting schemes through a series of simulations intended to mimic large sequencing studies of a quantitative trait. We evaluate existing phenotype-independent and phenotype-dependent methods, as well as weights estimated by penalized regression approaches including Lasso, Elastic Net, and SCAD. We find that the difference in power between phenotype-dependent schemes is negligible when high-quality functional annotations are available. When functional annotations are unavailable or incomplete, all methods suffer from power loss; however, the variable selection methods outperform the others at the cost of increased computational time. Therefore, in the absence of good annotation, we recommend variable selection methods (which can be viewed as "statistical annotation") on top of regions implicated by a phenotype-independent weighting scheme. Further, once a region is implicated, variable selection can help to identify potential causal single nucleotide polymorphisms for biological validation. These findings are supported by an analysis of a high coverage targeted sequencing study of 1,898 individuals.

Keywords: association; rare variants; variable selection; variant annotation; weighting.
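To make the idea of a phenotype-independent weighting scheme concrete, here is a minimal sketch of a weighted burden score, in which each individual's rare-variant genotypes are aggregated into one number. The weight form `1 / sqrt(q * (1 - q))`, which up-weights rarer variants, is an assumption chosen for illustration and is not necessarily one of the schemes evaluated in the paper; all function names are hypothetical.

```python
import math


def allele_frequencies(genotypes):
    """Per-variant allele frequency from an n x p matrix of 0/1/2 allele counts."""
    n = len(genotypes)
    p = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2 * n) for j in range(p)]


def frequency_weights(freqs):
    """Phenotype-independent weights (assumed form): rarer variants weigh more.

    Monomorphic variants (q == 0 or q == 1) carry no information and get weight 0.
    """
    return [1.0 / math.sqrt(q * (1 - q)) if 0 < q < 1 else 0.0 for q in freqs]


def burden_scores(genotypes, weights):
    """Aggregate each individual's genotypes into a single weighted burden score."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


# Toy data: 4 individuals, 2 variants (illustrative only, not from the study).
genotypes = [[0, 0], [1, 0], [0, 1], [0, 1]]
freqs = allele_frequencies(genotypes)
weights = frequency_weights(freqs)
scores = burden_scores(genotypes, weights)
```

The resulting scores could then be regressed against the quantitative trait. A phenotype-dependent or variable selection scheme would instead estimate the weights from the trait itself, for example with a penalized regression.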

Figures

**Figure 1. Power Comparison in the Absence of a Bioinformatics Tool**
Figure 1 shows the power (Y-axis) of the different methods across a wide spectrum of m (the number of true causal variants) and r (the proportion of variants that contribute to our quantitative trait in a positive direction) in the absence of a bioinformatics tool. In Figure 1a, we fix m at 10 and show power comparisons across the entire spectrum of r (X-axis). Figure 1b shows how power changes as a function of m (X-axis) with r fixed at 0.8. Here we use the logit link function.

**Figure 2. Power Comparison in the Presence of the Good Bioinformatics Tool**
Figure 2 shows the power (Y-axis) of the different methods across a wide spectrum of m (the number of true causal variants) and r (the proportion of variants that contribute to our quantitative trait in a positive direction) in the presence of the good bioinformatics tool described in the Method section. As in Figure 1a, we fix m at 10 and show power comparisons across the entire spectrum of r (X-axis) in Figure 2a. Similarly, Figure 2b shows how the power of the methods changes as a function of m (X-axis) with r fixed at 0.8. Again, the logit link function is used.

**Figure 3. How Far Down the Ranked List are the Truly Causal Variants when All Variants are Included?**
Figure 3a shows the number of variants that must be considered (Y-axis) in order to capture the top 10%, 20%, …, 100% of truly causal variants (X-axis) in simulation when all variants are considered. We assume that the variants are ranked in order of significance. These plots aggregate true and estimated weights from all 10,000 replicates of the experiment; once again, we fix r at 0.8 and m at 10 and use the logit link function. Figure 3b takes LD buddies (variants with r² > 0.8 with a causal variant) into consideration. Figure 3c restricts the results from Figure 3a to functional variants only, using a good bioinformatics tool. Figure 3d is restricted to functional variants only and takes LD buddies into account.
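The quantity plotted in Figure 3a can be sketched as follows: given variants ranked by significance and the set of truly causal variants, find how far down the list one must go to capture a given percentage of the causal variants. This is a minimal illustration of the metric as the caption describes it, not the authors' code; the function name, arguments, and toy variant labels are hypothetical.

```python
def depth_to_capture(ranked_variants, causal, percents=(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)):
    """Map each percentage to the number of top-ranked variants that must be
    examined to include that share of the causal variants.

    Returns None for a percentage that cannot be reached, for example when a
    causal variant is missing from the ranked list.
    """
    causal = set(causal)
    m = len(causal)
    # 1-based ranks at which causal variants appear, in ranked order.
    positions = [i + 1 for i, v in enumerate(ranked_variants) if v in causal]
    depths = {}
    for pct in percents:
        # Integer ceiling of pct% of m, with at least one causal variant required.
        needed = max(1, (pct * m + 99) // 100)
        depths[pct] = positions[needed - 1] if needed <= len(positions) else None
    return depths


# Toy ranking: most significant variant first (illustrative labels only).
ranked = ["v3", "v1", "v7", "v2", "v5", "v9"]
depths = depth_to_capture(ranked, {"v1", "v5"}, percents=(50, 100))
```

Counting LD buddies as hits, as in Figure 3b, would amount to adding each causal variant's high-LD neighbors to the `causal` set before calling the function; restricting to functional variants, as in Figure 3c, would amount to filtering `ranked_variants` first.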
