A novel variational Bayes multiple locus Z-statistic for genome-wide association studies with Bayesian model averaging
- PMID: 22563072
- PMCID: PMC3381972
- DOI: 10.1093/bioinformatics/bts261
Abstract
Motivation: For many complex traits, including height, the majority of variants identified by genome-wide association studies (GWAS) have small effects, leaving a significant proportion of the heritable variation unexplained. Although many penalized multiple regression methodologies have been proposed to increase the power to detect associations for complex genetic architectures, they generally lack mechanisms for false-positive control and diagnostics for model over-fitting. Our methodology is the first penalized multiple regression approach that explicitly controls Type I error rates and provides model over-fitting diagnostics through a novel normally distributed statistic defined for every marker within the GWAS, based on results from a variational Bayes spike regression algorithm.
Results: We compare the performance of our method to the lasso and single marker analysis on simulated data and demonstrate that our approach has superior performance in terms of power and Type I error control. In addition, using the Women's Health Initiative (WHI) SNP Health Association Resource (SHARe) GWAS of African-Americans, we show that our method has power to detect additional novel associations with body height. These findings replicate by reaching a stringent cutoff of marginal association in a larger cohort.
Availability: An R-package, including an implementation of our variational Bayes spike regression (vBsr) algorithm, is available at http://kooperberg.fhcrc.org/soft.html.
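Illustration: because the abstract describes a normally distributed statistic defined for every marker, a per-marker significance call could be sketched as below. This is a minimal sketch and not the vBsr package or the authors' procedure. It assumes the statistic is standard normal under the null, and it uses a Bonferroni cutoff as one possible way to control the family-wise Type I error rate. The function names `two_sided_p` and `bonferroni_hits` and the example values are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a statistic ASSUMED to be N(0, 1) under the null."""
    # For Z ~ N(0, 1), P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_hits(z_stats, alpha=0.05):
    """Indices of markers whose p-value falls below alpha / (number of markers)."""
    # Bonferroni correction is an assumption of this sketch, not the paper's stated cutoff.
    cutoff = alpha / len(z_stats)
    return [i for i, z in enumerate(z_stats) if two_sided_p(z) < cutoff]


# Hypothetical Z-statistics for five markers (illustrative values only).
z_stats = [0.3, -1.2, 5.1, 2.0, -4.8]
print(bonferroni_hits(z_stats))
```

With five markers and alpha = 0.05 the cutoff is 0.01, so only markers with a large absolute Z-statistic are reported. In a real GWAS the number of markers is far larger and the cutoff correspondingly stricter.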
