Bioinformatics. 2012 Jul 1;28(13):1738-44.
doi: 10.1093/bioinformatics/bts261. Epub 2012 May 4.

A novel variational Bayes multiple locus Z-statistic for genome-wide association studies with Bayesian model averaging

Benjamin A Logsdon et al. Bioinformatics. 2012.

Abstract

Motivation: For many complex traits, including height, the majority of variants identified by genome-wide association studies (GWAS) have small effects, leaving a significant proportion of the heritable variation unexplained. Although many penalized multiple regression methodologies have been proposed to increase the power to detect associations for complex genetic architectures, they generally lack mechanisms for false-positive control and diagnostics for model over-fitting. Our methodology is the first penalized multiple regression approach that explicitly controls Type I error rates and provides model over-fitting diagnostics. It does so through a novel normally distributed statistic, defined for every marker within the GWAS and based on results from a variational Bayes spike regression algorithm.
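
To illustrate why a normally distributed per-marker statistic is convenient for Type I error control, the following minimal sketch converts z-scores to two-sided p-values and applies a Bonferroni cutoff. The Bonferroni correction, the function names, and the example z-scores are assumptions made for illustration; the abstract does not state which multiple-testing correction the authors use, and this is not the vBsr implementation.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a statistic assumed standard normal under the null."""
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_hits(z_scores, alpha=0.05):
    """Indices of markers whose p-value falls below alpha / (number of markers)."""
    cutoff = alpha / len(z_scores)  # Bonferroni cutoff (an assumed correction)
    return [i for i, z in enumerate(z_scores) if two_sided_p(z) < cutoff]


# Hypothetical z-scores for five markers (illustrative values only).
z_example = [0.3, -1.2, 5.1, 2.0, -4.8]
hits = bonferroni_hits(z_example)
print(hits)
```

With five markers the cutoff is 0.01, so only the markers with large absolute z-scores are retained; a marker with z = 2.0 is nominally significant at 0.05 but does not survive the correction.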

Results: We compare the performance of our method to the lasso and single marker analysis on simulated data and demonstrate that our approach has superior performance in terms of power and Type I error control. In addition, using the Women's Health Initiative (WHI) SNP Health Association Resource (SHARe) GWAS of African-Americans, we show that our method has power to detect additional novel associations with body height. These findings replicate by reaching a stringent cutoff of marginal association in a larger cohort.

Availability: An R-package, including an implementation of our variational Bayes spike regression (vBsr) algorithm, is available at http://kooperberg.fhcrc.org/soft.html.

Figures

Fig. 1.
The family-wise error rate (FWER) and power for six different strategies for choosing the model size associated with the z_vb statistic, and for single-marker analysis (χ²_sma), for simulations of 10^4 independent genotypes with differing sample sizes and heritabilities. For the FWER, the red horizontal line shows a FWER of 0.05 and the blue horizontal lines show the 95% CIs for controlling FWER to 0.05. The six strategies are: choice based on the minimum of the KL diagnostic statistic (z_vb^a), the expectation of the diagnostic statistic (z_vb^b), the minimum plus one standard error (z_vb^c), the expectation plus one standard error (z_vb^d), the minimum plus two standard errors (z_vb^e) and the expectation plus two standard errors (z_vb^f).
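
The 95% band around a nominal FWER of 0.05 can be sketched with a normal approximation to the binomial. This is a minimal illustration, not the authors' procedure: the number of simulation replicates (1000 here), the normal approximation itself, and the function names are all assumptions, since the caption does not say how the intervals were computed.

```python
import math


def fwer_band(n_reps: int, alpha: float = 0.05, z_crit: float = 1.96):
    """Approximate 95% band for an empirical FWER when the true FWER equals alpha."""
    # Standard error of a binomial proportion with success probability alpha.
    half_width = z_crit * math.sqrt(alpha * (1.0 - alpha) / n_reps)
    return max(0.0, alpha - half_width), alpha + half_width


def estimate_fwer(false_positive_counts) -> float:
    """Fraction of replicates with at least one false positive."""
    n = len(false_positive_counts)
    return sum(1 for count in false_positive_counts if count > 0) / n


# Hypothetical replicate count; the source does not report it here.
low, high = fwer_band(n_reps=1000)

# Hypothetical false-positive counts for 100 replicates (illustrative only).
counts = [0] * 95 + [1] * 3 + [2] * 2
fwer_hat = estimate_fwer(counts)
print(round(low, 4), round(high, 4), fwer_hat)
```

An empirical FWER that falls inside the band is consistent with control at 0.05 under these assumptions; more replicates narrow the band.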
Fig. 2.
The estimated expected regression coefficients as a function of the penalty parameter ℓ_0 for the analysis of height. As the penalty parameter increases in magnitude, the size of the model increases until it becomes over-fit. The position in the path for the four different strategies is shown with the vertical bars, with the chosen strategy (z_vb^b) in blue. The features that were significant for z_vb^b at the reported threshold (given as a formula image in the source) are also shown in blue along the entire path.
