Bioinformatics. 2013 Jun 15;29(12):1526-33.
doi: 10.1093/bioinformatics/btt177. Epub 2013 Apr 18.

A powerful and efficient set test for genetic markers that handles confounders

Jennifer Listgarten et al. Bioinformatics.

Abstract

Motivation: Approaches for testing sets of variants, such as a set of rare or common variants within a gene or pathway, for association with complex traits are important. In particular, set tests allow for aggregation of weak signal within a set, can capture interplay among variants and reduce the burden of multiple hypothesis testing. Until now, these approaches did not address confounding by family relatedness and population structure, a problem that is becoming more important as larger datasets are used to increase power.

Results: We introduce a new approach for set tests that handles confounders. Our model is based on the linear mixed model and uses two random effects: one to capture the set association signal and one to capture confounders. We also introduce a computational speedup for models with two random effects that makes this approach feasible even for extremely large cohorts. Using this model with both the likelihood ratio test and the score test, we find that the likelihood ratio test yields more power while controlling type I error. Application of our approach to richly structured Genetic Analysis Workshop 14 data demonstrates that our method successfully corrects for population structure and family relatedness, whereas application of our method to a 15 000-individual Crohn's disease case-control cohort demonstrates that it additionally recovers genes not recoverable by univariate analysis.
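
As an illustration only, a linear mixed model with two random effects and the corresponding likelihood ratio test can be written as below. The notation is an assumption made for this sketch, not taken from the abstract: y is the phenotype vector, X the fixed-effect covariates, K_C a similarity matrix capturing confounders, K_S a similarity matrix built from the variants in the tested set, and ℓ the maximized log-likelihood.

```latex
\begin{align}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{g}_C + \mathbf{g}_S + \boldsymbol{\varepsilon}, \\
\mathbf{g}_C &\sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_C^2 \mathbf{K}_C\right), \quad
\mathbf{g}_S \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_S^2 \mathbf{K}_S\right), \quad
\boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_e^2 \mathbf{I}\right), \\
\mathbf{y} &\sim \mathcal{N}\!\left(\mathbf{X}\boldsymbol{\beta},\;
  \sigma_C^2 \mathbf{K}_C + \sigma_S^2 \mathbf{K}_S + \sigma_e^2 \mathbf{I}\right), \\
H_0 &: \sigma_S^2 = 0 \quad \text{versus} \quad H_1 : \sigma_S^2 > 0, \\
\Lambda &= 2\left[\ell\!\left(\hat{\boldsymbol{\beta}}, \hat{\sigma}_C^2, \hat{\sigma}_S^2, \hat{\sigma}_e^2\right)
  - \ell\!\left(\hat{\boldsymbol{\beta}}_0, \hat{\sigma}_{C,0}^2, 0, \hat{\sigma}_{e,0}^2\right)\right].
\end{align}
```

Under this reading, the set has no association with the trait when the variance of its random effect is zero, so the test compares the fit with and without the set-specific term while the confounder term stays in both models.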

Availability: A Python-based library implementing our approach is available at http://mscompbio.codeplex.com.

Figures

Fig. 1.
Quantile–quantile plot of observed and expected log10 P-values on the null-only WTCCC datasets (same data as used for Table 1) for FaST-LMM-Set. Dashed red error bars denote the 99% confidence interval around the solid red diagonal. Points shown are for null-only data (generated by permuting individuals in the SNPs to be tested; see Section 2) and only for the non-unity P-values (those assumed to belong to the non-zero degree of freedom component of the null distribution). The portion of the expected distribution of P-values shown is uniform on the interval from the estimated mixing weight in the null distribution to 1.
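
The caption separates unity P-values (the zero degree of freedom component of the null) from non-unity ones. The sketch below shows how a P-value could be computed under a mixture null of that general shape. It rests on assumptions that the source does not state: the non-zero component is taken to be an unscaled chi-square with exactly 1 degree of freedom, and `pi0` (the weight on the point mass at zero) and both function names are hypothetical.

```python
import math


def chi2_1df_sf(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 d.f."""
    # For 1 d.f., X = Z**2 with Z standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def mixture_null_pvalue(lrt_stat: float, pi0: float) -> float:
    """P-value of a likelihood ratio statistic under the assumed mixture null.

    Assumed null: pi0 * (point mass at 0) + (1 - pi0) * chi-square(1 d.f.).
    `pi0` is a hypothetical name for the weight of the point mass.
    """
    if not 0.0 <= pi0 <= 1.0:
        raise ValueError("pi0 must lie in [0, 1]")
    if lrt_stat <= 0.0:
        # A statistic of zero belongs to the point-mass component: P-value is 1.
        return 1.0
    # Only the continuous component can exceed a positive statistic.
    return (1.0 - pi0) * chi2_1df_sf(lrt_stat)


if __name__ == "__main__":
    for stat in (0.0, 1.0, 3.84, 10.0):
        print(stat, mixture_null_pvalue(stat, pi0=0.5))
```

With these assumptions, every statistic equal to zero maps to a P-value of exactly 1, which is why such points are left out of a quantile–quantile plot of the continuous component.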
