Comparative Study
PLoS One. 2012;7(8):e42530.
doi: 10.1371/journal.pone.0042530. Epub 2012 Aug 9.

Comparison of statistical tests for association between rare variants and binary traits

Silviu-Alin Bacanu et al. PLoS One. 2012.

Abstract

Genome-wide association studies have found thousands of common genetic variants associated with a wide variety of diseases and other complex traits. However, a large portion of the predicted genetic contribution to many traits remains unknown. One plausible explanation is that some of the missing variation is due to the effects of rare variants. Nonetheless, the statistical analysis of rare variants is challenging. A commonly used method is to contrast, within the same region (gene), the frequency of minor alleles at rare variants between cases and controls. However, this strategy is most useful under the assumption that the tested variants have similar effects. We previously proposed a method that can accommodate heterogeneous effects in the analysis of quantitative traits. Here we extend this method to binary traits and allow it to accommodate covariates. We use simulations for a variety of causal and covariate impact scenarios to compare the performance of the proposed method to standard logistic regression, C-alpha, SKAT, and EREC. We found that (i) logistic regression methods perform well when the heterogeneity of the effects is not extreme and (ii) SKAT and EREC perform well under all tested scenarios but can be computationally intensive. Consequently, it would be computationally preferable to use a two-step strategy: (i) select promising genes with faster methods and (ii) analyze the selected genes using SKAT/EREC. To select promising genes one can use (1) regression methods when effect heterogeneity is assumed to be low and the covariates explain a non-negligible part of trait variability, (2) C-alpha when heterogeneity is assumed to be large and the covariates explain a small fraction of trait variability, and (3) the proposed trend and heterogeneity test when the heterogeneity is assumed to be non-trivial and the covariates explain a large fraction of trait variability.
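As a rough illustration of the "contrast the frequency of minor alleles between cases and controls" idea in the abstract, the sketch below collapses each individual's rare-variant genotypes in a gene into a carriage status (carrier or non-carrier) and compares carrier proportions between the two groups. It uses a pooled two-proportion z-test as a simple stand-in; it is not the covariate-adjusted logistic regression evaluated in the paper, and the function names and toy genotypes are hypothetical.

```python
import math

def carriage_status(genotypes):
    """Map each individual's genotype row (minor-allele counts per variant)
    to 1 if they carry at least one minor allele in the gene, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]

def carrier_z_test(case_genotypes, control_genotypes):
    """Pooled two-proportion z-test on carrier frequency (assumed stand-in,
    not the paper's method). Returns (z, two-sided p-value)."""
    n_case, n_control = len(case_genotypes), len(control_genotypes)
    case_carriers = sum(carriage_status(case_genotypes))
    control_carriers = sum(carriage_status(control_genotypes))
    p_case = case_carriers / n_case
    p_control = control_carriers / n_control
    pooled = (case_carriers + control_carriers) / (n_case + n_control)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_case + 1 / n_control))
    if se == 0:
        # Nobody (or everybody) is a carrier: no evidence either way.
        return 0.0, 1.0
    z = (p_case - p_control) / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))

# Toy data: rows are individuals, columns are rare variants in one gene.
cases = [[0, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 1]]
controls = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
z, p = carrier_z_test(cases, controls)
print(f"z = {z:.3f}, p = {p:.4f}")
```

Collapsing to carriage status implicitly treats all variants as having similar effects, which is the assumption the abstract flags as a limitation of this kind of test.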

Conflict of interest statement

Competing Interests: SAB was employed by GlaxoSmithKline (GSK), and JCW and MRN are employed by GSK. However, none of the methods presented in this manuscript are patented, patentable, or help GSK financially. Furthermore, the authors' employment does not alter their adherence to all the PLoS ONE policies on sharing data and materials.

Figures

Figure 1. Relative size of the test* for 1000 cases and 1000 controls at a type I error of 10⁻³.
The size of the test was estimated empirically from 25,000 simulations. Black, turquoise and red circles correspond to gene CDS lengths at the 10th, 50th, and 90th percentiles of the human gene CDS distribution, respectively. Methods: RCS – (logistic) regression on carriage status, RCS-C – (logistic) regression on carriage status and covariates, CA – C-alpha test, CA-P – C-alpha test with permutations, TH – test of trend and heterogeneity, SK – SKAT, SK-R – SKAT with (parametric bootstrap) resampling, ER – EREC. *The ratio of the size of the test to the nominal type I error rate.
Figure 2. Empirical power at a type I error of 10⁻³ for Scenario 1 under homogeneity (ξ = 1).
Power was estimated from 250 simulations. The covariate is assumed to explain a fraction (Rsq) equal to 0, 10 or 20% of the variability in the binary trait. Power is presented for the 10th (black), 50th (turquoise) and 90th (red) percentiles of CDS length. See Fig. 1 for background and abbreviations.
Figure 3. Empirical power at a type I error of 10⁻³ for Scenario 1 under heterogeneity (ξ = 0.5).
See Figs. 1 and 2 for background and abbreviations.
Figure 4. Empirical power at a type I error of 10⁻³ for Scenario 1 under partial heterogeneity (ξ = 0.8).
See Figs. 1 and 2 for background and abbreviations.
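
The relative size plotted in Figure 1 is defined in its caption as the ratio of the empirical size of a test to the nominal type I error rate. A minimal sketch of that calculation, assuming a list of p-values obtained from simulations under the null hypothesis (the function name and the toy p-values are hypothetical, not taken from the paper):

```python
def relative_size(null_p_values, alpha=1e-3):
    """Empirical rejection rate under the null divided by the nominal level.
    A value near 1 suggests a well-calibrated test; above 1 suggests an
    anti-conservative test, below 1 a conservative one."""
    rejections = sum(1 for p in null_p_values if p <= alpha)
    empirical_size = rejections / len(null_p_values)
    return empirical_size / alpha

# Toy example: 25,000 null simulations with 25 p-values at or below alpha.
toy_p_values = [0.0005] * 25 + [0.5] * 24975
print(relative_size(toy_p_values))
```

At a nominal level of 10⁻³, 25,000 null simulations yield only about 25 expected rejections, so the estimated ratio is itself fairly noisy.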
