PLoS Genet. 2012;8(4):e1002625.
doi: 10.1371/journal.pgen.1002625. Epub 2012 Apr 5.

Improved statistics for genome-wide interaction analysis

Masao Ueki et al. PLoS Genet. 2012.

Abstract

Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only a negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although these previously proposed statistics were designed to test specifically for interaction, we show that some of them can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new "joint effects" statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that the highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally proposed statistics, on account of the inflated error rate that can result.
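
To make the idea of a correlation-based case-only statistic and of checking type 1 error by simulation more concrete, here is a minimal sketch in Python. It is not the formula of Wu et al., PLINK, or the adjusted statistics described above. It assumes a generic statistic, n times the squared Pearson correlation between allele counts (0, 1, 2) at two loci among cases, compared with a chi-squared distribution on 1 df. It also assumes that the two loci are independent in the general population and that the disease is rare, so that correlation among cases points to interaction. All function names and parameter values are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a locus with no genotype variation cannot be tested")
    return sxy / math.sqrt(sxx * syy)


def case_only_correlation_test(geno_g, geno_h):
    """Return (statistic, p_value) for a generic case-only correlation test.

    Assumption (not taken from the paper): statistic = n * r^2, referred to
    a chi-squared distribution with 1 df.
    """
    if len(geno_g) != len(geno_h):
        raise ValueError("genotype vectors must have the same length")
    n = len(geno_g)
    r = pearson_r(geno_g, geno_h)
    statistic = n * r * r
    # Upper tail of chi-squared (1 df): P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def null_rejection_rate(n_cases=300, n_reps=1000, freq_g=0.3, freq_h=0.4,
                        alpha=0.05, seed=1):
    """Estimate type 1 error when the two loci are simulated independently."""
    rng = random.Random(seed)

    def draw_genotype(freq):
        # Allele count under Hardy-Weinberg: sum of two Bernoulli draws.
        return int(rng.random() < freq) + int(rng.random() < freq)

    rejections = 0
    for _ in range(n_reps):
        g = [draw_genotype(freq_g) for _ in range(n_cases)]
        h = [draw_genotype(freq_h) for _ in range(n_cases)]
        _, p_value = case_only_correlation_test(g, h)
        if p_value < alpha:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    print("Estimated type 1 error:", null_rejection_rate())
```

If the chi-squared approximation holds, the estimated rejection rate should sit close to the nominal level `alpha`; a rate clearly above it would indicate the kind of inflated type 1 error discussed in the abstract.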

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Chi-squared (1 df) Q-Q plot for Scenario 1 (Global Null).
Top panels ((a), (b) and (c)): Case/Control not in LD; Middle panels ((d), (e) and (f)): Case/Control in LD; Bottom panels ((g), (h) and (i)): Case-Only not in LD; FE: Fast-Epistasis; AFE: Adjusted FE; Wu: Wu et al. statistic; AWu: Adjusted Wu statistic; IWu: Ideal Wu statistic; WZ: Wellek and Ziegler statistic; JE: Joint Effects statistic.
Figure 2. Chi-squared (1 df) Q-Q plot for Scenario 2 (Recessive effect at locus G).
Top panels ((a), (b) and (c)): Case/Control not in LD; Middle panels ((d), (e) and (f)): Case/Control in LD; Bottom panels ((g), (h) and (i)): Case-Only not in LD; FE: Fast-Epistasis; AFE: Adjusted FE; Wu: Wu et al. statistic; AWu: Adjusted Wu statistic; IWu: Ideal Wu statistic; WZ: Wellek and Ziegler statistic; JE: Joint Effects statistic.
Figure 3. Chi-squared (1 df) Q-Q plot for Scenario 5c (Rare disease, Additive effects at both loci).
Top panels ((a), (b) and (c)): Case/Control not in LD; Middle panels ((d), (e) and (f)): Case/Control in LD; Bottom panels ((g), (h) and (i)): Case-Only not in LD; FE: Fast-Epistasis; AFE: Adjusted FE; Wu: Wu et al. statistic; AWu: Adjusted Wu statistic; IWu: Ideal Wu statistic; WZ: Wellek and Ziegler statistic; JE: Joint Effects statistic.
Figure 4. Chi-squared (1 df) Q-Q plot for Scenario 5d (Rare disease, Recessive effects at both loci).
Top panels ((a), (b) and (c)): Case/Control not in LD; Middle panels ((d), (e) and (f)): Case/Control in LD; Bottom panels ((g), (h) and (i)): Case-Only not in LD; FE: Fast-Epistasis; AFE: Adjusted FE; Wu: Wu et al. statistic; AWu: Adjusted Wu statistic; IWu: Ideal Wu statistic; WZ: Wellek and Ziegler statistic; JE: Joint Effects statistic.
Figure 5. Power curves for Scenario 6 (Recessive Recessive).
Power to achieve significance level [formula not shown]. Top panels ((a) and (b)): Case/Control not in LD; Middle panels ((c) and (d)): Case/Control in LD; Bottom panels ((e) and (f)): Case-Only not in LD; Left hand panels ((a), (c) and (e)): No main effect; Right hand panels ((b), (d) and (f)): Locus G has main effect; FE: Fast-Epistasis; AFE: Adjusted FE; Wu: Wu et al. statistic; AWu: Adjusted Wu statistic; WZ: Wellek and Ziegler statistic; JE: Joint Effects statistic; IWu: Ideal Wu statistic; C: Logistic regression using correct coding; IC: Logistic regression using incorrect (= Recessive [formula not shown] Dominant) coding; WZC: Wellek and Ziegler case-only statistic using correct coding; WZIC: Wellek and Ziegler case-only statistic using incorrect (= Recessive [formula not shown] Dominant) coding.
Figure 6. Power curves for Scenario 7 (Dominant Dominant).
Power to achieve significance level [formula not shown]. Top panels ((a) and (b)): Case/Control not in LD; Middle panels ((c) and (d)): Case/Control in LD; Bottom panels ((e) and (f)): Case-Only not in LD; Left hand panels ((a), (c) and (e)): No main effect; Right hand panels ((b), (d) and (f)): Locus G has main effect; FE: Fast-Epistasis; AFE: Adjusted FE; Wu: Wu et al. statistic; AWu: Adjusted Wu statistic; WZ: Wellek and Ziegler statistic; JE: Joint Effects statistic; IWu: Ideal Wu statistic; C: Logistic regression using correct coding; IC: Logistic regression using incorrect (= Dominant [formula not shown] Recessive) coding; WZC: Wellek and Ziegler case-only statistic using correct coding; WZIC: Wellek and Ziegler case-only statistic using incorrect (= Dominant [formula not shown] Recessive) coding.

References

    1. Wu X, Dong H, Luo L, Zhu Y, Peng G, et al. A novel statistic for genome-wide interaction analysis. PLoS Genet. 2010;6:e1001131. doi: 10.1371/journal.pgen.1001131.
    2. WTCCC. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678.
    3. Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, et al. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science. 2007;316:889–894.
    4. Todd J, Walker N, Cooper J, Smyth D, Downes K, et al. Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes. Nat Genet. 2007;39:857–864.
    5. Zeggini E, Scott LJ, Saxena R, Voight BF, Marchini JL, et al. Meta-analysis of genomewide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet. 2008;40:638–645.
