. 2017 Jul 21:2:54.
doi: 10.12688/wellcomeopenres.11926.1. eCollection 2017.

Further investigations of the W-test for pairwise epistasis testing

Richard Howey et al. Wellcome Open Res. .

Abstract

Background: In a recent paper, a novel W-test for pairwise epistasis testing was proposed that appeared, in computer simulations, to have higher power than competing alternatives. Application to genome-wide bipolar data detected significant epistasis between SNPs in genes of relevant biological function. Network analysis indicated that the implicated genes formed two separate interaction networks, each containing genes highly related to autism and neurodegenerative disorders. Methods: Here we investigate further the properties and performance of the W-test via theoretical evaluation, computer simulations and application to real data. Results: We demonstrate that, for common variants, the W-test is closely related to several existing tests of association allowing for interaction, including logistic regression on 8 degrees of freedom, although logistic regression can show inflated type I error for low minor allele frequencies, whereas the W-test shows good/conservative type I error control. Although in some situations the W-test can show higher power, logistic regression is not limited to tests on 8 degrees of freedom but can instead be tailored to impose greater structure on the assumed alternative hypothesis, offering a power advantage when the imposed structure matches the true structure. Conclusions: The W-test is a potentially useful method for testing for association - without necessarily implying interaction - between genetic variants and disease, particularly when one or more of the genetic variants are rare. For common variants, the advantages of the W-test are less clear, and, indeed, there are situations where existing methods perform better. In our investigations, we further uncover a number of problems with the previously described practical implementation and application of the W-test (to bipolar disorder), apparently due to inadequate use of standard data quality-control procedures. This observation leads us to urge caution in interpretation of the previously presented results, most of which we consider highly likely to be artefacts.
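
The 8 degrees of freedom mentioned in the abstract arise because two SNPs with three genotypes each give nine genotype combinations, so a 2 × 9 case/control table has (2 − 1)(9 − 1) = 8 df. The following is a minimal Python sketch of Pearson's χ² test on such a table, in the "full" and "reduced" forms named in the figure legends. It is not the W-test and is not taken from the authors' code. The function name `pearson_chi2`, the rule used to decide when the full-table test is undefined, and the example counts are all assumptions made for illustration.

```python
def pearson_chi2(cases, controls, reduced=False):
    """Pearson chi-square statistic for a 2 x K case/control table.

    cases, controls: counts per genotype combination (K = 9 for two SNPs).
    reduced=False: full table; returns None if any combination is unobserved
                   (assumed rule, since the expected counts would be zero).
    reduced=True:  unobserved combinations are dropped before testing.
    Returns (statistic, degrees_of_freedom) or None if the test is undefined.
    """
    columns = list(zip(cases, controls))
    if reduced:
        # Remove genotype combinations seen in neither cases nor controls.
        columns = [(a, b) for a, b in columns if a + b > 0]
    elif any(a + b == 0 for a, b in columns):
        return None

    n_cases = sum(a for a, _ in columns)
    n_controls = sum(b for _, b in columns)
    total = n_cases + n_controls
    if len(columns) < 2 or n_cases == 0 or n_controls == 0:
        return None

    statistic = 0.0
    for a, b in columns:
        column_total = a + b
        for observed, row_total in ((a, n_cases), (b, n_controls)):
            expected = row_total * column_total / total
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(columns) - 1


# Hypothetical counts for the nine genotype combinations of one SNP pair;
# the last combination is unobserved, as can happen with rare variants.
example_cases = [120, 90, 20, 85, 70, 15, 18, 12, 0]
example_controls = [130, 95, 15, 80, 75, 10, 20, 5, 0]

full_result = pearson_chi2(example_cases, example_controls)
reduced_result = pearson_chi2(example_cases, example_controls, reduced=True)
```

With these made-up counts the full-table test is undefined, while the reduced-table test runs on 7 df rather than 8. This is the kind of situation the abstract associates with low minor allele frequencies.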

Keywords: GWAS; Interactions; contingency table; epistasis; quality control.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Figure 1. Scatter plots of negative log (base 10) transformed P-values from different interaction tests applied to the W-test demo data.
The tests are the W-test, Pearson’s χ² test (full table), Pearson’s χ² test (reduced table) and logistic regression with 8 df (LR8). The W-test demo data consists of 500 cases and 500 controls and 50 SNPs. The scatter plots show all 1225 SNP pair tests between the 50 SNPs. The squared Pearson product-moment correlation coefficient is shown in the bottom right of each plot. Crosses indicate points that did not evaluate due to empty cells in cases and/or controls.
Figure 2. Scatter plots of negative log (base 10) transformed P-values from different interaction tests applied to the WTCCC2 data.
The tests are the W-test, Pearson’s χ² test (full table), Pearson’s χ² test (reduced table) and logistic regression with 8 df (LR8). The WTCCC2 data consists of a subset of 1000 female founders and 50 SNPs which were alternatively labelled as cases and controls. The scatter plots show all 1225 SNP pair tests between the 50 SNPs. The squared Pearson product-moment correlation coefficient is shown in the bottom right of each plot. Crosses indicate points that did not evaluate due to empty cells in cases and/or controls.
Figure 3. Power and Type I error plots for different effect models and tests.
The simulating model is indicated above the plot for linear effects; for complex effects, the log odds for each genotype combination are shown in the table in the bottom left plot. Tests considered are JE: Joint effects; AWU: Adjusted Wu; AFE: Adjusted Fast Epistasis; WZ: Wellek-Ziegler; LR1: Logistic regression with 1 df testing for interaction accounting for main effects; LR3: Logistic regression with 3 df testing for interaction and main effects; LR8: Logistic regression with 8 df testing for interaction and main effects, one parameter for every genotype combination between the two SNPs; LRI: Logistic regression with 1 df testing for interaction without accounting for main effects; CHI-f: full table χ² test with cell counts for every genotype combination between the two SNPs; CHI-r: reduced table χ² test where unobserved genotype categories are removed from consideration; W: W-test with values of h and f estimated in Wang et al.’s Supplementary Table S2 using real WTCCC data; W : W-test with default values of h and f.
Figure 4. Power plots for different effect models and tests for SNPs with low minor allele frequencies (MAF=0.1).
The simulating model is indicated above the plot. Test abbreviations are described in the legend to Figure 3 and CHI-f is the χ² test where an undefined test result is counted as a non-detection and included in the denominator. Plots on the left show results for independent SNPs and plots on the right for SNPs in LD (R² = 0.24 in controls, R² = 0.29 in combined cases and controls).
Figure 5. Power plots for SNPs showing complex effects with very low minor allele frequencies (MAF=0.01) and in strong LD (R² = 0.64 in controls, R² = 0.83 in combined cases and controls).
Test abbreviations are described in the legends to Figure 3 and Figure 4.
Figure 6. Q-Q plots of interaction tests.
The top left plot shows a Q-Q plot of the W-test P-values using the W-test demo data. The remaining plots show Q-Q plots generated using the Q-Q plot function from the W-test R package each time using the same W-test demo data.
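
For readers who want to check a Q-Q plot independently of any package, the points can be computed deterministically from the P-values alone. The sketch below is a generic illustration in Python, not the Q-Q function from the W-test R package. The function name `qq_points` and the plotting positions (rank − 0.5)/n are assumptions; other conventions, such as rank/(n + 1), are also in common use.

```python
from math import log10


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10 P-values for a Q-Q plot.

    Undefined tests (None) are skipped. P-values are sorted ascending, so the
    first pair corresponds to the most significant test. Expected quantiles
    use the plotting positions (rank - 0.5) / n, an assumed convention.
    """
    ordered = sorted(p for p in p_values if p is not None)
    n = len(ordered)
    points = []
    for rank, p in enumerate(ordered, start=1):
        expected = -log10((rank - 0.5) / n)
        observed = -log10(p)
        points.append((expected, observed))
    return points


# Hypothetical P-values, including one undefined test.
example_points = qq_points([0.75, None, 0.25])
```

Because the function has no random component, repeated calls on the same P-values return identical points.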

References

    1. Wang MH, Sun R, Guo J, et al.: A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Res. 2016;44(12):e115. doi: 10.1093/nar/gkw347
    2. Newman SC: Biostatistical Methods in Epidemiology. Wiley, 2001. doi: 10.1002/0471272612
    3. Agresti A: Categorical Data Analysis, 3rd Edition. Wiley, 2013.
    4. Pearson K: On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine Series 5. 1900;50(302):157–175. doi: 10.1080/14786440009463897
    5. Phillips PC: The language of gene interaction. Genetics. 1998;149(3):1167–1171.
