Eur J Hum Genet. 2011 Apr;19(4):465-71.
doi: 10.1038/ejhg.2010.196. Epub 2010 Dec 8.

EPIBLASTER-fast exhaustive two-locus epistasis detection strategy using graphical processing units

Tony Kam-Thong et al. Eur J Hum Genet. 2011 Apr.

Abstract

Detection of epistatic interaction between loci has been postulated to provide a more in-depth understanding of the complex biological and biochemical pathways underlying human diseases. Studying the interaction between two loci is the natural progression following traditional and well-established single locus analysis. However, the added costs and time duration required for the computation involved have thus far deterred researchers from pursuing a genome-wide analysis of epistasis. In this paper, we propose a method allowing such analysis to be conducted very rapidly. The method, dubbed EPIBLASTER, is applicable to case-control studies and consists of a two-step process in which the difference in Pearson's correlation coefficients is computed between controls and cases across all possible SNP pairs as an indication of significant interaction warranting further analysis. For the subset of interactions deemed potentially significant, a second-stage analysis is performed using the likelihood ratio test from the logistic regression to obtain the P-value for the estimated coefficients of the individual effects and the interaction term. The algorithm is implemented using the parallel computational capability of commercially available graphical processing units to greatly reduce the computation time involved. In the current setup and example data sets (211 cases, 222 controls, 299468 SNPs; and 601 cases, 825 controls, 291095 SNPs), this coefficient evaluation stage can be completed in roughly 1 day. Our method allows for exhaustive and rapid detection of significant SNP pair interactions without imposing significant marginal effects of the single loci involved in the pair.
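The first-stage screen described in the abstract can be sketched in a few lines. This is a minimal CPU illustration, not the authors' GPU implementation: the 0/1/2 genotype coding, the function names `pearson` and `screen_pairs`, the `top_k` cutoff, and the toy data are all assumptions made for the example, and pairs are ranked by the raw difference in correlation rather than by whatever test statistic or threshold the paper applies.

```python
from itertools import combinations
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_pairs(cases, controls, top_k):
    """Rank all SNP pairs by |r_cases - r_controls| and return the top_k.

    cases, controls: one genotype vector per SNP (0/1/2 coding is an assumption).
    Returns a list of (snp_i, snp_j, r_cases - r_controls) tuples.
    """
    scored = []
    for i, j in combinations(range(len(cases)), 2):
        diff = pearson(cases[i], cases[j]) - pearson(controls[i], controls[j])
        scored.append((abs(diff), i, j, diff))
    scored.sort(reverse=True)
    return [(i, j, diff) for _, i, j, diff in scored[:top_k]]


# Hypothetical toy data: 3 SNPs, 6 cases and 6 controls.
# SNPs 0 and 1 are positively correlated in cases and negatively in controls.
cases = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 0, 1, 1, 0, 2],
]
controls = [
    [0, 1, 2, 0, 1, 2],
    [2, 1, 0, 2, 1, 0],
    [2, 0, 1, 1, 0, 2],
]

top = screen_pairs(cases, controls, top_k=1)
print(top)               # the pair (0, 1) ranks first
print(comb(2000, 2))     # number of pairs among 2000 SNPs
print(comb(299468, 2))   # number of pairs in the smaller example data set
```

Only the pairs that survive this screen would go on to the second-stage logistic regression. The pair counts printed at the end show why the exhaustive first stage dominates the cost: the number of pairs grows quadratically with the number of SNPs, and each pair's correlation can be computed independently of the others, which is what makes the problem amenable to parallel hardware.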

Figures

Figure 1
Histogram of differences of correlation coefficients of all two-way interactions of 2000 SNPs exhibiting the expected Gaussian distribution shape.

Figure 2
Logarithmic P-values from the interaction term of logistic regression versus correlation coefficient differences of all two-way interactions from 2000 SNPs.

Figure 3
Logarithmic P-values from the interaction term of the logistic regression model versus correlation coefficient differences P-values from 2000 SNPs (C(2000, 2) = 1,999,000 SNP–SNP pairs). Quality of fit (R²) between the P-values is 99.9%.

Figure 4
Panic disorder logarithmic P-values density plot: top 10 SNP pairs (points marked in black) and threshold correlation coefficient difference P-value. FastEpistasis P-values are on the y-axis, P-values from EPIBLASTER are on the x-axis.

Figure 5
Panic disorder logarithmic P-values density plot: top 100 SNP pairs (points marked in black) and threshold correlation coefficient difference P-value. FastEpistasis P-values are on the y-axis, P-values from EPIBLASTER are on the x-axis.

Figure 6
Multiple sclerosis logarithmic P-values density plot: top 10 SNP pairs (points marked in black) and threshold correlation coefficient difference P-value.

Figure 7
Multiple sclerosis logarithmic P-values density plot: top 100 SNP pairs (points marked in black) and threshold correlation coefficient difference P-value.
