PLoS One. 2011;6(10):e26435.
doi: 10.1371/journal.pone.0026435. Epub 2011 Oct 25.

A novel evolution-based method for detecting gene-gene interactions

Shaoqi Rao et al. PLoS One. 2011.

Abstract

Background: Rapid advances in large-scale SNP-chip technologies offer great opportunities for elucidating the genetic basis of complex diseases. Methods for large-scale interaction analysis have been under development by several groups. Because of several difficult issues (e.g., sparseness of data in high dimensions and low replication or validation rates), developing fast, powerful and robust methods for detecting various forms of gene-gene interactions remains a challenging task.

Methodology/principal findings: In this article, we have developed an evolution-based method to search for genome-wide epistasis in a case-control design. From an evolutionary perspective, we view human diseases as originating from ancient mutations and consider that the underlying genetic variants play a role in differentiating the human population into the healthy and the diseased. Based on this concept, a traditional evolutionary measure, the fixation index (Fst) for two unlinked loci, which measures the genetic distance between populations, should be able to reveal the genetic interplay responsible for disease traits. To validate our proposal, we first investigated the theoretical distribution of Fst using extensive simulations. We then explored its power for detecting gene-gene interactions via SNP markers and compared it with the conventional Pearson Chi-square test, a mutual information (MI) based test and a linkage disequilibrium (LD) based test under several disease models. The proposed evolution-based method outperformed the compared methods in the dominant and additive models, regardless of the disease allele frequencies. However, its performance was relatively poor in the recessive model. Finally, we applied the proposed evolution-based method to the analysis of a published dataset. Our results showed that the P value of the Fst-based statistic was smaller than those obtained with the LD-based statistic or Poisson regression models.

Conclusions/significance: As large-scale genetic association studies continue to grow rapidly, the proposed evolution-based method can be a promising tool for identifying epistatic effects.
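
To make the idea of treating cases and controls as two diverging populations concrete, here is a minimal sketch of the textbook single-locus fixation index, Fst = (H_T - H_S) / H_T, for one biallelic SNP. It is not the two-locus Fst-based statistic of the article, whose exact form the abstract does not give; the function name `fst_two_groups` and its arguments are hypothetical and chosen only for illustration.

```python
def fst_two_groups(p_cases, p_controls, n_cases, n_controls):
    """Textbook single-locus Fst, treating cases and controls as two subpopulations.

    p_cases, p_controls: frequency of one allele in each group (assumed inputs).
    n_cases, n_controls: group sizes, used as weights.
    """
    total = n_cases + n_controls
    w_cases = n_cases / total
    w_controls = n_controls / total

    # Pooled allele frequency and the heterozygosity expected in the pooled sample.
    p_bar = w_cases * p_cases + w_controls * p_controls
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        # The locus is monomorphic in the pooled sample, so there is no differentiation.
        return 0.0

    # Weighted mean of the heterozygosity expected within each group.
    h_within = (w_cases * 2.0 * p_cases * (1.0 - p_cases)
                + w_controls * 2.0 * p_controls * (1.0 - p_controls))

    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Allele frequencies of 0.6 in cases and 0.4 in controls, 500 subjects each.
    print(fst_two_groups(0.6, 0.4, 500, 500))
```

Identical allele frequencies in the two groups give Fst = 0, and a fixed difference (one allele only in cases, the other only in controls) gives Fst = 1, which is the sense in which a larger value indicates stronger differentiation between the healthy and the diseased.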


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Frequency histogram of the Fst-based statistic based on 10,000 simulations, compared with F(2, 3994).
The gray bars show the frequency histogram of the Fst-based statistic under the null hypothesis that the two SNPs do not interact. The red line is the density curve of the theoretical F(2, 3994) distribution.
Figure 2. Frequency histogram of 2000×MI based on 10,000 simulations, compared with χ2(8).
The gray bars show the frequency histogram of 2000×MI under the null hypothesis that the two SNPs do not interact. The red line is the density curve of χ2(8).
Figure 3. Frequency histogram of the LD-based statistic based on 10,000 simulations, compared with χ2(1).
The gray bars show the frequency histogram of the LD-based statistic under the null hypothesis that the two SNPs do not interact. The red line is the density curve of χ2(1).
Figure 4. Power of four statistics under three different models when the disease allele frequencies at the two loci are high.
The disease prevalence is assumed to be 1%. The disease allele frequencies at the two loci (g1 and g2) are 0.3 and 0.8, respectively. The power, at significance level α = 0.05, is obtained from simulations of 500 cases and 500 controls. The green, red, blue, and cyan lines show the power of the Fst-based statistic, Pearson's Chi-square statistic, the MI-based statistic, and the LD-based statistic, respectively. The three plots (A, B, C) correspond to the dominant, additive and recessive models, respectively.
Figure 5. Power of four statistics under three different models when the disease allele frequencies at the two loci are low.
The disease prevalence is assumed to be 1%. The disease allele frequencies at the two loci are 0.2 and 0.4, respectively. The power, at significance level α = 0.05, is obtained from simulations of 500 cases and 500 controls. The green, red, blue, and cyan lines show the power of the Fst-based statistic, Pearson's Chi-square statistic, the MI-based statistic, and the LD-based statistic, respectively. The three plots (A, B, C) correspond to the dominant, additive and recessive models, respectively.
Figure 6. Power of Fst under different parameter settings.
The disease prevalence is assumed to be 1%. The power, at significance level α = 0.05, is obtained from simulations of 500 cases and 500 controls. The three solid colored lines (red, green and blue) are the power curves of the Fst-based statistic under the three genetic models (dominant, additive and recessive) when the disease allele frequencies at the two loci (g1 and g2) are 0.3 and 0.8, respectively. The dotted lines are the power curves when the disease allele frequencies at the two loci are 0.2 and 0.4, respectively.
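
The 2000×MI scaling in Figure 2 is consistent with the standard identity G = 2N·MI (MI in nats) for N = 1000 subjects, that is, 500 cases plus 500 controls. The sketch below illustrates that identity under an assumption not stated in the captions: that MI is computed between case-control status and the nine two-locus genotype classes, so the table is 2×9 and the reference distribution is χ2 with (2-1)(9-1) = 8 degrees of freedom. The names `mutual_information` and `chi2_sf_even_df` are hypothetical, and this is not necessarily the MI-based test used in the article.

```python
import math


def mutual_information(table):
    """Mutual information (in nats) between the rows and columns of a count table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing
                mi += (count / n) * math.log(count * n / (row_totals[i] * col_totals[j]))
    return mi


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical counts: rows are cases and controls, columns are the
    # nine two-locus genotype classes. These numbers are made up.
    table = [
        [70, 65, 60, 60, 55, 50, 50, 45, 45],
        [50, 55, 60, 60, 65, 50, 50, 55, 55],
    ]
    n = sum(sum(row) for row in table)
    statistic = 2 * n * mutual_information(table)  # 2000 x MI when n = 1000
    print(statistic, chi2_sf_even_df(statistic, 8))
```

Using the closed-form tail probability keeps the sketch free of third-party dependencies; it applies only because the assumed degrees of freedom are even.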

