PLoS One. 2013 Jul 18;8(7):e69321.

doi: 10.1371/journal.pone.0069321. Print 2013.

A modified entropy-based approach for identifying gene-gene interactions in case-control study

Jaeyong Yee¹, Min-Seok Kwon, Taesung Park, Mira Park

PMID: 23874943
PMCID: PMC3715501
DOI: 10.1371/journal.pone.0069321

Jaeyong Yee et al. PLoS One. 2013.

Affiliation

¹ Department of Physiology and Biophysics, Eulji University, Daejeon, Korea.

Abstract

Gene-gene interactions may play an important role in the genetics of complex diseases. Detecting and characterizing gene-gene interactions is a challenging problem that has stimulated the development of various statistical methods. In this study, we introduce a method to measure gene interactions using entropy-based statistics computed from a contingency table of trait and genotype combinations, together with a graph-based exploration procedure. We propose a standardized relative information gain (RIG) measure to evaluate interactions among single nucleotide polymorphism (SNP) combinations. To identify k-th order interactions, contingency tables of the trait by the genotype combinations of k SNPs are constructed, and RIGs are calculated from these tables. The RIGs are standardized using the mean and standard deviation obtained from permuted datasets, and SNP combinations with high standardized RIGs are selected as candidate gene-gene interactions. Using standardized RIG makes it possible to detect high-order interactions and to compare interaction strengths across different orders. We applied the proposed standardized entropy-based method to two types of data: simulated data and data from a real genetic association study. We compared our method with the multifactor dimensionality reduction (MDR) method through a power analysis of eight genetic models with varying penetrance rates, numbers of SNPs, and sample sizes. Our method successfully identified genetic associations and gene-gene interactions in both the simulated and the real genetic data. The simulation results suggest that the proposed entropy-based method is better able to detect high-order interactions and outperforms the MDR method in most cases. The proposed method is well suited to detecting interactions both in models without main effects and in models that include main effects.
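The abstract outlines the core computation: build the trait-by-genotype-combination table for k SNPs, compute the RIG, and standardize it with permutations. The sketch below assumes RIG is the information gain of the trait given the joint genotype divided by the trait entropy, (H(Y) - H(Y|X)) / H(Y); the abstract does not spell out the formula, and the function names are hypothetical. It also standardizes each combination against its own permutation null, which is an assumption: the paper may pool permuted RIGs differently.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def rig(trait, genotypes):
    """Relative information gain of the trait given a joint genotype.

    Assumed definition: RIG = (H(Y) - H(Y | X)) / H(Y), where Y is case/control
    status and X is the genotype combination (one column of the contingency
    table). Requires both cases and controls, otherwise H(Y) = 0.
    """
    n = len(trait)
    columns = {}
    for y, g in zip(trait, genotypes):
        columns.setdefault(g, []).append(y)   # trait labels per genotype cell
    h_y = entropy(trait)
    h_y_given_x = sum(len(ys) / n * entropy(ys) for ys in columns.values())
    return (h_y - h_y_given_x) / h_y


def standardized_rig(trait, snps, n_perm=200, seed=0):
    """Standardize the observed RIG of a k-SNP combination with permutations.

    snps: k genotype lists (coded 0/1/2), one per SNP in the combination.
    The trait is shuffled n_perm times; the observed RIG is centred and scaled
    by the mean and standard deviation of the permuted RIGs.
    """
    genotypes = list(zip(*snps))              # joint genotype tuple per subject
    observed = rig(trait, genotypes)
    rng = random.Random(seed)
    shuffled = list(trait)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                 # break trait-genotype association
        null.append(rig(shuffled, genotypes))
    mean = sum(null) / n_perm
    sd = math.sqrt(sum((v - mean) ** 2 for v in null) / (n_perm - 1))
    return (observed - mean) / sd
```

For a trait defined as the parity of two SNP genotypes (an interaction with no main effects), each SNP alone gives an RIG of 0, the pair gives an RIG of 1, and the pair's standardized RIG lies far above zero.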

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Visualization of the properties of the proposed measures using MDR open-source data.**
The arrows on the upper side of the graph mark the largest observed value of the measure in each order of interaction. The distributions are the null distributions of the measure, obtained by collecting the maximum value from each permuted dataset. The order of interaction is given in parentheses.
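Figure 1 builds its null distributions by keeping the maximum of the measure from each permuted dataset, separately for each order of interaction. A minimal sketch of such a max-statistic permutation null, assuming the raw RIG, taken here as (H(Y) - H(Y|X)) / H(Y), is the quantity being maximized; the caption does not name the symbol, and `max_rig_null` is a hypothetical helper.

```python
import itertools
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def rig(trait, genotypes):
    # Assumed RIG = (H(Y) - H(Y|X)) / H(Y) over genotype-combination cells.
    n = len(trait)
    columns = {}
    for y, g in zip(trait, genotypes):
        columns.setdefault(g, []).append(y)
    h_y = entropy(trait)
    return (h_y - sum(len(ys) / n * entropy(ys) for ys in columns.values())) / h_y


def max_rig_null(trait, snps, order, n_perm=100, seed=0):
    """Observed maximum RIG over all `order`-SNP combinations, plus a null
    distribution built from the maximum RIG of each permuted dataset."""
    combos = [list(zip(*(snps[i] for i in idx)))
              for idx in itertools.combinations(range(len(snps)), order)]
    observed_max = max(rig(trait, g) for g in combos)
    rng = random.Random(seed)
    shuffled = list(trait)
    null_maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                 # permute case/control labels
        null_maxima.append(max(rig(shuffled, g) for g in combos))
    return observed_max, null_maxima
```

Running it for orders 1 and 2 on the same data typically shows the null maxima shifting upward with order, because higher-order tables have more cells and there are more combinations to maximize over; this is consistent with the abstract's point that standardization is what makes interaction strengths comparable across orders.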

**Figure 2. Scree plots of RIG for MDR open-source data.**
Main effects (A), 2nd-order interactions (B), 3rd-order interactions (C), and 4th-order interactions (D) are shown. The observed relative information gain (RIG) is plotted against its rank by magnitude. Only the top 100 RIGs are plotted for each order of interaction. The names of the top-ranked SNPs are given in parentheses.

**Figure 3. Scree plots of standardized RIG for MDR open-source data.**
Main effects (A), 2nd-order interactions (B), 3rd-order interactions (C), and 4th-order interactions (D) are shown. The standardized relative information gain (standardized RIG) is plotted against its rank by magnitude. The open-source sample set is used to show the plausibility of using standardized RIG. Only the top 100 standardized RIGs are plotted for each order of interaction. The names of the top-ranked SNPs are given in parentheses. The dotted lines show the upper 5% cut-off values of standardized RIG in the empirical null distribution; SNP combinations above the line may be interpreted as significant at the 5% level.
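The dotted lines in Figure 3 mark the upper 5% cut-off of standardized RIG in an empirical null distribution. A minimal sketch of applying such a cut-off, assuming the null sample (for example, values collected from permuted datasets) and the observed standardized RIGs are already available; the function names and the SNP labels in the test are hypothetical.

```python
import math


def upper_cutoff(null_values, alpha=0.05):
    """Empirical upper-alpha cut-off: a value exceeded by no more than a
    fraction alpha of the null sample."""
    ordered = sorted(null_values)
    idx = math.ceil((1 - alpha) * len(ordered)) - 1
    return ordered[idx]


def significant_combinations(scores, null_values, alpha=0.05):
    """Return the cut-off and the SNP combinations scoring above it,
    strongest first.

    scores: dict mapping a SNP-combination label to its observed score.
    """
    cut = upper_cutoff(null_values, alpha)
    hits = [(combo, s) for combo, s in scores.items() if s > cut]
    hits.sort(key=lambda item: item[1], reverse=True)
    return cut, hits
```

Using a strict `>` mirrors the caption's reading that only combinations above the line are flagged.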

**Figure 4. MDS plot for MDR open-source data.**
A multidimensional scaling plot is produced from the interaction measure of the 2nd-order interactions. The distance between two points approximates the interaction between the corresponding SNPs, and the size of each point is proportional to the size of its main effect.
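Figure 4 places SNPs with multidimensional scaling based on their 2nd-order interaction measures. The caption does not say which MDS variant is used or how interaction values become distances, so the sketch below assumes classical (Torgerson) MDS and a hypothetical mapping `interaction_to_distance` in which stronger interactions give shorter distances (distance = largest score minus pairwise score).

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances
    approximate a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)       # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0, None))  # drop negative parts
    return eigvecs[:, order] * scale


def interaction_to_distance(scores):
    """Hypothetical mapping: larger pairwise interaction -> shorter distance."""
    s = np.asarray(scores, dtype=float)
    d = s.max() - s
    np.fill_diagonal(d, 0.0)                   # a SNP is at distance 0 from itself
    return d
```

Point sizes proportional to main effects, as in the figure, would be a separate plotting choice layered on these coordinates.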

**Figure 5. Scree plot of standardized RIG for the atopic dermatitis data set.**
Same plot arrangement as in Figure 3.

**Figure 6. Power comparison between the entropy-based method and MDR.**
The hit ratio is used as the empirical power for each of the fifteen groups under each of the eight models. The hit ratio is defined as the rate at which the embedded causal pair is identified as having the strongest association. Three measures are compared: standardized RIG, cross-validation consistency (CVC), and balanced accuracy (BA). Groups 1, 2, and 3 all have 10 SNPs, with sample sizes of 400, 1000, and 2000, respectively; groups 4 to 6 repeat these sample sizes with 50 SNPs, and so on. See Table 3 for details. The power of standardized RIG is higher than that of MDR with either CVC or BA. The advantage is clearest for groups 1, 4, 7, 10, and 13, in which the sample size is small relative to the number of SNPs, and the difference in power grows as the number of SNPs increases.
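The hit ratio is the rate at which the embedded causal pair receives the strongest association across simulated replicates, used here as empirical power. A minimal sketch, assuming each replicate is summarized as a dict of scores per SNP pair (a hypothetical layout; the scores could be standardized RIG, CVC, or BA). Balanced accuracy, one of the MDR criteria compared, is included in its usual form, the mean of sensitivity and specificity.

```python
def hit_ratio(replicates, causal_pair):
    """Fraction of simulated replicates in which the causal SNP pair has the
    highest score among all candidate pairs.

    replicates: list of dicts mapping a SNP pair (tuple of names) to a score.
    Ties are broken by dict order, which is a simplification.
    """
    causal = tuple(sorted(causal_pair))
    hits = 0
    for scores in replicates:
        best = max(scores, key=scores.get)    # top-ranked pair in this replicate
        if tuple(sorted(best)) == causal:     # pair order does not matter
            hits += 1
    return hits / len(replicates)


def balanced_accuracy(tp, fn, tn, fp):
    """Balanced accuracy as used by MDR: mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)
```

Sorting each pair before comparing means ("s2", "s1") and ("s1", "s2") count as the same causal pair.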
