PLoS One. 2013 Jul 18;8(7):e69321.
doi: 10.1371/journal.pone.0069321. Print 2013.

A modified entropy-based approach for identifying gene-gene interactions in case-control study

Jaeyong Yee et al. PLoS One. 2013.

Abstract

Gene-gene interactions may play an important role in the genetics of complex diseases. Detecting and characterizing them is challenging and has stimulated the development of various statistical methods. In this study, we introduce a method that measures gene-gene interactions using entropy-based statistics computed from a contingency table of trait and genotype combinations, together with a graph-based exploration procedure. We propose a standardized relative information gain (RIG) measure to evaluate interactions among combinations of single nucleotide polymorphisms (SNPs). To identify kth-order interactions, contingency tables of trait and genotype combinations of k SNPs are constructed, and a RIG is calculated from each table. The RIGs are then standardized using the mean and standard deviation obtained from permuted datasets, and SNP combinations yielding high standardized RIGs are selected as gene-gene interactions. Standardization makes it possible to detect high-order interactions and to compare interaction strengths across different orders. We applied the proposed method to two types of data: simulated data sets and data from a real genetic association study. We compared our method with the multifactor dimensionality reduction (MDR) method through a power analysis of eight different genetic models with varying penetrance rates, numbers of SNPs, and sample sizes. Our method successfully identified genetic associations and gene-gene interactions in both the simulated and the real genetic data. The simulation results suggest that the proposed entropy-based method is better able to detect high-order interactions and outperforms the MDR method in most cases. The proposed method is well suited to detecting interactions without main effects as well as to models that include main effects.
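The procedure summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for RIG, so the code below assumes a common definition: the information gain of the trait given the genotype combination, divided by the trait entropy. The function names (`entropy`, `relative_information_gain`, `standardized_rig`), the number of permutations, and the toy data are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math
import random
from collections import Counter
from statistics import mean, stdev


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def relative_information_gain(trait, genotypes):
    """Assumed definition: RIG = (H(trait) - H(trait | genotypes)) / H(trait).

    trait:     list of case/control labels, one per subject.
    genotypes: list of tuples, one per subject, holding the genotypes at k SNPs.
    Requires both trait classes to be present, otherwise H(trait) is zero.
    """
    n = len(trait)
    h_trait = entropy(trait)
    # Each distinct genotype combination is one column of the contingency table.
    cells = {}
    for t, g in zip(trait, genotypes):
        cells.setdefault(g, []).append(t)
    h_conditional = sum(len(ts) / n * entropy(ts) for ts in cells.values())
    return (h_trait - h_conditional) / h_trait


def standardized_rig(trait, genotypes, n_perm=200, seed=0):
    """Standardize the observed RIG with the mean and SD of permuted-trait RIGs."""
    rng = random.Random(seed)
    observed = relative_information_gain(trait, genotypes)
    shuffled = list(trait)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any trait-genotype association
        null.append(relative_information_gain(shuffled, genotypes))
    # Assumes the permuted RIGs are not all identical (non-zero SD).
    return (observed - mean(null)) / stdev(null)


# Toy data: two binary "SNPs" whose XOR determines the trait, so neither SNP
# has a main effect but the pair interacts perfectly.
pairs = [(i % 2, (i // 2) % 2) for i in range(200)]
status = [a ^ b for a, b in pairs]
print(relative_information_gain(status, pairs))
print(standardized_rig(status, pairs))
```

In this toy example each SNP alone carries no information about the trait, while the pair determines it completely, which is the kind of interaction without main effects that the abstract mentions.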

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Visualization of the properties of the proposed measures using MDR open-source data.
The arrows at the top of the graph mark the largest observed [formula] for each order of interaction. The distributions are the null distributions of [formula], obtained by collecting the maximum [formula] from each permuted data set. The order of interaction is given in parentheses.
Figure 2. Scree plots of RIG for MDR open-source data.
Main effects (A), 2nd order interactions (B), 3rd order interactions (C) and 4th order interactions (D) are shown. The observed relative information gain (RIG) is plotted against its rank by magnitude. Only the 100 top-ranked RIG values are plotted for each order of interaction. Names of the top-ranked SNPs are given in parentheses.
Figure 3. Scree plots of standardized RIG for MDR open-source data.
Main effects (A), 2nd order interactions (B), 3rd order interactions (C) and 4th order interactions (D) are shown. The standardized relative information gain (standardized RIG) is plotted against its rank by magnitude. The open-source sample set is used to show the plausibility of using standardized RIG. Only the 100 top-ranked values are plotted for each order of interaction. Names of the top-ranked SNPs are given in parentheses. The dotted lines show the upper 5% cut-off values of standardized RIG in the empirical null distribution. SNP combinations above the line may be interpreted as significant at the 5% significance level.
Figure 4. MDS plot for MDR open-source data.
A multi-dimensional scaling (MDS) plot is produced using [formula] of the 2nd order interactions. The distance between two points approximates the interaction between the corresponding SNPs. The size of each point is proportional to the size of the corresponding main effect.
Figure 5. Scree plot of [formula] for the atopic dermatitis data set.
Same plot arrangement as in Figure 3.
Figure 6. Power comparison between the methods based on entropy and MDR.
Hit ratio is used as the empirical power for each of the fifteen groups under each of the eight models. Hit ratio is defined as the rate at which the incorporated causal pair is identified as having the strongest association. Three measures are compared: the entropy-based measure, CVC, and BA. Groups 1, 2, and 3 have the same number of SNPs (10), and the number of samples increases with the group number (400, 1000, 2000); the same pattern repeats for the next three groups with an increased number of SNPs (50), and so on. See Table 3 for details. The power of the entropy-based measure is higher than that of MDR with CVC or BA. The advantage is clearest for groups 1, 4, 7, 10, and 13, in which the number of samples is small relative to the number of SNPs. As the number of SNPs increases, the difference in power becomes larger.
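Two quantities in the captions above lend themselves to a short illustration: the upper 5% cut-off of an empirical null distribution (Figure 3) and the hit ratio used as empirical power (Figure 6). The sketch below is a hedged reading of those descriptions; the helper names `upper_cutoff` and `hit_ratio`, the quantile convention, and the example values are assumptions introduced here, not taken from the paper.

```python
import math


def upper_cutoff(null_values, alpha=0.05):
    """Return a value that at most a fraction alpha of the null values exceed.

    null_values: statistics computed on permuted data sets, for example the
    maximum statistic from each permutation. Uses a simple order-statistic
    convention; other quantile conventions would give slightly different cut-offs.
    """
    ordered = sorted(null_values)
    index = math.ceil((1 - alpha) * len(ordered)) - 1
    return ordered[max(index, 0)]


def hit_ratio(top_ranked, causal):
    """Fraction of replicates whose top-ranked SNP set equals the causal set.

    top_ranked: one collection of SNP names per simulated replicate.
    causal:     the SNP names built into the simulation model.
    The order of names within a set is ignored.
    """
    target = frozenset(causal)
    hits = sum(1 for top in top_ranked if frozenset(top) == target)
    return hits / len(top_ranked)


# Hypothetical example: four replicates, causal pair (SNP1, SNP2).
replicates = [("SNP1", "SNP2"), ("SNP2", "SNP1"), ("SNP1", "SNP7"), ("SNP1", "SNP2")]
print(hit_ratio(replicates, ("SNP1", "SNP2")))
```

An observed statistic above `upper_cutoff(...)` would then be read as significant at the chosen level, in the sense described for the dotted lines in Figure 3.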
