Comparative Study

BMC Genomics. 2011 Jul 5;12:344.

doi: 10.1186/1471-2164-12-344.

Comparative analysis of methods for detecting interacting loci

Li Chen¹, Guoqiang Yu, Carl D Langefeld, David J Miller, Richard T Guy, Jayaram Raghuram, Xiguo Yuan, David M Herrington, Yue Wang

PMID: 21729295
PMCID: PMC3161015
DOI: 10.1186/1471-2164-12-344

Li Chen et al. BMC Genomics. 2011.

Affiliation

¹ Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA.

Abstract

Background: Interactions among genetic loci are believed to play an important role in disease risk. While many methods have been proposed for detecting such interactions, their relative performance remains largely unclear, mainly because different data sources, detection performance criteria, and experimental protocols were used in the papers introducing these methods and in subsequent studies. Moreover, there have been very few studies strictly focused on comparison of existing methods. Given the importance of detecting gene-gene and gene-environment interactions, a rigorous, comprehensive comparison of performance and limitations of available interaction detection methods is warranted.

Results: We report a comparison of eight representative methods, seven of which were specifically designed to detect interactions among single nucleotide polymorphisms (SNPs); the eighth, a popular main-effect testing method, serves as a baseline for performance evaluation. The selected methods, multifactor dimensionality reduction (MDR), full interaction model (FIM), information gain (IG), Bayesian epistasis association mapping (BEAM), SNP harvester (SH), maximum entropy conditional probability modeling (MECPM), logistic regression with an interaction term (LRIT), and logistic regression (LR), were compared on a large number of simulated data sets, each of which, consistent with complex disease models, embeds multiple sets of interacting SNPs under different interaction models. The assessment criteria included several relevant detection power measures, the family-wise type I error rate, and computational complexity. There are several important results from this study. First, while some SNPs in interactions with strong effects are successfully detected, most of the methods miss many interacting SNPs at an acceptable rate of false positives; in this study, the best-performing method was MECPM. Second, the statistical significance assessment criteria, used by some of the methods to control the type I error rate, are quite conservative, thereby limiting their power and making it difficult to fairly compare them. Third, as expected, power varies for different models and as a function of penetrance, minor allele frequency, linkage disequilibrium and marginal effects. Fourth, the analytical relationships between power and these factors are derived, aiding in the interpretation of the study results. Fifth, for these methods the magnitude of the main effect influences the power of the tests. Sixth, most methods can detect some ground-truth SNPs but have modest power to detect the whole set of interacting SNPs.
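The IG method listed above scores SNP pairs with an entropy-based statistic. As a rough illustration only, the sketch below computes one common formulation, the interaction information IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C), from genotype and case/control vectors using plug-in entropy estimates. The function names and toy data are hypothetical, and the exact statistic, genotype coding and significance procedure of the IG implementation evaluated in the study may differ.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) from paired samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_information_gain(snp_a: Sequence[int], snp_b: Sequence[int],
                                 phenotype: Sequence[int]) -> float:
    """IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C).

    Positive values mean the SNP pair carries information about the phenotype
    beyond what the two SNPs carry separately (a synergistic interaction).
    """
    joint_ab: List[tuple] = list(zip(snp_a, snp_b))  # treat the genotype pair as one variable
    return (mutual_information(joint_ab, phenotype)
            - mutual_information(snp_a, phenotype)
            - mutual_information(snp_b, phenotype))


# Toy pure-epistasis example (hypothetical data): case status is the XOR of
# two binary-coded SNPs, so neither SNP is informative on its own.
a = [0, 0, 1, 1] * 25
b = [0, 1, 0, 1] * 25
c = [x ^ y for x, y in zip(a, b)]
print(mutual_information(a, c))               # 0.0
print(interaction_information_gain(a, b, c))  # 1.0
```

In this toy case a main-effect test on either SNP alone would see nothing, while the pairwise statistic captures the full 1 bit of phenotype entropy; this is the kind of signal that interaction-specific methods aim to find and that a baseline such as LR can miss.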

Conclusion: This comparison study provides new insights into the strengths and limitations of current methods for detecting interacting loci. This study, along with freely available simulation tools we provide, should help support development of improved methods. The simulation tools are available at: http://code.google.com/p/simulation-tool-bmc-ms9169818735220977/downloads/list.
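The Results above report that power depends on penetrance, MAF and marginal effects. A minimal sketch of how these are linked, assuming a hypothetical two-locus penetrance table, two unlinked loci (no LD) in Hardy-Weinberg equilibrium, and genotypes indexed by minor-allele count (0, 1, 2): averaging the table over one locus's genotype frequencies gives the marginal penetrance at the other locus, i.e. its main effect. The table values, function names and MAFs are illustrative assumptions and do not reproduce the study's θ, β or l parameterization.

```python
from typing import List, Sequence

# Hypothetical penetrance table: TABLE[i][j] = P(disease | i minor alleles at
# locus A, j minor alleles at locus B). This "checkerboard" pattern has no
# marginal effect at A when the MAF at B is 0.5.
LOW, HIGH = 0.05, 0.15
TABLE = [
    [LOW, HIGH, LOW],
    [HIGH, LOW, HIGH],
    [LOW, HIGH, LOW],
]


def hwe_genotype_freqs(maf: float) -> List[float]:
    """Frequencies of 0, 1 and 2 minor alleles under Hardy-Weinberg equilibrium."""
    p = 1.0 - maf  # major-allele frequency
    return [p * p, 2.0 * p * maf, maf * maf]


def marginal_penetrance_a(table: Sequence[Sequence[float]], maf_b: float) -> List[float]:
    """P(disease | genotype at A), averaging over B's genotypes.

    Assumes A and B are unlinked, so B's genotype frequencies do not depend
    on the genotype at A.
    """
    freqs_b = hwe_genotype_freqs(maf_b)
    return [sum(fb * f for fb, f in zip(freqs_b, row)) for row in table]


def prevalence(table: Sequence[Sequence[float]], maf_a: float, maf_b: float) -> float:
    """Population disease risk implied by the table and both MAFs."""
    freqs_a = hwe_genotype_freqs(maf_a)
    return sum(fa * pen for fa, pen in zip(freqs_a, marginal_penetrance_a(table, maf_b)))


# With MAF 0.5 at B, every genotype at A has the same marginal risk (no main
# effect); lowering the MAF at B to 0.3 makes a main effect at A appear.
print(marginal_penetrance_a(TABLE, 0.5))
print(marginal_penetrance_a(TABLE, 0.3))
```

The same interaction model can therefore look purely epistatic or show a detectable main effect depending only on allele frequencies, which is one reason main-effect methods such as LR can pick up some interacting SNPs.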

Figures

**Figure 1**
**A flowchart for the performance evaluation of interaction detection methods**.

**Figure 2**
**A visual illustration of SNP "blocking" and random sampling, used for generating simulated individuals**. "Ind i" denotes the ith real individual, and "Sim Ind" denotes the simulated individual. First, genomes of the real individuals are segmented into a number of blocks; second, for each block, a genome segment is randomly drawn from the set of real individuals; finally, the randomly drawn genome segments, for all blocks, are stitched together to form a simulated individual.

**Figure 3**
**A flowchart detailing all of the steps used in producing the simulated GWAS data sets**.

**Figure 4**
Power evaluation (*definition 1*) of the eight methods on 100 replication data sets with parameter setting: θ = 1.4, β = 1, l = *null*. (a) evaluates the power on the whole ground-truth SNP set, and (b) (c) (d) (e) (f) evaluate the power individually on the 5 interaction models. Blue curve - SH, magenta curve - FIM, green curve - MDR, black curve - IG, cyan curve - MECPM, grey curve - LRIT, yellow curve - LR.

**Figure 5**
Power evaluation (*definition 1*) of six methods on 10 replication data sets with parameter setting: θ = 1.4, β = 1, l = *null*. (a) evaluates the power on the whole ground-truth SNP set, and (b) (c) (d) (e) (f) evaluate the power individually on the 5 interaction models. In (c), the power curves of all methods overlap at the top of the plot. Magenta curve - FIM, black curve - IG, red curve - BEAM, blue curve - SH, cyan curve - MECPM, grey curve - LRIT, yellow curve - LR.

**Figure 6**
**The impact of penetrance value (θ), MAF (β), and LD factor (l) on power for the whole ground-truth SNP set**. Blue curve - SH, magenta curve - FIM, green curve - MDR, black curve - IG, cyan curve - MECPM, yellow curve - LR.

**Figure 7**
**Power evaluation (*definition 2*) of the methods on 100 replication data sets with parameter setting: θ = 1.4, β = 1, l = null**. In (a), FIM, IG, MDR and LRIT have power equal to 0 throughout; in (b), FIM, IG and LRIT have power equal to 1 throughout; in (d), SH, FIM and MDR have power equal to 0 throughout. Blue curve - SH, magenta curve - FIM, green curve - MDR, black curve - IG, grey curve - LRIT, yellow curve - LR.

**Figure 8**
**Power evaluation (*definition 3*) of the eight methods on 100 replication data sets with parameter setting: θ = 1.4, β = 1, l = null**. Blue curve - SH, magenta curve - FIM, green curve - MDR, black curve - IG, grey curve - LRIT, yellow curve - LR.

**Figure 9**
**The power to detect individual SNPs, for parameter θ = 1.4, β = 1, l = null**. Blue curve - SH, magenta curve - FIM, green curve - MDR, black curve - IG, cyan curve - MECPM, grey curve - LRIT, yellow curve - LR.

**Figure 10**
**Power evaluation of 6 methods (using power *definition 1*) on main-effects-only data (step 3)**. Blue curve - SH, magenta curve - FIM, green curve - MDR, cyan curve - MECPM, yellow curve - LR.

**Figure 11**
**Execution time (sec) of 4 methods for: (a) number of SNPs = 1,000; (b) number of subjects = 2,000**. Owing to limited space in (b), the execution times of the methods on 2,000-subject, 10,000-SNP data are listed here: SH - 962 seconds, IG - 18,291 seconds, BEAM - 36,423 seconds, FIM - 91,251 seconds.
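The block-and-stitch procedure described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the study's simulation tool: it assumes each real individual is a list of per-SNP genotype codes, that block boundaries are supplied by the caller (the caption does not state how blocks are chosen), and that each block's donor is drawn uniformly at random with replacement. `simulate_individual` and `block_starts` are hypothetical names.

```python
import random
from typing import List, Sequence


def simulate_individual(real: Sequence[Sequence[int]],
                        block_starts: Sequence[int],
                        rng: random.Random) -> List[int]:
    """Build one simulated genome from randomly drawn blocks of real genomes.

    `block_starts` holds the first SNP index of each block, beginning with 0;
    the last block runs to the end of the genome.
    """
    n_snps = len(real[0])
    bounds = list(block_starts) + [n_snps]
    simulated: List[int] = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        donor = rng.choice(real)            # draw a real individual for this block
        simulated.extend(donor[start:end])  # copy that individual's genome segment
    return simulated


# Usage with toy data: five "real" individuals whose genotypes all equal their
# index, so each block of the output reveals which donor it was copied from.
rng = random.Random(2011)
real = [[i] * 10 for i in range(5)]
sim = simulate_individual(real, block_starts=[0, 3, 7], rng=rng)
print(sim)
```

Copying whole segments, rather than sampling SNPs independently, keeps the within-block correlation (LD) structure of the real genomes, while independent draws across blocks break long-range correlations between blocks.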
