Comparative analysis of methods for detecting interacting loci
- PMID: 21729295
- PMCID: PMC3161015
- DOI: 10.1186/1471-2164-12-344
Abstract
Background: Interactions among genetic loci are believed to play an important role in disease risk. While many methods have been proposed for detecting such interactions, their relative performance remains largely unclear, mainly because the papers introducing these methods, and subsequent studies, used different data sources, detection performance criteria, and experimental protocols. Moreover, very few studies have focused strictly on comparing existing methods. Given the importance of detecting gene-gene and gene-environment interactions, a rigorous, comprehensive comparison of the performance and limitations of available interaction detection methods is warranted.
Results: We report a comparison of eight representative methods: seven specifically designed to detect interactions among single nucleotide polymorphisms (SNPs), and an eighth, a popular main-effect testing method, used as a baseline for performance evaluation. The selected methods, multifactor dimensionality reduction (MDR), full interaction model (FIM), information gain (IG), Bayesian epistasis association mapping (BEAM), SNP harvester (SH), maximum entropy conditional probability modeling (MECPM), logistic regression with an interaction term (LRIT), and logistic regression (LR), were compared on a large number of simulated data sets, each of which, consistent with complex disease models, embeds multiple sets of interacting SNPs under different interaction models. The assessment criteria included several relevant detection power measures, the family-wise type I error rate, and computational complexity. The study yields several important results. First, while some SNPs in interactions with strong effects are successfully detected, most of the methods miss many interacting SNPs at an acceptable false-positive rate; in this study, the best-performing method was MECPM. Second, the statistical significance criteria that some of the methods use to control the type I error rate are quite conservative, which limits their power and makes a fair comparison difficult. Third, as expected, power varies across interaction models and as a function of penetrance, minor allele frequency, linkage disequilibrium, and marginal effects. Fourth, we derive the analytical relationships between power and these factors, which aid in interpreting the study results. Fifth, for these methods the magnitude of the main effect influences the power of the tests. Sixth, most methods can detect some ground-truth SNPs but have only modest power to detect the whole set of interacting SNPs.
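To make the information-gain (IG) idea concrete, the following is a minimal Python sketch of one common definition of the interaction gain for a SNP pair, IG(A;B;D) = I(A,B;D) - I(A;D) - I(B;D), where D is disease status. It is not the IG implementation evaluated in the study. The function names are hypothetical, genotypes are coded as 0/1 for simplicity (real SNP genotypes are usually 0/1/2), and the estimator is a plain plug-in estimate with no bias correction or significance testing.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable outcomes."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_gain(snp_a, snp_b, status):
    """Hypothetical helper: IG(A;B;D) = I(A,B;D) - I(A;D) - I(B;D).

    A positive value means the SNP pair carries more information about
    disease status jointly than the two SNPs carry separately (synergy).
    """
    joint = list(zip(snp_a, snp_b))  # treat each genotype pair as one variable
    return (mutual_information(joint, status)
            - mutual_information(snp_a, status)
            - mutual_information(snp_b, status))


# Toy, purely epistatic (XOR-like) model: status depends only on the pair.
a = [0, 0, 1, 1] * 25
b = [0, 1, 0, 1] * 25
d = [x ^ y for x, y in zip(a, b)]
print(mutual_information(a, d), interaction_gain(a, b, d))
```

In this toy XOR model each SNP alone carries no information about status (I(A;D) = 0), yet the pair carries 1 bit of interaction gain. A main-effect test such as LR would have no marginal signal to detect here, which illustrates, under these simplified assumptions, why the magnitude of main effects matters for the power of the methods compared.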
Conclusion: This comparison study provides new insights into the strengths and limitations of current methods for detecting interacting loci. This study, along with freely available simulation tools we provide, should help support development of improved methods. The simulation tools are available at: http://code.google.com/p/simulation-tool-bmc-ms9169818735220977/downloads/list.
Similar articles
- Evaluating the ability of tree-based methods and logistic regression for the detection of SNP-SNP interaction. Ann Hum Genet. 2009 May;73(Pt 3):360-9. doi: 10.1111/j.1469-1809.2009.00511.x. Epub 2009 Mar 8. PMID: 19291098
- Learning genetic epistasis using Bayesian network scoring criteria. BMC Bioinformatics. 2011 Mar 31;12:89. doi: 10.1186/1471-2105-12-89. PMID: 21453508. Free PMC article.
- An algorithm for learning maximum entropy probability models of disease risk that efficiently searches and sparingly encodes multilocus genomic interactions. Bioinformatics. 2009 Oct 1;25(19):2478-85. doi: 10.1093/bioinformatics/btp435. Epub 2009 Jul 16. PMID: 19608708. Free PMC article.
- Genetic interactions effects for cancer disease identification using computational models: a review. Med Biol Eng Comput. 2021 Apr;59(4):733-758. doi: 10.1007/s11517-021-02343-9. Epub 2021 Apr 11. PMID: 33839998. Review.
- Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression. IEEE/ACM Trans Comput Biol Bioinform. 2011 Nov-Dec;8(6):1580-91. doi: 10.1109/TCBB.2011.46. PMID: 21383421. Review.
Cited by
- Information Theory in Computational Biology: Where We Stand Today. Entropy (Basel). 2020 Jun 6;22(6):627. doi: 10.3390/e22060627. PMID: 33286399. Free PMC article.
- Genome-wide identification of significant aberrations in cancer genome. BMC Genomics. 2012 Jul 27;13:342. doi: 10.1186/1471-2164-13-342. PMID: 22839576. Free PMC article.
- Performance analysis of novel methods for detecting epistasis. BMC Bioinformatics. 2011 Dec 15;12:475. doi: 10.1186/1471-2105-12-475. PMID: 22172045. Free PMC article.
- Comparative analysis of methods for identifying recurrent copy number alterations in cancer. PLoS One. 2012;7(12):e52516. doi: 10.1371/journal.pone.0052516. Epub 2012 Dec 20. PMID: 23285074. Free PMC article.
- Theoretical Evaluation of Multi-Breed Genomic Prediction in Chinese Indigenous Cattle. Animals (Basel). 2019 Oct 11;9(10):789. doi: 10.3390/ani9100789. PMID: 31614691. Free PMC article.