Gene-gene interaction: the curse of dimensionality
- PMID: 32042829
- PMCID: PMC6989881
- DOI: 10.21037/atm.2019.12.87
Abstract
Genetic variants identified by genome-wide association studies frequently show only modest effects on disease risk, leading to the "missing heritability" problem. One avenue to account for part of this "missingness" is to evaluate gene-gene interactions (epistasis) and thereby elucidate their effect on complex diseases. This can potentially help identify gene functions, pathways, and drug targets. However, exhaustively evaluating all possible interactions among millions of single nucleotide polymorphisms (SNPs) raises several issues, collectively known as the "curse of dimensionality". Because the number of SNP combinations to test grows exponentially with the order of interaction, this dimensionality diminishes the usefulness of traditional parametric statistical methods. The immense popularity of multifactor dimensionality reduction (MDR), a non-parametric method proposed in 2001 that reduces multi-dimensional genotype combinations to a one-dimensional binary variable (high risk vs. low risk), led to the emergence of a fast-growing collection of MDR-based methods. Moreover, machine-learning (ML) methods such as random forests and neural networks (NNs), deep-learning (DL) approaches, and hybrid approaches have been applied extensively in recent years to tackle the dimensionality issue in genome-wide gene-gene interaction studies. However, exhaustive searching in MDR-based approaches and variable selection in ML methods still risk missing relevant SNPs. Furthermore, limited interpretability is a major hindrance for DL methods. To minimize this loss of information, Python-based tools such as PySpark can take advantage of distributed computing resources in the cloud and bring back smaller subsets of data for further local analysis. Parallel computing can be a powerful resource in the fight against this "curse".
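The MDR idea summarized above can be illustrated with a minimal Python sketch, assuming genotypes coded 0/1/2 and a binary case/control status. It pools samples into two-locus genotype cells, labels a cell high risk when its case/control ratio meets or exceeds the overall ratio, and scores every SNP pair by balanced accuracy. The function names and toy data are hypothetical, and the sketch omits the cross-validation and permutation testing that full MDR implementations use. Note the loop over `combinations`: for n SNPs it visits n(n-1)/2 pairs, which is where the curse of dimensionality bites.

```python
from collections import defaultdict
from itertools import combinations


def mdr_pair_accuracy(g1, g2, status):
    """Balanced accuracy of the MDR high/low-risk rule for one SNP pair.

    g1, g2: genotypes coded 0/1/2 per sample; status: 1 = case, 0 = control.
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    cases, controls = defaultdict(int), defaultdict(int)
    # Pool samples into two-locus genotype cells (at most 3 x 3 = 9 cells).
    for a, b, s in zip(g1, g2, status):
        if s == 1:
            cases[(a, b)] += 1
        else:
            controls[(a, b)] += 1
    # Threshold T = overall case/control ratio; a cell is high risk if
    # cases / controls >= T (written multiplicatively to avoid dividing by 0).
    threshold = n_cases / n_controls
    cells = set(cases) | set(controls)
    high_risk = {c for c in cells if cases[c] >= threshold * controls[c]}
    # Predict "case" for high-risk cells and "control" for all other cells.
    true_pos = sum(cases[c] for c in high_risk)
    true_neg = sum(controls[c] for c in cells - high_risk)
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)


def mdr_exhaustive_pairs(genotypes, status):
    """Score all n*(n-1)/2 SNP pairs; genotypes[i] holds SNP i for every sample."""
    scored = [
        ((i, j), mdr_pair_accuracy(genotypes[i], genotypes[j], status))
        for i, j in combinations(range(len(genotypes)), 2)
    ]
    # Best pairs first (sort is stable, so ties keep their enumeration order).
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: status depends only on whether SNP 0 and SNP 1 differ (pure epistasis).
genotypes = [
    [0, 0, 1, 1, 0, 0, 1, 1],  # SNP 0
    [0, 1, 0, 1, 0, 1, 0, 1],  # SNP 1
    [0, 0, 0, 0, 1, 1, 1, 1],  # SNP 2 (noise)
]
status = [0, 1, 1, 0, 0, 1, 1, 0]
results = mdr_exhaustive_pairs(genotypes, status)
print(results)  # [((0, 1), 1.0), ((0, 2), 0.5), ((1, 2), 0.5)]
```

On this toy data neither SNP 0 nor SNP 1 has any marginal effect on its own, yet the pair separates cases from controls perfectly (balanced accuracy 1.0). This is the kind of purely epistatic signal that single-SNP tests miss.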
PySpark supports all standard Python libraries and C extensions, which makes it convenient to write code that delivers dramatic improvements in processing speed on extraordinarily large datasets.
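To make the scale concrete and to show the "compute in parallel, bring back a small subset" pattern described above, here is a minimal Python sketch. It counts the pairwise interactions for one million SNPs with `math.comb`, then splits the pair space into chunks, scores each chunk in a worker, and returns only the top k pairs. PySpark itself is not used: Python's standard `ThreadPoolExecutor` stands in for a cluster, and `toy_pair_score` is a hypothetical placeholder statistic, not a method from the article.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations, islice
from math import comb


def n_interactions(n_snps, order=2):
    """Number of distinct SNP combinations of a given interaction order."""
    return comb(n_snps, order)


def pair_chunks(n_snps, chunk_size):
    """Yield the (i, j) SNP pairs in lists of at most chunk_size pairs."""
    pairs = combinations(range(n_snps), 2)
    while True:
        chunk = list(islice(pairs, chunk_size))
        if not chunk:
            return
        yield chunk


def toy_pair_score(g1, g2, status):
    """Hypothetical toy statistic: |mean(g1*g2) in cases - mean(g1*g2) in controls|."""
    case_vals = [a * b for a, b, s in zip(g1, g2, status) if s == 1]
    ctrl_vals = [a * b for a, b, s in zip(g1, g2, status) if s == 0]
    return abs(sum(case_vals) / len(case_vals) - sum(ctrl_vals) / len(ctrl_vals))


def top_k_pairs(genotypes, status, k=3, chunk_size=1000, workers=4):
    """Map: score chunks of pairs in parallel. Reduce: keep only the k best."""

    def score_chunk(chunk):
        scored = [(toy_pair_score(genotypes[i], genotypes[j], status), (i, j))
                  for i, j in chunk]
        # Each worker returns only its local top k, not every score.
        return heapq.nlargest(k, scored)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        local_bests = pool.map(score_chunk, pair_chunks(len(genotypes), chunk_size))
        # Merge the small per-chunk results into the global top k.
        return heapq.nlargest(k, (item for best in local_bests for item in best))


print(n_interactions(1_000_000, 2))  # 499999500000 pairs for one million SNPs
genotypes = [[2, 2, 0, 0], [2, 1, 0, 0], [0, 1, 2, 1]]
status = [1, 1, 0, 0]
print(top_k_pairs(genotypes, status, k=2, chunk_size=1))
# [(3.0, (0, 1)), (1.0, (0, 2))]
```

Because of Python's global interpreter lock, threads will not speed up this pure-Python scoring; the thread pool only illustrates the map-and-reduce shape. In PySpark the same shape would roughly correspond to distributing the pairs with `SparkContext.parallelize`, scoring them with `RDD.map`, and collecting the best few with `RDD.top`, so that only k results return to the driver.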
Keywords: Gene-gene interaction; PySpark; deep-learning (DL); machine-learning (ML); multifactor dimensionality reduction (MDR); parallel computing.
© 2019 Annals of Translational Medicine. All rights reserved.
Conflict of interest statement
Conflicts of Interest: The authors have no conflicts of interest to declare.