BMC Bioinformatics. 2020 Feb 24;21(1):68.
doi: 10.1186/s12859-020-3368-2.

GenEpi: gene-based epistasis discovery using machine learning

Yu-Chuan Chang et al. BMC Bioinformatics. 2020.

Abstract

Background: Genome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be key to discovering sophisticated pathogenesis, which is especially important for complex diseases such as Alzheimer's disease (AD).

Results: This study presents GenEpi, a computational package that uses a machine learning approach to uncover epistasis associated with phenotypes. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. On simulated data, GenEpi outperformed other widely used methods at detecting the ground-truth epistasis. On real data, this study uses AD as an example to show that GenEpi can find disease-related variants and variant interactions that have both biological meaning and predictive power.
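
The feature-encoding and model-fitting steps named above can be illustrated with a minimal sketch. This is not GenEpi's API or the authors' implementation. It assumes that "two-element combinatorial encoding" means one indicator feature per joint genotype of a SNP pair, it fits the L1-regularized logistic regression with a simple proximal-gradient solver chosen for self-containment, and it runs a single modeling stage rather than the two-stage workflow. All function names, the simulated data, and the penalty value `lam` are hypothetical.

```python
from itertools import combinations, product

import numpy as np


def encode_pairs(genotypes):
    """One indicator column per joint genotype of each SNP pair (assumed encoding)."""
    n_snps = genotypes.shape[1]
    columns, names = [], []
    for i, j in combinations(range(n_snps), 2):
        for a, b in product(range(3), repeat=2):
            columns.append((genotypes[:, i] == a) & (genotypes[:, j] == b))
            names.append(f"snp{i}={a}*snp{j}={b}")
    return np.column_stack(columns).astype(float), names


def fit_l1_logistic(X, y, lam, n_iter=300):
    """L1-regularized logistic regression via proximal gradient descent (ISTA)."""
    n, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    # Step size 1/L, with L an upper bound on the gradient's Lipschitz constant.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        p_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w = w - step * (X.T @ (p_hat - y) / n)
        b = b - step * np.mean(p_hat - y)
        # Soft-thresholding applies the L1 penalty and sets weak weights to exactly 0.
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w


def stability_selection(X, y, lam, n_rounds=50, seed=0):
    """Fraction of random half-samples in which each feature gets a nonzero weight."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts += fit_l1_logistic(X[idx], y[idx], lam) != 0
    return counts / n_rounds


# Simulated data (hypothetical): 400 samples, 4 SNPs with genotypes coded 0/1/2.
rng = np.random.default_rng(42)
genotypes = rng.integers(0, 3, size=(400, 4))
X, names = encode_pairs(genotypes)

# Planted ground truth: risk rises only when SNP 0 and SNP 1 both have genotype 2.
target = names.index("snp0=2*snp1=2")
logit = 4.0 * X[:, target] - 1.5
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

freq = stability_selection(X, y, lam=0.03)
for k in np.argsort(freq)[::-1][:3]:
    print(names[k], freq[k])
```

Features whose selection frequency stays high across subsamples are the ones a stability-selection threshold would retain. The threshold, the number of rounds, and the penalty strength are tuning choices that this sketch does not take from the paper.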

Conclusions: The results on simulated data and AD data demonstrate that GenEpi can detect epistasis associated with phenotypes effectively and efficiently. The released package can be generalized to facilitate studies of many complex diseases in the near future.

Keywords: Epistasis; GWAS; Machine learning.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The architecture of GenEpi
Fig. 2
The boxplot of the rank of the target epistasis for different algorithms. a The results for the three basic-model datasets, each with one epistatic interaction consisting of a SNP pair. b The result for the complex-model dataset, which contained three epistatic interactions. ‘S1-S2’ denotes the epistasis between SNP 1 and SNP 2, and so on. The values on the boxplot are the medians of the rank of the target epistasis over the 100 simulation runs
Fig. 3
The boxplot of false positives in L1-regularized regression with and without stability selection
Fig. 4
The ROC curves of different algorithms
Fig. 5
The heatmap of gene expression in different tissues for the 12 genes selected by GenEpi. The blue box highlights the sub-regions of the brain
