iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength
- PMID: 33808317
- PMCID: PMC8036415
- DOI: 10.3390/ijms22073589
Abstract
As critical components of DNA, enhancers can efficiently and specifically control the spatial and temporal regulation of gene transcription. Malfunction or dysregulation of enhancers is implicated in a wide range of human pathologies. Therefore, identifying enhancers and their strength may provide insights into the molecular mechanisms of gene transcription and facilitate the discovery of candidate drug targets. In this paper, a new predictor of enhancers and their strength, iEnhancer-GAN, is proposed based on a deep learning framework in combination with word embedding and a sequence generative adversarial net (Seq-GAN). Because the training dataset is relatively small, the Seq-GAN is designed to generate artificial sequences. Given that each functional element in a DNA sequence is analogous to a "word" in linguistics, word segmentation methods are proposed to divide DNA sequences into "words", and the skip-gram model is employed to transform the "words" into numerical vectors. Because convolutional neural networks (CNNs) are well suited to extracting high-level abstract features, a CNN architecture is constructed to perform the identification tasks, and the word vectors of each DNA sequence are vertically concatenated to form the embedding matrix used as the input of the CNN. Experimental results demonstrate the effectiveness of the Seq-GAN in expanding the training dataset, the possibility of applying word segmentation methods to extract "words" from DNA sequences, the feasibility of using the skip-gram model to encode DNA sequences, and the strong prediction ability of the CNN. Compared with other state-of-the-art methods on the training dataset and the independent test dataset, the proposed method achieves significantly improved overall performance. It is anticipated that the proposed method will help advance enhancer-related research.
Keywords: convolutional neural network; enhancer; sequence generative adversarial net; word embedding.
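The encoding steps described in the abstract (segmenting a DNA sequence into "words", pairing words for skip-gram training, and stacking word vectors into an embedding matrix) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the word segmentation methods, so overlapping k-mers are assumed here, and the function names (kmer_words, skipgram_pairs, embedding_matrix), the window size, and the vector dimension are all hypothetical. Random vectors stand in for trained skip-gram embeddings.

```python
import random


def kmer_words(seq, k=3, stride=1):
    """Split a DNA sequence into k-mer "words" (overlapping when stride < k).

    Assumption: k-mer splitting is only one possible word segmentation.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def skipgram_pairs(words, window=2):
    """Build (center, context) pairs, the training examples a skip-gram model sees."""
    pairs = []
    for i, center in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        pairs.extend((center, words[j]) for j in range(lo, hi) if j != i)
    return pairs


def embedding_matrix(words, table):
    """Stack one vector per word, row by row, to form a CNN input matrix."""
    return [table[w] for w in words]


words = kmer_words("ACGTAC", k=3)
pairs = skipgram_pairs(words, window=1)

# Placeholder vectors: random values stand in for trained skip-gram embeddings.
rng = random.Random(0)
dim = 4
table = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in sorted(set(words))}
matrix = embedding_matrix(words, table)

print(words)
print(len(pairs), "skip-gram pairs")
print(len(matrix), "x", len(matrix[0]), "embedding matrix")
```

In practice the lookup table would come from a trained skip-gram model rather than a random generator, and the resulting matrix (one row per word) would be passed to the CNN.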
Conflict of interest statement
The authors declare no conflict of interest.