iEnhancer-XG: interpretable sequence-based enhancers and their strength predictor
- PMID: 33119044
- DOI: 10.1093/bioinformatics/btaa914
Abstract
Motivation: Enhancers are non-coding DNA fragments with high position variability and free scattering. They play an important role in controlling gene expression. As machine learning has become more widely used in identifying enhancers, a number of bioinformatic tools have been developed. Although several models for identifying enhancers and their strengths have been proposed, their accuracy and efficiency still leave room for improvement.
Results: We propose a two-layer predictor called 'iEnhancer-XG'. It comprises a first-layer predictor (for identifying enhancers) and a second-layer classifier (for their strength). The predictor uses 'XGBoost' as the base classifier and five feature extraction methods: k-Spectrum Profile, Mismatch k-tuple, Subsequence Profile, Position-specific scoring matrix (PSSM) and Pseudo dinucleotide composition (PseDNC). Each method has an independent output, and the resulting feature vector matrices are fused by ensemble learning. We apply 'SHapley Additive exPlanations' (SHAP) to provide interpretability for what were previously black-box machine learning methods and to improve their credibility. The accuracies of the ensemble learning method are 0.811 (first layer) and 0.657 (second layer). Rigorous 10-fold cross-validation confirms that the proposed method is significantly better than existing technologies.
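To make two of the named feature extraction methods and the two-layer design concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the default values of k and m, the frequency normalization, and the strong/weak labels are assumptions made for illustration. The `is_enhancer` and `is_strong` arguments are hypothetical stand-ins for trained classifiers (XGBoost models in the paper); neither XGBoost nor SHAP is used here.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_index(k):
    """Map every length-k string over ACGT to a fixed column position."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}


def k_spectrum_profile(seq, k=2):
    """Return the normalized k-mer frequency vector (length 4**k) of seq."""
    index = kmer_index(k)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made purely of A/C/G/T contribute to the profile.
    total = sum(c for kmer, c in counts.items() if kmer in index)
    vec = [0.0] * len(index)
    for kmer, c in counts.items():
        if kmer in index:
            vec[index[kmer]] = c / total
    return vec


def mismatch_profile(seq, k=2, m=1):
    """For each k-mer, count the windows of seq within Hamming distance m."""
    index = kmer_index(k)
    vec = [0] * len(index)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for kmer, col in index.items():
            if sum(a != b for a, b in zip(window, kmer)) <= m:
                vec[col] += 1
    return vec


def two_layer_predict(seq, is_enhancer, is_strong):
    """Cascade: layer 1 decides enhancer or not, layer 2 grades strength.

    is_enhancer and is_strong are hypothetical callables (sequence -> bool)
    standing in for the trained first-layer and second-layer models.
    """
    if not is_enhancer(seq):
        return "non-enhancer"
    return "strong enhancer" if is_strong(seq) else "weak enhancer"
```

For example, `k_spectrum_profile("ACGTAC", k=2)` yields a 16-element vector in which the column for "AC" holds 0.4, since "AC" accounts for two of the five overlapping dinucleotide windows. In a full pipeline, vectors like these would be computed per sequence and passed to the classifiers of each layer.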
Availability and implementation: The source code and dataset for the enhancer predictions have been uploaded to https://github.com/jimmyrate/ienhancer-xg.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.