EL_PSSM-RT: DNA-binding residue prediction by integrating ensemble learning with PSSM Relation Transformation
- PMID: 28851273
- PMCID: PMC5576297
- DOI: 10.1186/s12859-017-1792-8
Abstract
Background: Predicting DNA-binding residues is important for understanding the protein-DNA recognition mechanism. Many computational methods have been proposed for this task, but most of them do not consider the relationships of evolutionary information between residues.
Results: We first propose a novel residue encoding method, the Position Specific Score Matrix (PSSM) Relation Transformation (PSSM-RT), which encodes residues by utilizing the relationships of evolutionary information between residues. PSSM-RT and two existing PSSM encoding methods are evaluated on PDNA-62 and PDNA-224 by five-fold cross-validation. The evaluations indicate that PSSM-RT is more effective than the previous methods, which validates the point that the relationship of evolutionary information between residues is indeed useful in DNA-binding residue prediction. We also propose an ensemble learning classifier (EL_PSSM-RT), which combines an ensemble learning model with PSSM-RT to better handle the imbalance between binding and non-binding residues in the datasets. EL_PSSM-RT is evaluated by five-fold cross-validation on PDNA-62 and PDNA-224 as well as on two independent datasets, TS-72 and TS-61. Performance comparisons with existing predictors on the four datasets demonstrate that EL_PSSM-RT is the best-performing method among those compared, with improvements of 0.02-0.07 in MCC, 4.18-21.47% in ST and 0.013-0.131 in AUC. Furthermore, we analyze the importance of the pair-relationships extracted by PSSM-RT, and the results validate the usefulness of PSSM-RT for encoding DNA-binding residues.
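For readers unfamiliar with the reported metrics, the following minimal sketch shows how MCC can be computed from confusion-matrix counts. The abstract does not define ST; the sketch assumes the definition common in the DNA-binding residue literature, the mean of sensitivity and specificity. The function name `binding_metrics` and the counts are hypothetical and are not taken from the paper.

```python
import math

def binding_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, ST and MCC from confusion-matrix counts.

    tp/fn count binding residues predicted as binding/non-binding;
    tn/fp count non-binding residues predicted as non-binding/binding.
    """
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # specificity
    # Assumption: ST is the mean of sensitivity and specificity.
    st = (sn + sp) / 2
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ST": st, "MCC": mcc}

# Hypothetical counts for an imbalanced test set (100 binding, 1000 non-binding).
metrics = binding_metrics(tp=80, fp=150, tn=850, fn=20)
print(metrics)
```

Because binding residues are a small minority, accuracy alone can look high for a predictor that rarely predicts binding; MCC and a sensitivity/specificity average are less sensitive to that imbalance.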
Conclusions: We propose a novel method for predicting DNA-binding residues that incorporates the relationship of evolutionary information between residues together with ensemble learning. Performance evaluation shows that this relationship is indeed useful in DNA-binding residue prediction and that ensemble learning can be used to address the data imbalance between binding and non-binding residues. A web service for EL_PSSM-RT ( http://hlt.hitsz.edu.cn:8080/PSSM-RT_SVM/ ) is freely available to the biological research community.
Keywords: DNA-binding residue; DNA-protein interaction; Ensemble learning; PSSM; Random forest; Relation transformation; SVM.
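The abstract states that ensemble learning is used to handle the imbalance between binding and non-binding residues but does not give the scheme. The sketch below illustrates one generic approach, an undersampling ensemble: the majority class is split into chunks, each chunk is paired with all minority samples to train one base learner, and predictions are combined by majority vote. It is not the authors' implementation. The nearest-centroid base learner is a stand-in chosen to keep the example dependency-free (the keywords suggest the paper uses SVM and random forest), and all function names and the toy 2-D data are hypothetical.

```python
import random

def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_base(pos, neg):
    """Nearest-centroid base learner (a stand-in for a stronger classifier)."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: 1 if sq_dist(x, cp) < sq_dist(x, cn) else 0

def train_balanced_ensemble(pos, neg, seed=0):
    """Train one base learner per roughly balanced subset.

    The majority class (neg) is shuffled and split into k chunks, where k is
    about len(neg) / len(pos); each chunk is paired with all of pos.
    """
    rng = random.Random(seed)
    neg = neg[:]  # copy so the caller's list is not reordered
    rng.shuffle(neg)
    k = max(1, len(neg) // len(pos))
    chunks = [neg[i::k] for i in range(k)]
    return [train_base(pos, chunk) for chunk in chunks]

def predict(ensemble, x):
    """Majority vote over the base learners (ties go to non-binding)."""
    votes = sum(model(x) for model in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0

# Toy data: 20 "binding" vectors near (2, 2), 200 "non-binding" near (0, 0).
rng = random.Random(42)
pos = [[rng.gauss(2, 0.3), rng.gauss(2, 0.3)] for _ in range(20)]
neg = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(200)]

ensemble = train_balanced_ensemble(pos, neg)
print(len(ensemble), predict(ensemble, [2.0, 2.0]), predict(ensemble, [0.0, 0.0]))
```

Each base learner sees a balanced training set, so none is dominated by the majority class, while the ensemble as a whole still uses every non-binding example.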
Conflict of interest statement
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Similar articles
- EL_LSTM: Prediction of DNA-Binding Residue from Protein Sequence by Combining Long Short-Term Memory and Ensemble Learning. IEEE/ACM Trans Comput Biol Bioinform. 2020 Jan-Feb;17(1):124-135. doi: 10.1109/TCBB.2018.2858806. Epub 2018 Jul 23. PMID: 30040656
- Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation. BMC Syst Biol. 2015;9 Suppl 1(Suppl 1):S10. doi: 10.1186/1752-0509-9-S1-S10. Epub 2015 Feb 6. PMID: 25708928 Free PMC article.
- Protein-RNA interface residue prediction using machine learning: an assessment of the state of the art. BMC Bioinformatics. 2012 May 10;13:89. doi: 10.1186/1471-2105-13-89. PMID: 22574904 Free PMC article.
- Predicting the cytotoxicity of chemicals using ensemble learning methods and molecular fingerprints. J Appl Toxicol. 2019 Oct;39(10):1366-1377. doi: 10.1002/jat.3785. Epub 2019 Feb 14. PMID: 30763981 Review.
- PrUb-EL: A hybrid framework based on deep learning for identifying ubiquitination sites in Arabidopsis thaliana using ensemble learning strategy. Anal Biochem. 2022 Dec 1;658:114935. doi: 10.1016/j.ab.2022.114935. Epub 2022 Oct 4. PMID: 36206844 Review.
Cited by
- PDRLGB: precise DNA-binding residue prediction using a light gradient boosting machine. BMC Bioinformatics. 2018 Dec 31;19(Suppl 19):522. doi: 10.1186/s12859-018-2527-1. PMID: 30598073 Free PMC article.
- Cross-Cell-Type Prediction of TF-Binding Site by Integrating Convolutional Neural Network and Adversarial Network. Int J Mol Sci. 2019 Jul 12;20(14):3425. doi: 10.3390/ijms20143425. PMID: 31336830 Free PMC article.
- Prediction of Protein-ATP Binding Residues Based on Ensemble of Deep Convolutional Neural Networks and LightGBM Algorithm. Int J Mol Sci. 2021 Jan 19;22(2):939. doi: 10.3390/ijms22020939. PMID: 33477866 Free PMC article.
- DRBpred: A sequence-based machine learning method to effectively predict DNA- and RNA-binding residues. Comput Biol Med. 2024 Mar;170:108081. doi: 10.1016/j.compbiomed.2024.108081. Epub 2024 Jan 29. PMID: 38295475 Free PMC article.
- A Model Stacking Framework for Identifying DNA Binding Proteins by Orchestrating Multi-View Features and Classifiers. Genes (Basel). 2018 Aug 1;9(8):394. doi: 10.3390/genes9080394. PMID: 30071697 Free PMC article.