Combing ontologies and dipeptide composition for predicting DNA-binding proteins
- PMID: 18175049
- DOI: 10.1007/s00726-007-0016-3
Abstract
Given a novel protein, it is important to know whether it is a DNA-binding protein, because DNA-binding proteins play a fundamental role in regulating gene expression. In this work, we propose a parallel fusion of a classifier trained on features extracted from the Gene Ontology database and a classifier trained on the dipeptide composition of the protein. The classifiers used are the support vector machine (SVM) and the 1-nearest neighbour. The Matthews correlation coefficient obtained by our fusion method is approximately 0.97 under jackknife cross-validation; this exceeds the best result reported in the literature on the same dataset (0.924), where the SVM is trained using only Chou's pseudo amino acid based features. We also report the area under the ROC curve (AUC): the fusion achieves an AUC of 0.995. In particular, our fusion obtains a 5% false negative rate at a 0% false positive rate. The Matthews correlation coefficient obtained using the single best GO number is only 0.7211; hence the Gene Ontology database cannot be used as a simple lookup table. Finally, we test the complementarity of the two feature extraction methods using the Q-statistic. We obtain a value of 0.58, which means that the features extracted from the Gene Ontology database and those extracted from the amino acid sequence are partially independent and that their parallel fusion merits further study.
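The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the Q-statistic is assumed to be Yule's Q computed over per-sample correctness of two classifiers, and the score-averaging fusion is an assumed rule, since the abstract does not say how the parallel fusion combines the two classifiers.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(sequence):
    """Return the 400 dipeptide frequencies of a protein sequence."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def q_statistic(correct_a, correct_b):
    """Yule's Q between two classifiers (assumed definition).

    Each argument is a list of booleans, one per sample, that is True
    when the corresponding classifier labelled that sample correctly.
    """
    pairs = list(zip(correct_a, correct_b))
    n11 = sum(1 for a, b in pairs if a and b)          # both correct
    n00 = sum(1 for a, b in pairs if not a and not b)  # both wrong
    n10 = sum(1 for a, b in pairs if a and not b)      # only A correct
    n01 = sum(1 for a, b in pairs if not a and b)      # only B correct
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0


def fuse_scores(scores_a, scores_b):
    """Average per-sample scores of two classifiers (assumed fusion rule)."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]
```

Under this definition Q lies between -1 and 1: it is 1 when the two classifiers err on exactly the same samples and 0 when their errors are statistically independent, so lower values indicate more complementary classifiers.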