Enzyme function prediction with interpretable models
- PMID: 19381539
- DOI: 10.1007/978-1-59745-243-4_17
Abstract
Enzymes play central roles in metabolic pathways, and the prediction of metabolic pathways in newly sequenced genomes usually starts with the assignment of genes to enzymatic reactions. However, genes with similar catalytic activity are not necessarily similar in sequence, so the traditional sequence similarity-based approach often fails to identify the relevant enzymes, which hinders efforts to map the metabolome of an organism.
Here we study the direct relationship between basic protein properties and protein function. Our goal is to develop a new tool for functional prediction (e.g., prediction of the Enzyme Commission (EC) number) that can complement and support other techniques based on sequence or structure information. To define this mapping, we collected a set of 453 features and properties that characterize proteins and are believed to be related to their structural and functional aspects. We introduce a mixture model of stochastic decision trees to learn the potentially complex relationships between features and function. To study these correlations, trees are created and tested on two classifications: the Pfam classification of proteins, which is based on sequence, and the EC classification, which is based on enzymatic function. The model is very effective at learning highly diverged protein families and families that are not defined on the basis of sequence. The resulting tree structures highlight the properties that are strongly correlated with structural and functional aspects of protein families, and they can be used to suggest a concise definition of a protein family.
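The abstract does not specify how the stochastic trees are grown or how the mixture is weighted, so the following Python sketch shows one plausible reading under stated assumptions, not the authors' implementation. Each tree samples its splits with probability proportional to information gain (instead of always taking the best split), leaves store Laplace-smoothed class distributions, and trees are combined with weights proportional to their training likelihood. The class and function names (`StochasticTreeMixture`, `grow`, `route`, `feature_importance`) and the two-feature toy data, which stand in for the 453 protein features and for EC labels, are hypothetical.

```python
# Hypothetical sketch: gain-proportional split sampling and likelihood mixture weights are assumptions.
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

class Node:
    def __init__(self, dist, feature=None, threshold=0.0, left=None, right=None):
        self.dist = dist            # smoothed class distribution at this node
        self.feature = feature      # None marks a leaf
        self.threshold = threshold  # go left when x[feature] <= threshold
        self.left, self.right = left, right

def smoothed_dist(labels, classes, alpha=1.0):
    """Laplace-smoothed class frequencies, so no class gets probability 0."""
    counts = Counter(labels)
    total = len(labels) + alpha * len(classes)
    return {c: (counts[c] + alpha) / total for c in classes}

def candidate_splits(X, y):
    """(feature, midpoint threshold, information gain) for every positive-gain split."""
    base, n, out = entropy(y), len(y), []
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if gain > 1e-12:
                out.append((f, t, gain))
    return out

def grow(X, y, classes, rng, depth, max_depth):
    """Grow one tree; stop at max_depth or when no split has positive gain."""
    dist = smoothed_dist(y, classes)
    splits = candidate_splits(X, y) if depth < max_depth else []
    if not splits:
        return Node(dist)
    # Stochastic step: sample a split with probability proportional to its gain (not greedy).
    f, t, _ = rng.choices(splits, weights=[g for _, _, g in splits], k=1)[0]
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    left = grow([X[i] for i in li], [y[i] for i in li], classes, rng, depth + 1, max_depth)
    right = grow([X[i] for i in ri], [y[i] for i in ri], classes, rng, depth + 1, max_depth)
    return Node(dist, f, t, left, right)

def route(node, x):
    """Follow the splits from the root to a leaf and return the leaf's distribution."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.dist

class StochasticTreeMixture:
    def __init__(self, n_trees=25, max_depth=3, seed=0):
        self.n_trees, self.max_depth, self.seed = n_trees, max_depth, seed

    def fit(self, X, y):
        self.classes = sorted(set(y))
        rng = random.Random(self.seed)
        self.trees = [grow(X, y, self.classes, rng, 0, self.max_depth)
                      for _ in range(self.n_trees)]
        # Weight each tree by its training likelihood (uniform prior; an assumption).
        logliks = [sum(math.log(route(t, x)[lab]) for x, lab in zip(X, y))
                   for t in self.trees]
        raw = [math.exp(ll - max(logliks)) for ll in logliks]
        self.weights = [r / sum(raw) for r in raw]
        return self

    def predict_proba(self, x):
        return {c: sum(w * route(t, x)[c] for w, t in zip(self.weights, self.trees))
                for c in self.classes}

    def predict(self, x):
        proba = self.predict_proba(x)
        return max(proba, key=proba.get)

    def feature_importance(self):
        """Mixture-weighted number of splits that use each feature index."""
        scores = Counter()
        for w, tree in zip(self.weights, self.trees):
            stack = [tree]
            while stack:
                node = stack.pop()
                if node.feature is not None:
                    scores[node.feature] += w
                    stack += [node.left, node.right]
        return dict(scores)

# Hypothetical toy data: two made-up numeric features per protein, two EC classes.
X = [[4.5, 0.30], [5.0, 0.35], [4.8, 0.32], [8.9, 0.50], [9.2, 0.55], [8.7, 0.52]]
y = ["EC 1", "EC 1", "EC 1", "EC 3", "EC 3", "EC 3"]
model = StochasticTreeMixture(n_trees=10, max_depth=3, seed=1).fit(X, y)
print(model.predict([4.6, 0.305]))  # → EC 1
print(model.predict([9.3, 0.56]))   # → EC 3
```

Because every prediction follows explicit threshold tests on individual features, the splits of high-weight trees (or the weighted split counts returned by `feature_importance`) can be read off directly, which is the kind of interpretability the abstract attributes to the tree structures. A real application would also evaluate the model on held-out proteins, which this toy example omits.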
Similar articles
- Predicting functional family of novel enzymes irrespective of sequence similarity: a statistical learning approach. Nucleic Acids Res. 2004 Dec 7;32(21):6437-44. doi: 10.1093/nar/gkh984. Print 2004. PMID: 15585667. Free PMC article.
- Enzyme family classification by support vector machines. Proteins. 2004 Apr 1;55(1):66-76. doi: 10.1002/prot.20045. PMID: 14997540.
- PDB-UF: database of predicted enzymatic functions for unannotated protein structures from structural genomics. BMC Bioinformatics. 2006 Feb 6;7:53. doi: 10.1186/1471-2105-7-53. PMID: 16460560. Free PMC article.
- The utility of structure-activity relationship (SAR) models for prediction and covariate selection in developmental toxicity: comparative analysis of logistic regression and decision tree models. SAR QSAR Environ Res. 2004 Feb;15(1):1-18. doi: 10.1080/1062936032000169633. PMID: 15113065. Review.
- A Survey for Predicting Enzyme Family Classes Using Machine Learning Methods. Curr Drug Targets. 2019;20(5):540-550. doi: 10.2174/1389450119666181002143355. PMID: 30277150. Review.
Cited by
- Computational Approaches for Automated Classification of Enzyme Sequences. J Proteomics Bioinform. 2011 Aug 23;4:147-152. doi: 10.4172/jpb.1000183. PMID: 22114367. Free PMC article.
- Application of a hierarchical enzyme classification method reveals the role of gut microbiome in human metabolism. BMC Genomics. 2015;16 Suppl 7(Suppl 7):S16. doi: 10.1186/1471-2164-16-S7-S16. Epub 2015 Jun 11. PMID: 26099921. Free PMC article.
- SHARP: genome-scale identification of gene-protein-reaction associations in cyanobacteria. Photosynth Res. 2013 Nov;118(1-2):181-90. doi: 10.1007/s11120-013-9910-6. Epub 2013 Aug 24. PMID: 23975204.
- EFICAz2: enzyme function inference by a combined approach enhanced by machine learning. BMC Bioinformatics. 2009 Apr 13;10:107. doi: 10.1186/1471-2105-10-107. PMID: 19361344. Free PMC article.
- Cell development obeys maximum Fisher information. Front Biosci (Elite Ed). 2013 Jun 1;5(3):1017-32. doi: 10.2741/e681. PMID: 23747917. Free PMC article. Review.