Accurate prediction of enzyme mutant activity based on a multibody statistical potential
- PMID: 17977887
- DOI: 10.1093/bioinformatics/btm509
Abstract
Motivation: An important area of research in biochemistry and molecular biology focuses on the characterization of enzyme mutants. However, synthesis and analysis of experimental mutants are time-consuming and expensive. We describe a machine-learning approach for inferring the activity levels of all unexplored single point mutants of an enzyme, based on a training set of such mutants with experimentally measured activity.
Results: Based on a Delaunay tessellation-derived four-body statistical potential function, a perturbation vector measuring environmental changes relative to wild type (wt) at every residue position uniquely characterizes each enzyme mutant for model development and prediction. First, the area under the receiver operating characteristic (ROC) curve (AUC), used as a measure of model performance, surpasses 0.83 and 0.77 for data sets of experimental HIV-1 protease and T4 lysozyme mutants, respectively. Second, a novel method is introduced for evaluating the statistical significance of the number of correct test set predictions obtained from a trained model. Third, 100 stratified random splits of the protease and T4 lysozyme mutant data sets into training and test sets achieve 77.0% and 80.8% mean accuracy, respectively. Fourth, protease and T4 lysozyme models trained with experimental mutants are used to predict activity levels for all remaining mutants; a subsequent search for publications reporting on dozens of these test mutants reveals that experimental results are matched by 79% and 86% of predictions, respectively. Finally, learning curves for each mutant enzyme system indicate the influence of training set size on model performance.
Availability: Prediction databases at http://proteins.gmu.edu/automute/
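Two of the evaluation steps named in the Results, the AUC and the stratified random split into training and test sets, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `roc_auc` and `stratified_split`, the 25% test fraction, and the toy labels and scores are all hypothetical, chosen only to show the mechanics.

```python
import random
from collections import defaultdict


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), sampling each class separately
    so that class proportions are roughly preserved in both sets."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        idx = idx[:]          # copy so the caller's grouping is untouched
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Toy data (hypothetical): 1 = active mutant, 0 = inactive mutant.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)  # 15 of 16 positive/negative pairs are ordered correctly

rng = random.Random(0)         # fixed seed so the split is reproducible
train_idx, test_idx = stratified_split(labels, 0.25, rng)
```

Repeating the split with different seeds and averaging test accuracy over the repetitions corresponds to the repeated stratified random splits described in the Results; the pairwise AUC loop is quadratic in the number of mutants and is meant for clarity rather than speed.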