Accurate prediction of enzyme mutant activity based on a multibody statistical potential

Majid Masso¹, Iosif I Vaisman

Affiliation

¹ Laboratory for Structural Bioinformatics, School of Computational Sciences, George Mason University, 10900 University Boulevard, MSN 5B3, Manassas, VA 20110, USA.

PMID: 17977887
DOI: 10.1093/bioinformatics/btm509

Bioinformatics. 2007 Dec 1;23(23):3155-61. Epub 2007 Oct 31.

Abstract

Motivation: An important area of research in biochemistry and molecular biology focuses on the characterization of enzyme mutants. However, synthesis and analysis of experimental mutants are time-consuming and expensive. We describe a machine-learning approach for inferring the activity levels of all unexplored single point mutants of an enzyme, based on a training set of such mutants with experimentally measured activity.

Results: Based on a Delaunay tessellation-derived four-body statistical potential function, a perturbation vector measuring environmental changes relative to wild type (wt) at every residue position uniquely characterizes each enzyme mutant for model development and prediction. First, model performance measured by the area under the receiver operating characteristic (ROC) curve (AUC) surpasses 0.83 and 0.77 for data sets of experimental HIV-1 protease and T4 lysozyme mutants, respectively. Second, a novel method is introduced for evaluating the statistical significance associated with the number of correct test set predictions obtained from a trained model. Third, 100 stratified random splits of the protease and T4 lysozyme mutant data sets into training and test sets achieve 77.0% and 80.8% mean accuracy, respectively. Fourth, protease and T4 lysozyme models trained with experimental mutants are used to predict activity levels for all remaining mutants; a subsequent search for publications reporting on dozens of these test mutants reveals that experimental results are matched by 79% and 86% of predictions, respectively. Finally, learning curves for each mutant enzyme system indicate the influence of training set size on model performance.
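
The two evaluation measures named above, AUC and mean accuracy over repeated stratified random splits, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`auc`, `stratified_split`, `mean_accuracy`), the 20% test fraction, and the `fit_predict` callback standing in for a trained classifier are all assumptions made for illustration, since the abstract does not specify the split proportions or the learning algorithm.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.
    labels: 1 = active, 0 = inactive; higher score = more likely active."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    # Fraction of positive/negative pairs ranked correctly; ties count one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), sampling each class separately so both
    sets keep roughly the class proportions of the full data set.
    Assumes every class has at least two members."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def mean_accuracy(labels, fit_predict, n_splits=100, test_fraction=0.2, seed=0):
    """Average test-set accuracy over repeated stratified random splits.
    fit_predict(train_idx, test_idx) is a hypothetical callback that trains
    on train_idx and returns one predicted label per index in test_idx."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_splits):
        train, test = stratified_split(labels, test_fraction, rng)
        preds = fit_predict(train, test)
        correct = sum(p == labels[i] for p, i in zip(preds, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)
```

In this sketch `fit_predict` would wrap whatever classifier is trained on the perturbation vectors; keeping the split logic separate from the model makes it easy to reuse the same 100 splits when comparing training set sizes for a learning curve.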

Availability: Prediction databases at http://proteins.gmu.edu/automute/
