Bioinformatics. 2008 Jul 1;24(13):i190-5.
doi: 10.1093/bioinformatics/btn166.

Predicting protein thermostability changes from sequence upon multiple mutations


Ludovica Montanucci et al. Bioinformatics. 2008.

Abstract

Motivation: A basic question in protein science is to what extent mutations affect protein thermostability. This knowledge would be particularly relevant for engineering thermostable enzymes. Several experimental approaches have addressed this issue, often serendipitously. It would therefore be useful to have a computational method that predicts whether a given protein mutant is more thermostable than its corresponding wild type.

Results: We present a new method based on support vector machines (SVMs) that predicts whether a set of mutations (including insertions and deletions) can enhance the thermostability of a given protein sequence. When trained and tested on a redundancy-reduced dataset, our predictor achieves 88% accuracy and a correlation coefficient of 0.75. It also correctly classifies 12 of 14 experimentally characterized protein mutants with enhanced thermostability. Finally, it correctly detects all 11 mutant proteins whose stability temperature increases by more than 10 °C.
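The abstract reports an accuracy and a correlation coefficient but does not give their formulas. For a binary classifier, the correlation coefficient is usually the Matthews correlation coefficient (MCC). The sketch below assumes that, with label 1 meaning "the mutant is more thermostable than the wild type" and 0 otherwise; the function names are illustrative, not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of correctly classified examples."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when a marginal count is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels: 6 of 8 correct -> accuracy 0.75; MCC = (3*3 - 1*1) / sqrt(4*4*4*4) = 0.5
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), matthews_cc(y_true, y_pred))
```

Unlike accuracy, MCC stays near 0 for a predictor that always outputs the majority class, which is why it is commonly reported alongside accuracy.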

Availability: The dataset and the list of protein clusters adopted for the SVM cross-validation are available at the web site http://lipid.biocomp.unibo.it/~ludovica/thermo-meso-MUT.
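Cross-validation by protein cluster, rather than by individual pair, ensures that no cluster contributes examples to both the training and the test part of the same split. This is what makes a redundancy-reduced evaluation meaningful. The sketch below assumes each example is tagged with a cluster ID taken from such a list. The greedy balancing rule (the largest remaining cluster goes to the currently smallest fold) is an illustrative choice, not necessarily the authors' procedure, and the function names are hypothetical.

```python
from collections import defaultdict


def cluster_folds(cluster_of, k):
    """Assign whole clusters to k folds.

    cluster_of maps example ID -> cluster ID. Clusters are placed largest first,
    each into the fold that currently holds the fewest examples.
    """
    members = defaultdict(list)
    for example, cluster in cluster_of.items():
        members[cluster].append(example)
    folds = [[] for _ in range(k)]
    for cluster in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[cluster])
    return folds


def train_test_splits(folds):
    """Yield (train, test) lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


cluster_of = {"p1": "c1", "p2": "c1", "p3": "c2", "p4": "c3", "p5": "c3", "p6": "c4"}
for train, test in train_test_splits(cluster_folds(cluster_of, 2)):
    print(sorted(train), sorted(test))
```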


Figures

Fig. 1.
ROC curves of the three predictors. Solid gray line: L20 SVM predictor; dotted black line: L400 SVM predictor; solid black line: combined predictor.
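A ROC curve like those in Fig. 1 is obtained by sweeping a threshold over a predictor's real-valued output, such as an SVM decision value, and recording the false-positive and true-positive rates at each step. The following is a minimal sketch, assuming label 1 marks the positive class (mutant more thermostable). This section does not describe how the L20 and L400 outputs are combined, so the combined predictor is not modeled here.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points from sweeping a threshold downward over the scores."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.4, 0.3, -0.2, -0.5], [1, 1, 0, 1, 0, 0])
print(pts, auc(pts))  # AUC = 8/9: one negative (0.4) outranks one positive (0.3)
```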
Fig. 2.
Accuracy of the combined SVM method as a function of the sequence identity between the two proteins in each pair, grouped into identity bins. Bars indicate the frequency of training-set pairs in each identity bin.
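Plots such as Figs. 2 and 3 reduce to grouping examples by a per-pair quantity and computing accuracy within each group. The sketch below assumes the per-pair sequence identities (in percent) have already been computed, for example from a pairwise alignment, and that the bin edges are chosen by the user. Both are assumptions, and `binned_accuracy` is a hypothetical helper.

```python
def binned_accuracy(values, correct, edges):
    """Return one (bin, count, fraction of examples, accuracy) tuple per bin.

    Bins are half-open [lo, hi); the last bin also includes its upper edge.
    Accuracy is None for an empty bin.
    """
    n_total = len(values)
    last = len(edges) - 2
    results = []
    for b, (lo, hi) in enumerate(zip(edges, edges[1:])):
        idx = [i for i, v in enumerate(values)
               if lo <= v < hi or (b == last and v == hi)]
        n = len(idx)
        acc = sum(1 for i in idx if correct[i]) / n if n else None
        results.append(((lo, hi), n, n / n_total if n_total else 0.0, acc))
    return results


identity = [99.5, 97.0, 92.0, 85.0, 99.0, 100.0]  # percent identity per pair
correct = [True, True, False, True, True, False]   # was the pair classified correctly?
for row in binned_accuracy(identity, correct, [80, 90, 95, 100]):
    print(row)
```

Passing `max(len(wild_type), len(mutant))` for each pair as the value gives the length-based grouping of Fig. 3.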
Fig. 3.
Accuracy of the combined SVM method as a function of protein length; for each pair, the length of the longer sequence is used. Bars indicate the frequency of training-set pairs with a given protein length.
Fig. 4.
Accuracy of the combined SVM method as a function of the reliability index. Bars represent the fraction of the database with a given reliability index value.
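This section does not define the reliability index. One common approach is to grade reliability by the magnitude of the SVM output, because examples far from the separating hyperplane tend to be classified with more confidence. The sketch below assumes a purely hypothetical integer index from 0 to 9, equal to floor(10·|d|) capped at 9, where d is the decision value. It then reports, for each index value, the accuracy and the fraction of examples, mirroring the curve and bars of Fig. 4.

```python
import math
from collections import defaultdict


def reliability_index(decision_value, scale=10, max_index=9):
    """Hypothetical RI: floor(scale * |d|), capped at max_index."""
    return min(max_index, math.floor(scale * abs(decision_value)))


def accuracy_by_reliability(decision_values, labels):
    """Map each RI value to (accuracy, fraction of examples); prediction is 1 if d > 0."""
    stats = defaultdict(lambda: [0, 0])  # RI -> [correct, total]
    for d, y in zip(decision_values, labels):
        predicted = 1 if d > 0 else 0
        ri = reliability_index(d)
        stats[ri][0] += int(predicted == y)
        stats[ri][1] += 1
    n = len(labels)
    return {ri: (correct / total, total / n)
            for ri, (correct, total) in sorted(stats.items())}


scores = [0.05, -0.12, 0.35, -0.85, 1.5, 0.02]
labels = [0, 0, 1, 0, 1, 1]
print(accuracy_by_reliability(scores, labels))
```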
Fig. 5.
Components of the hyperplane vector of the L20 SVM, plotted as bars. Dots connected by a line show the compositional differences averaged over all training examples.
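An explicit hyperplane vector exists only for a linear SVM, so the Fig. 5 caption suggests that the L20 model scores a pair as w·x + b with a 20-component input x. Comparing w with averaged compositional differences further suggests that x is the difference in amino acid composition between the two sequences. The sketch below assumes exactly that: mutant composition minus wild-type composition, with non-standard letters ignored. The sign convention, function names, and weight values are hypothetical and do not come from the trained model. Because compositions are normalized by sequence length, the same encoding also covers insertions and deletions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each standard residue in seq; other letters are ignored."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for aa in seq.upper():
        if aa in counts:
            counts[aa] += 1
    total = sum(counts.values()) or 1
    return [counts[aa] / total for aa in AMINO_ACIDS]


def composition_difference(wild_type, mutant):
    """Hypothetical 20-component input: mutant composition minus wild-type composition."""
    return [m - w for m, w in zip(composition(mutant), composition(wild_type))]


def decision_value(weights, bias, x):
    """Linear SVM output w.x + b; here a positive value is read as 'mutant more thermostable'."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def average_difference(pairs):
    """Compositional difference averaged over (wild_type, mutant) pairs (the dots in a Fig. 5-style plot)."""
    vectors = [composition_difference(wt, mut) for wt, mut in pairs]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def largest_weights(weights, k=5):
    """Residues with the largest absolute weight (the tallest bars in a Fig. 5-style plot)."""
    return sorted(zip(AMINO_ACIDS, weights), key=lambda t: -abs(t[1]))[:k]


# Toy weights: penalize gaining Ala, reward gaining Gly (illustrative values only).
w = [0.0] * 20
w[AMINO_ACIDS.index("A")] = -1.0
w[AMINO_ACIDS.index("G")] = 1.0
x = composition_difference("ACDA", "ACDG")
print(decision_value(w, 0.0, x), largest_weights(w, 2))
```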
