BMC Bioinformatics. 2010 Jan 28;11:62.
doi: 10.1186/1471-2105-11-62.

A novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting relative thermostability of protein mutants

Yunqi Li et al. BMC Bioinformatics. 2010;11:62.

Abstract

Background: The ability to design thermostable proteins is theoretically important and practically useful. Robust and accurate algorithms, however, remain elusive. One critical problem is the lack of reliable methods to estimate the relative thermostability of possible mutants.

Results: We report a novel scoring function for discriminating hyperthermophilic and mesophilic proteins, with application to predicting the relative thermostability of protein mutants. The scoring function was developed from a detailed analysis of a set of features calculated or predicted from 540 pairs of hyperthermophilic and mesophilic protein ortholog sequences. It is a linear combination of ten important features identified by a feature-ranking procedure based on the random forest classification algorithm, and the feature weights were fitted by a hill-climbing algorithm. The scoring function discriminates hyperthermophilic from mesophilic sequences very well: prediction accuracies reached 98.9% and 97.3% for orthologous pairs in the training and holdout test datasets, respectively. Moreover, the scoring function distinguishes non-homologous sequences with an accuracy of 88.4%. Additional blind tests on two datasets of experimentally investigated mutations showed that the scoring function can predict the relative thermostability of proteins and their mutants with high accuracy (92.9% and 94.4%). We also developed an amino acid substitution preference matrix between mesophilic and hyperthermophilic proteins, which may be useful in designing more thermostable proteins.
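The Results describe the score as a weighted sum of ten features whose weights were fitted by hill climbing. The sketch below illustrates that idea under stated assumptions: the feature vectors are placeholders, and the objective being maximized (the fraction of ortholog pairs in which the hyperthermophilic member scores higher) is an assumption, not the authors' documented criterion. `score`, `pair_accuracy`, and `hill_climb` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

# A pair is (hyperthermophilic feature vector, mesophilic feature vector).
Pair = Tuple[Sequence[float], Sequence[float]]


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear scoring function: weighted sum of (hypothetical) features."""
    return sum(w * x for w, x in zip(weights, features))


def pair_accuracy(weights: Sequence[float], pairs: List[Pair]) -> float:
    """Fraction of ortholog pairs in which the HP member outscores the MP member.

    Assumption: this is the objective the weights are tuned to maximize.
    """
    correct = sum(1 for hp, mp in pairs if score(weights, hp) > score(weights, mp))
    return correct / len(pairs)


def hill_climb(pairs: List[Pair], n_features: int, steps: int = 2000,
               step_size: float = 0.1, seed: int = 0) -> List[float]:
    """Greedy hill climbing: perturb one weight at a time, keep non-worsening moves."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
    best = pair_accuracy(weights, pairs)
    for _ in range(steps):
        i = rng.randrange(n_features)
        candidate = list(weights)
        candidate[i] += rng.gauss(0.0, step_size)
        acc = pair_accuracy(candidate, pairs)
        if acc >= best:  # accept sideways moves so the search can cross plateaus
            weights, best = candidate, acc
    return weights
```

Accepting equal-accuracy moves matters here because pairwise accuracy is a step function of the weights, so large flat regions are common; random restarts would be a natural extension of this sketch.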

Conclusions: We have presented a novel scoring function that distinguishes not only hyperthermophilic/mesophilic (HP/MP) ortholog pairs but also non-homologous pairs with high accuracy. Most importantly, as demonstrated in two blind tests, it can accurately predict the relative stability of proteins and their mutants. In addition, the residue substitution preference matrix assembled in this study may reflect substitution biases induced by thermal adaptation. A web server implementing the scoring function and the dataset used in this study are freely available at http://www.abl.ku.edu/thermorank/.

Figures

Figure 1
Pairwise comparisons of amino acid compositions in the three sets of proteins. Solid lines show the best-fit linear regression lines, with the regression coefficient and slope displayed; dashed lines show the diagonal. Bac_M, arc_H, and bac_H denote proteins from mesophilic bacteria, hyperthermophilic archaea, and hyperthermophilic bacteria, respectively.
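Each panel of Figure 1 summarizes two composition profiles with a least-squares line and its fit statistics. A minimal sketch of that comparison follows; it assumes compositions are percentages pooled over the 20 standard residues and reports Pearson's r as the fit statistic, which may differ from the exact coefficient shown in the figure. `composition` and `fit_line` are hypothetical helpers.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# The 20 standard amino acids; other symbols (X, B, Z, gaps) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences: Sequence[str]) -> Dict[str, float]:
    """Percentage of each standard amino acid, pooled over a set of sequences."""
    counts = Counter(aa for seq in sequences for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson's r."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r
```

For one panel, `fit_line(list(composition(mesophilic_seqs).values()), list(composition(hyperthermophilic_seqs).values()))` would return the slope, intercept, and r; `mesophilic_seqs` and `hyperthermophilic_seqs` are placeholder names for the two sequence sets. Both dictionaries share the `AMINO_ACIDS` key order, so the value lists line up residue by residue.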
Figure 2
Amino acid substitutions between mesophilic and hyperthermophilic proteins. The top number in each cell is the observed number of substitutions, and the bottom number (in italics) is the ratio of that count to the count of the opposite substitution. Significantly biased substitutions (p < 10⁻¹⁰, two-sided Fisher's exact test) are highlighted in bold. Red cells mark significant HP-favored substitutions and blue cells MP-favored ones.
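The substitution matrix rests on two calculations: counting directed substitutions in aligned ortholog pairs, and testing whether a substitution is significantly biased in one direction. The sketch below assumes pre-aligned MP/HP sequence pairs of equal length with `-` for gaps, and an MP-to-HP direction convention. It implements a two-sided Fisher's exact test on a generic 2x2 table; which counts populate that table in the original analysis is not specified in this summary, so any table layout is an assumption. All function names are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Iterable, Tuple


def count_substitutions(aligned_pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count directed (MP residue, HP residue) substitutions in aligned pairs.

    Each pair is (mesophilic, hyperthermophilic) aligned to equal length;
    identical columns and columns with a gap ('-') in either sequence are skipped.
    """
    counts: Counter = Counter()
    for mp, hp in aligned_pairs:
        for m, h in zip(mp, hp):
            if m != h and m != "-" and h != "-":
                counts[(m, h)] += 1
    return counts


def preference_ratio(counts: Counter, m: str, h: str) -> float:
    """Ratio of m -> h substitutions to the opposite h -> m substitutions."""
    forward, opposite = counts[(m, h)], counts[(h, m)]
    if opposite == 0:
        return float("inf") if forward else float("nan")
    return forward / opposite


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more probable than the observed one; the tiny relative
    # tolerance keeps floating-point ties from being dropped.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

The two-sided p-value here follows the usual convention for this test: it sums the probabilities of all tables with the observed margins that are no more likely than the observed table.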
Figure 3
The 25 most important features ranked by the Gini importance of the random forest algorithm. The prefixes c_ and x_ of each feature indicate that the feature is an absolute count or normalized value, respectively.
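Gini importance, used for the ranking in Figure 3, credits a feature with the weighted drop in Gini impurity at every tree node that splits on it, accumulated over all trees. The sketch shows that bookkeeping for individual splits; it is not a full random forest, and normalizing the totals to sum to 1 mirrors common implementations rather than a detail stated in the source. The function names and the `(feature, decrease)` record format are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum(p_k^2) of a node's class labels (e.g. 'H' / 'M')."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(values: Sequence[float], labels: Sequence[str],
                      threshold: float, n_total: int) -> float:
    """Weighted Gini decrease when a node splits on value <= threshold.

    n_total is the number of samples at the root, so splits on small nodes
    deep in the tree contribute proportionally less.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = (len(left) * gini(left) + len(right) * gini(right)) / n
    return (n / n_total) * (gini(labels) - children)


def gini_importance(splits: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum decreases per feature over all recorded splits, normalized to sum to 1."""
    totals: Dict[str, float] = defaultdict(float)
    for feature, decrease in splits:
        totals[feature] += decrease
    grand = sum(totals.values())
    return {f: v / grand for f, v in totals.items()} if grand else dict(totals)
```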
Figure 4
The cumulative curves of the 10 most important features against the relative difference between hyperthermophilic and mesophilic sequences.
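Figure 4 plots, for each top feature, the cumulative distribution of the difference between the hyperthermophilic and mesophilic members of each pair. The caption does not define "relative difference", so the symmetric form below, (HP - MP) divided by the mean of their absolute values, is an assumption; `relative_difference` and `cumulative_curve` are hypothetical helpers.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def relative_difference(hp: float, mp: float) -> float:
    """Assumed symmetric relative difference; defined as 0 when both values are 0."""
    scale = (abs(hp) + abs(mp)) / 2
    return 0.0 if scale == 0 else (hp - mp) / scale


def cumulative_curve(diffs: Sequence[float]) -> List[Tuple[float, float]]:
    """Empirical CDF points: (d, fraction of pairs with difference <= d)."""
    ordered = sorted(diffs)
    n = len(ordered)
    return [(d, bisect_right(ordered, d) / n) for d in sorted(set(ordered))]
```

One minus the fraction of differences at or below 0 gives the share of pairs in which a feature is higher in the hyperthermophilic member, a quick way to read how consistently each curve is shifted to the right.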
