Int J Mol Sci. 2021 Jan 9;22(2):606.
doi: 10.3390/ijms22020606.

SAAFEC-SEQ: A Sequence-Based Method for Predicting the Effect of Single Point Mutations on Protein Thermodynamic Stability


Gen Li et al. Int J Mol Sci.

Abstract

Modeling the effect of mutations on protein thermodynamic stability is useful for protein engineering and for understanding the molecular mechanisms of disease-causing variants. Here, we report a new development of the SAAFEC method, SAAFEC-SEQ, a gradient boosting decision tree machine learning method that predicts the change in folding free energy (ΔΔG) caused by amino acid substitutions. The method does not require the 3D structure of the protein, only its sequence, and can therefore be applied to genome-scale investigations, where structural information is very sparse. SAAFEC-SEQ uses physicochemical properties, sequence features, and evolutionary information features to make its predictions. Benchmarked on several independent datasets, it consistently outperforms the existing state-of-the-art sequence-based methods in both Pearson correlation coefficient and root-mean-square error (RMSE). SAAFEC-SEQ has been implemented as a web server and is also available as stand-alone code that other researchers can download and embed in their own code.
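The abstract names gradient boosting decision trees as the learning method. As a rough illustration of that technique only, the sketch below implements least-squares gradient boosting with depth-1 regression trees (stumps) in plain Python. It is not the SAAFEC-SEQ implementation: the tree depth, learning rate, number of estimators, and the names `Stump`, `fit_stump`, and `GradientBoostedStumps` are hypothetical, and the generic feature matrix stands in for the physicochemical, sequence, and evolutionary features the method actually uses, which are not reproduced here. The usage lines fit synthetic data, not experimental ΔΔG values.

```python
import math
from typing import Dict, List, Sequence


class Stump:
    """Depth-1 regression tree: one feature, one threshold, two leaf values."""

    def __init__(self, feature: int, threshold: float, left: float, right: float):
        self.feature = feature
        self.threshold = threshold
        self.left = left    # value when x[feature] <= threshold
        self.right = right  # value when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X: List[Sequence[float]], r: List[float]) -> Stump:
    """Exhaustively pick the split that minimises squared error on residuals r."""
    best, best_sse = None, math.inf
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):  # candidate thresholds
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = Stump(j, t, lm, rm), sse
    if best is None:  # no feature can split the data: use a constant leaf
        m = sum(r) / len(r)
        best = Stump(0, math.inf, m, m)
    return best


class GradientBoostedStumps:
    """Hypothetical L2 gradient boosting: each stump is fit to current residuals."""

    def __init__(self, learning_rate: float = 0.1, n_estimators: int = 100):
        self.learning_rate = learning_rate
        self.n_estimators = n_estimators
        self.base = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X: List[Sequence[float]], y: List[float]) -> "GradientBoostedStumps":
        self.base = sum(y) / len(y)  # best constant model under squared loss
        self.stumps = []
        pred = [self.base] * len(y)
        for _ in range(self.n_estimators):
            # Negative gradient of 0.5 * (y - F)^2 with respect to F is y - F.
            resid = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, resid)
            self.stumps.append(stump)
            pred = [p + self.learning_rate * stump.predict(x) for p, x in zip(pred, X)]
        return self

    def predict(self, x: Sequence[float]) -> float:
        return self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)

    def split_counts(self) -> Dict[int, int]:
        """Crude feature importance: how often each feature was used to split."""
        counts: Dict[int, int] = {}
        for s in self.stumps:
            if not math.isinf(s.threshold):  # skip constant fallback stumps
                counts[s.feature] = counts.get(s.feature, 0) + 1
        return counts


# Usage on SYNTHETIC data (not ΔΔG measurements): the target depends on feature 0 only.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [0.5 * row[0] for row in X]
model = GradientBoostedStumps(learning_rate=0.2, n_estimators=300).fit(X, y)
print(model.split_counts())
```

Each round fits a stump to the residuals of the current ensemble, which for squared-error loss are exactly the negative gradients, and the learning rate shrinks each stump's contribution. `split_counts` is only one crude notion of feature importance and may differ from how the importance levels in Figure 2 were computed.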

Keywords: machine learning; sequence-based; single point mutation; thermodynamic stability; web server.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
SAAFEC-SEQ-predicted ΔΔG versus experimental ΔΔG, with 20% of the mutations held out as a test set.
Figure 2
Relative importance of each feature selected for SAAFEC-SEQ.
Figure 3
Performance comparison of SAAFEC-SEQ with other existing sequence-based methods on PTEN and TPMT datasets.
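The comparisons are reported in terms of the Pearson correlation coefficient and RMSE between predicted and experimental ΔΔG. A minimal sketch of both metrics, plus a random 80/20 hold-out split of the kind described for Figure 1, might look like the following. The helper names (`pearson_r`, `rmse`, `holdout_split`) and the random-shuffle splitting strategy are assumptions; the exact splitting procedure is not described in this abstract.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pearson_r(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and observed values."""
    n = len(pred)
    if n != len(obs) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    if sp == 0 or so == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sp * so)


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root-mean-square error between predicted and observed values."""
    n = len(pred)
    if n != len(obs) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)


def holdout_split(items: Sequence[T], test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle, then hold out a fraction as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Usage with toy numbers (illustrative only, not results from the paper).
train, test = holdout_split(range(100), test_fraction=0.2, seed=1)
print(len(train), len(test))
print(pearson_r([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]), rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Pearson correlation rewards getting the ranking and linear trend right, while RMSE penalizes absolute errors in kcal/mol-scale predictions, which is why the two are typically reported together.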
