Assessing computational tools for predicting protein stability changes upon missense mutations using a new dataset

doi:10.1002/pro.4861

Protein Sci. 2024 Jan;33(1):e4861.

Feifan Zheng¹, Yang Liu¹, Yan Yang¹, Yuhao Wen¹, Minghui Li¹

Affiliation

¹ MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China.

PMID: 38084013
PMCID: PMC10751734
DOI: 10.1002/pro.4861

Feifan Zheng et al. Protein Sci. 2024 Jan.

Abstract

Insight into how mutations affect protein stability is crucial for protein engineering, understanding genetic diseases, and exploring protein evolution. Numerous computational methods have been developed to predict the impact of amino acid substitutions on protein stability. Nevertheless, comparing these methods poses challenges due to variations in their training data. Moreover, they tend to perform better at predicting destabilizing mutations than stabilizing ones. Here, we meticulously compiled a new dataset from three recently published databases: ThermoMutDB, FireProtDB, and ProThermDB. This dataset, which does not overlap with the well-established S2648 dataset, consists of 4038 single-point mutations, including over 1000 stabilizing mutations. We assessed these mutations using 27 computational methods, including the latest ones utilizing mega-scale stability datasets and transfer learning. We excluded entries with overlap or similarity to training datasets to ensure fairness. Pearson correlation coefficients for the tested tools ranged from 0.20 to 0.53 on unseen data, and none of the methods could accurately predict stabilizing mutations, even those performing well in anti-symmetric property analysis. While most methods present consistent trends for predicting destabilizing mutations across various properties such as solvent exposure and secondary conformation, stabilizing mutations do not exhibit a clear pattern. Our study also suggests that solely addressing training dataset bias may not significantly enhance the accuracy of predicting stabilizing mutations. These findings emphasize the importance of developing precise predictive methods for stabilizing mutations.

Keywords: computational tools; missense mutations; protein stability changes; stabilizing mutations.
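The two evaluation ideas named in the abstract, the Pearson correlation between experimental and predicted ∆∆G and the anti-symmetric property, can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names `pearson` and `antisymmetry`, the toy numbers, and the specific anti-symmetry measures (the correlation between forward and reverse predictions, plus a mean bias) are assumptions chosen for illustration; the abstract does not state which anti-symmetry measures were used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def antisymmetry(forward, reverse):
    """Anti-symmetry check for paired forward/reverse ddG predictions.

    Assumption: an ideal predictor satisfies ddG(reverse) = -ddG(forward),
    which gives a forward/reverse correlation of -1 and a bias of 0.
    """
    r_fr = pearson(forward, reverse)
    bias = sum(f + r for f, r in zip(forward, reverse)) / (2 * len(forward))
    return r_fr, bias


if __name__ == "__main__":
    # Toy values in kcal/mol, invented for illustration only.
    experimental = [1.2, -0.5, 2.0, 0.3, -1.1]
    predicted = [0.8, 0.1, 1.5, 0.4, -0.2]
    predicted_reverse = [-0.6, 0.2, -1.0, -0.1, 0.5]

    print("Pearson r:", pearson(experimental, predicted))
    print("Anti-symmetry (r_fr, bias):", antisymmetry(predicted, predicted_reverse))
```

A method can score well on this kind of anti-symmetry check and still misclassify stabilizing mutations, which is the distinction the abstract draws.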

Conflict of interest statement

The authors declare no competing interests.

Figures

**FIGURE 1**
Overview of the S4038 dataset. (a) The distribution of experimental stability changes. (b) The number of mutations from each database. (c) The number of mutations for each property.

**FIGURE 2**
Performance of 27 methods evaluated on unseen data. Machine learning methods were assessed using mutations from the combined dataset of S4038^[0%–25%] and S4038^[25%–100%]. Non‐machine learning methods were analyzed using all mutations within S4038. The color key denotes different categories: green for structure‐based machine learning methods, blue for sequence‐based machine learning methods, red for structure‐based non‐machine learning methods, and purple for sequence‐based non‐machine learning methods.

**FIGURE 3**
The classification accuracy of different methods for discerning destabilizing and stabilizing mutations. (a) True positive rates were calculated for six categories of destabilizing and stabilizing mutations. Machine learning methods were assessed using mutations from the combined dataset of S4038^[0%–25%] and S4038^[25%–100%]. Non‐machine learning methods were analyzed using all mutations within S4038. (b) The definition of six categories of destabilizing and stabilizing mutations. Data 4 presents the counts of true positive and positive mutations for each category.
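As a rough illustration of a per-class true positive rate, the sketch below counts how many experimentally destabilizing (or stabilizing) mutations receive a prediction of the same sign. It does not reproduce the six categories defined in panel (b), which are not given in this record. The sign convention (positive ∆∆G means destabilizing), the `cutoff` parameter, and the function name `true_positive_rate` are all assumptions made for the example.

```python
def true_positive_rate(experimental, predicted, kind, cutoff=0.0):
    """Fraction of experimentally `kind` mutations predicted with the same sign.

    Assumptions (not taken from the paper):
      - positive ddG means destabilizing, negative means stabilizing;
      - a mutation belongs to the class if |experimental ddG| exceeds `cutoff`
        in the relevant direction;
      - a prediction is a true positive if its sign matches the class.
    Returns None when the class contains no mutations.
    """
    if kind not in ("destabilizing", "stabilizing"):
        raise ValueError("kind must be 'destabilizing' or 'stabilizing'")
    if len(experimental) != len(predicted):
        raise ValueError("experimental and predicted must have equal length")

    sign = 1.0 if kind == "destabilizing" else -1.0
    positives = 0
    true_positives = 0
    for exp_ddg, pred_ddg in zip(experimental, predicted):
        if sign * exp_ddg > cutoff:      # experimentally in this class
            positives += 1
            if sign * pred_ddg > 0:      # predicted with the matching sign
                true_positives += 1
    return true_positives / positives if positives else None


if __name__ == "__main__":
    # Toy values in kcal/mol, invented for illustration only.
    experimental = [1.5, 0.8, -1.2, -0.6, 2.0]
    predicted = [1.0, -0.2, 0.3, -0.4, 0.5]
    for kind in ("destabilizing", "stabilizing"):
        print(kind, true_positive_rate(experimental, predicted, kind))
```

Reporting the rate separately for each class, rather than a single overall accuracy, keeps weak performance on the smaller stabilizing class from being hidden by the larger destabilizing class.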

**FIGURE 4**
An overview of the performance of various methods across five distinct property types. (a) Pearson correlation coefficients between experimental and predicted ∆∆G values for all properties. (b) True positive rates for all properties with respect to L.D. (c) True positive rates for all properties with respect to L.S. Machine learning methods were evaluated using mutations from the combined dataset of S4038^[0%–25%] and S4038^[25%–100%], while non‐machine learning methods were analyzed using all mutations within S4038. Data 5 provides the counts of mutations for each property along with the numbers of true positive and positive mutations.

Cited by

Exploring Evolution to Uncover Insights Into Protein Mutational Stability.
Hermans P, Tsishyn M, Schwersensky M, Rooman M, Pucci F. Mol Biol Evol. 2025 Jan 6;42(1):msae267. doi: 10.1093/molbev/msae267. PMID: 39786559. Free PMC article.
Leveraging computer-aided design and artificial intelligence to develop a next-generation multi-epitope tuberculosis vaccine candidate.
Zhuang L, Ali A, Yang L, Ye Z, Li L, Ni R, An Y, Ali SL, Gong W. Infect Med (Beijing). 2024 Nov 9;3(4):100148. doi: 10.1016/j.imj.2024.100148. eCollection 2024 Dec. PMID: 39687693. Free PMC article.
Revolutionizing Molecular Design for Innovative Therapeutic Applications through Artificial Intelligence.
Son A, Park J, Kim W, Yoon Y, Lee S, Park Y, Kim H. Molecules. 2024 Sep 29;29(19):4626. doi: 10.3390/molecules29194626. PMID: 39407556. Free PMC article. Review.
Transfer Learning in Cancer Genetics, Mutation Detection, Gene Expression Analysis, and Syndrome Recognition.
Ashayeri H, Sobhi N, Pławiak P, Pedrammehr S, Alizadehsani R, Jafarizadeh A. Cancers (Basel). 2024 Jun 4;16(11):2138. doi: 10.3390/cancers16112138. PMID: 38893257. Free PMC article. Review.
The origin of mutational epistasis.
Vila JA. Eur Biophys J. 2024 Nov;53(7-8):473-480. doi: 10.1007/s00249-024-01725-9. Epub 2024 Oct 23. PMID: 39443382.
