Protein Sci. 2024 Jan;33(1):e4861.
doi: 10.1002/pro.4861.

Assessing computational tools for predicting protein stability changes upon missense mutations using a new dataset

Feifan Zheng et al. Protein Sci. 2024 Jan.

Abstract

Insight into how mutations affect protein stability is crucial for protein engineering, understanding genetic diseases, and exploring protein evolution. Numerous computational methods have been developed to predict the impact of amino acid substitutions on protein stability. Nevertheless, comparing these methods poses challenges due to variations in their training data. Moreover, it is observed that they tend to perform better at predicting destabilizing mutations than stabilizing ones. Here, we meticulously compiled a new dataset from three recently published databases: ThermoMutDB, FireProtDB, and ProThermDB. This dataset, which does not overlap with the well-established S2648 dataset, consists of 4038 single-point mutations, including over 1000 stabilizing mutations. We assessed these mutations using 27 computational methods, including the latest ones utilizing mega-scale stability datasets and transfer learning. We excluded entries with overlap or similarity to training datasets to ensure fairness. Pearson correlation coefficients for the tested tools ranged from 0.20 to 0.53 on unseen data, and none of the methods could accurately predict stabilizing mutations, even those performing well in anti-symmetric property analysis. While most methods present consistent trends for predicting destabilizing mutations across various properties such as solvent exposure and secondary conformation, stabilizing mutations do not exhibit a clear pattern. Our study also suggests that solely addressing training dataset bias may not significantly enhance accuracy of predicting stabilizing mutations. These findings emphasize the importance of developing precise predictive methods for stabilizing mutations.

Keywords: computational tools; missense mutations; protein stability changes; stabilizing mutations.
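The two headline metrics in the abstract, the Pearson correlation between experimental and predicted values and the anti-symmetric property, can be illustrated with a minimal sketch. It assumes the common formulation in which a forward mutation and its reverse should satisfy ΔΔG(forward) = -ΔΔG(reverse), summarized by the correlation between forward and reverse predictions (ideally -1) and a bias term (ideally 0). The function names and all numeric values below are hypothetical and are not taken from the paper, which may compute these quantities differently.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def antisymmetry(forward, reverse):
    """Return (r_fr, bias) for paired forward/reverse predictions.

    Assumed formulation: r_fr is the Pearson correlation between the
    forward and reverse predictions; bias is mean(forward + reverse) / 2.
    A perfectly anti-symmetric predictor gives r_fr = -1 and bias = 0.
    """
    r_fr = pearson(forward, reverse)
    bias = sum(f + r for f, r in zip(forward, reverse)) / (2 * len(forward))
    return r_fr, bias


# Made-up ddG values (kcal/mol), for illustration only.
experimental = [-1.2, 0.4, -2.5, 0.9, -0.3]
predicted = [-0.8, -0.1, -1.9, 0.2, -0.6]
print("Pearson r:", round(pearson(experimental, predicted), 3))

forward_pred = [-1.0, 0.5, -2.0, 0.8]
reverse_pred = [1.0, -0.5, 2.0, -0.8]
print("r_fr, bias:", antisymmetry(forward_pred, reverse_pred))
```

A predictor can score well on both anti-symmetry measures and still misclassify individual stabilizing mutations, which is consistent with the abstract's observation that good anti-symmetric behavior did not translate into accurate stabilizing predictions.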

Conflict of interest statement

The authors declare no competing interests.

Figures

FIGURE 1
Overview of the S4038 dataset. (a) The distribution of experimental stability changes. (b) The number of mutations from each database. (c) The number of mutations for each property.
FIGURE 2
Performance of 27 methods evaluated on unseen data. Machine learning methods were assessed using mutations from the combined dataset of S4038[0%–25%] and S4038[25%–100%]. Non‐machine learning methods were analyzed using all mutations within S4038. The color key denotes different categories: green for structure‐based machine learning methods, blue for sequence‐based machine learning methods, red for structure‐based non‐machine learning methods, and purple for sequence‐based non‐machine learning methods.
FIGURE 3
The classification accuracy of different methods for discerning destabilizing and stabilizing mutations. (a) True positive rates were calculated for six categories of destabilizing and stabilizing mutations. Machine learning methods were assessed using mutations from the combined dataset of S4038[0%–25%] and S4038[25%–100%]. Non‐machine learning methods were analyzed using all mutations within S4038. (b) The definition of six categories of destabilizing and stabilizing mutations. Data 4 presents the counts of true positive and positive mutations for each category.
FIGURE 4
An overview of the performance of various methods across five distinct property types. (a) Pearson correlation coefficients between experimental and predicted ∆∆G values for all properties. (b) True positive rates for all properties with respect to L.D. (c) True positive rates for all properties with respect to L.S. Machine learning methods were evaluated using mutations from the combined dataset of S4038[0%–25%] and S4038[25%–100%], while non‐machine learning methods were analyzed using all mutations within S4038. Data 5 provides the counts of mutations for each property along with the numbers of true positive and positive mutations.
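The true positive rates reported in Figures 3 and 4 are described as the ratio of true positive to positive mutations within a class. The sketch below shows that ratio for a simple two-class split. It does not reproduce the paper's six categories, whose boundaries are not given here. The sign convention (negative ΔΔG taken as destabilizing), the zero cutoff, and all values are assumptions made for illustration; swap the two predicates if the opposite convention applies.

```python
def true_positive_rate(experimental, predicted, in_class):
    """Fraction of mutations experimentally in a class that are also predicted in it.

    `in_class` is a predicate on a single ddG value. Returns None when no
    experimental value falls in the class, since the rate is undefined.
    """
    positives = [(e, p) for e, p in zip(experimental, predicted) if in_class(e)]
    if not positives:
        return None
    true_positives = sum(1 for _, p in positives if in_class(p))
    return true_positives / len(positives)


# Assumed sign convention and cutoff (hypothetical, not from the paper).
def is_destabilizing(ddg):
    return ddg < 0


def is_stabilizing(ddg):
    return ddg > 0


# Made-up ddG values (kcal/mol), for illustration only.
experimental = [-1.5, -0.7, 0.6, 1.2, -2.0]
predicted = [-1.0, 0.2, -0.3, 0.4, -1.1]

tpr_destabilizing = true_positive_rate(experimental, predicted, is_destabilizing)
tpr_stabilizing = true_positive_rate(experimental, predicted, is_stabilizing)
print("TPR destabilizing:", tpr_destabilizing)
print("TPR stabilizing:", tpr_stabilizing)
```

Reporting the rate separately for each class keeps a large destabilizing majority from masking poor performance on the stabilizing minority.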
