Challenges in predicting stabilizing variations: An exploration

Silvia Benevenuta¹, Giovanni Birolo¹, Tiziana Sanavia¹, Emidio Capriotti², Piero Fariselli¹

Affiliations

¹ Department of Medical Sciences, University of Torino, Torino, Italy.
² Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy.

PMID: 36685278
PMCID: PMC9849384
DOI: 10.3389/fmolb.2022.1075570

Silvia Benevenuta et al. Front Mol Biosci. 2023 Jan 5;9:1075570. doi: 10.3389/fmolb.2022.1075570. eCollection 2022.

Abstract

An open challenge in computational and experimental biology is understanding the impact of non-synonymous DNA variations on protein function and, in turn, on human health. The effect of these variants on protein stability can be measured as the difference in the free energy of unfolding (ΔΔG) between the mutated structure of the protein and its wild-type form. Over the years, bioinformaticians have developed a wide variety of tools and approaches to predict ΔΔG. Although the performance of these tools is highly variable, overall they are less accurate in predicting the ΔΔG of stabilizing variations than that of destabilizing ones. Here, we analyze possible reasons for this difference by focusing on the relationship between experimentally measured ΔΔG and seven protein properties in three widely used datasets (S2648, VariBench, Ssym) and a recently introduced one (S669). These properties include protein structural information, different physical properties and statistical potentials. We found that two widely used input features, hydrophobicity and the BLOSUM62 substitution matrix, perform close to random when trying to separate stabilizing variants from either neutral or destabilizing ones. We then speculate that, since destabilizing variations are the most abundant class in the available datasets, the overall performance of the methods is higher when they include features that improve the prediction of destabilizing variants at the expense of stabilizing ones. These findings highlight the need to design predictive methods that can also exploit input features highly correlated with stabilizing variants. New tools should also be tested on datasets that are not artificially balanced, reporting the performance on each of the three classes (stabilizing, neutral and destabilizing variants) and not only the overall results.
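The abstract does not state which metric underlies "close to random". A common choice for a single feature is the area under the ROC curve (AUC), which equals the Mann-Whitney U statistic divided by the product of the two class sizes, so 0.5 corresponds to random separation (values well below 0.5 mean the feature separates in the opposite direction). The sketch below assumes that metric and also illustrates the recommendation to report per-class results: on toy, hypothetical data dominated by destabilizing variants, a predictor that always answers "destabilizing" reaches high overall accuracy while never recovering a stabilizing variant. Function names are illustrative and are not the authors' code.

```python
from math import nan
from typing import Dict, Sequence

CLASSES = ("stabilizing", "neutral", "destabilizing")


def auc_rank(scores: Sequence[float], positive: Sequence[bool]) -> float:
    """ROC AUC of a single feature, computed from the Mann-Whitney U statistic.

    Tied scores receive their average rank, so a tie counts as half a "win".
    A value of 0.5 means the feature does not separate positives from negatives.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for p in positive if p)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    rank_sum_pos = sum(r for r, p in zip(ranks, positive) if p)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)


def accuracy(true: Sequence[str], pred: Sequence[str]) -> float:
    """Overall fraction of correct predictions (dominated by the largest class)."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def per_class_recall(true: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Fraction of each true class predicted correctly (nan if the class is absent)."""
    out = {}
    for c in CLASSES:
        hits = [p == c for t, p in zip(true, pred) if t == c]
        out[c] = sum(hits) / len(hits) if hits else nan
    return out


if __name__ == "__main__":
    # Toy, hypothetical data: 8 destabilizing, 1 neutral, 1 stabilizing variant,
    # and a trivial "predictor" that always answers "destabilizing".
    true = ["destabilizing"] * 8 + ["neutral", "stabilizing"]
    pred = ["destabilizing"] * 10
    print(accuracy(true, pred))          # 0.8
    print(per_class_recall(true, pred))  # stabilizing and neutral recall are 0.0
```

Reporting the per-class dictionary alongside the overall figure makes the failure on the minority classes visible, which a single accuracy value hides.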

Keywords: machine learning; protein stability; single-point mutation; stability predictors; stabilizing variants.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
Venn diagrams showing the number of shared variants among the Ssym, VariBench, S2648 and S669 datasets.
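The overlaps summarized in Figure 1 reduce to set operations once every variant has a common key. A minimal sketch, assuming a hypothetical key of (PDB identifier, mutation) and toy data; matching variants across the real datasets may first require mapping different protein identifiers to a shared one.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical variant key: (PDB identifier, mutation such as "A23G").
Variant = Tuple[str, str]


def pairwise_shared(datasets: Dict[str, Set[Variant]]) -> Dict[Tuple[str, str], int]:
    """Number of variants shared by every pair of datasets."""
    return {
        (a, b): len(datasets[a] & datasets[b])
        for a, b in combinations(sorted(datasets), 2)
    }


def exclusive(datasets: Dict[str, Set[Variant]]) -> Dict[str, int]:
    """Variants found in one dataset only (the outer regions of a Venn diagram)."""
    out = {}
    for name, variants in datasets.items():
        others = set().union(*(v for n, v in datasets.items() if n != name))
        out[name] = len(variants - others)
    return out


if __name__ == "__main__":
    # Toy, hypothetical contents; not the real datasets.
    toy = {
        "S669": {("1ABC", "A23G"), ("2XYZ", "L5P")},
        "Ssym": {("1ABC", "A23G")},
        "S2648": {("2XYZ", "L5P"), ("3DEF", "V7A")},
    }
    print(pairwise_shared(toy))
    print(exclusive(toy))
```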

**FIGURE 2**
Distribution of the experimental ΔΔG values in the Ssym, S669, S2648 and VariBench datasets.

**FIGURE 3**
**Distributions of the features**. Boxplots showing the distributions of the features in the three classes. Variations are considered neutral if ΔΔG ∈ [−0.5, 0.5], stabilizing if ΔΔG < −0.5 and destabilizing if ΔΔG > 0.5. For each pair of classes, we computed a two-sided Mann-Whitney-Wilcoxon test to assess the difference between the distributions. P-values are reported in compact form: "ns", p > 0.05; *, 0.01 < p ≤ 0.05; **, 1.0e−03 < p ≤ 0.01; ***, 1.0e−04 < p ≤ 1.0e−03; ****, p ≤ 1.0e−04. The exact p-values are given in Table 4.
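The class thresholds and p-value notation in the Figure 3 caption can be written down directly. A minimal sketch with illustrative function names: it assigns each variant to a ΔΔG class, groups a feature's values by class (one box per class, as in the boxplots), and converts a p-value into the compact star notation.

```python
from typing import Dict, List, Sequence

# Thresholds from the Figure 3 caption (negative ΔΔG is stabilizing there).
NEUTRAL_LOW = -0.5
NEUTRAL_HIGH = 0.5


def ddg_class(ddg: float) -> str:
    """Assign a variant to one of the three classes used in Figure 3."""
    if ddg < NEUTRAL_LOW:
        return "stabilizing"
    if ddg > NEUTRAL_HIGH:
        return "destabilizing"
    return "neutral"  # ΔΔG in the closed interval [-0.5, 0.5]


def group_by_class(ddgs: Sequence[float], feature: Sequence[float]) -> Dict[str, List[float]]:
    """Split a feature's values by the ΔΔG class of each variant."""
    groups: Dict[str, List[float]] = {"stabilizing": [], "neutral": [], "destabilizing": []}
    for d, f in zip(ddgs, feature):
        groups[ddg_class(d)].append(f)
    return groups


def p_stars(p: float) -> str:
    """Compact p-value notation used in the Figure 3 caption."""
    if p > 0.05:
        return "ns"
    if p > 0.01:
        return "*"
    if p > 1e-3:
        return "**"
    if p > 1e-4:
        return "***"
    return "****"
```

The caption does not say which implementation of the test was used; one option is SciPy's `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")`, whose `.pvalue` could be passed to `p_stars` for each pair of groups returned by `group_by_class`.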

