Review
Trends Biochem Sci. 2019 Jul;44(7):575-588.
doi: 10.1016/j.tibs.2019.01.003. Epub 2019 Jan 31.

Biophysical and Mechanistic Models for Disease-Causing Protein Variants

Amelie Stein et al. Trends Biochem Sci. 2019 Jul.

Abstract

The rapid decrease in DNA sequencing cost is revolutionizing medicine and science. In medicine, genome sequencing has revealed millions of missense variants that change protein sequences, yet we only understand the molecular and phenotypic consequences of a small fraction. Within protein science, high-throughput deep mutational scanning experiments enable us to probe thousands of variants in a single, multiplexed experiment. We review efforts that bring together these topics via experimental and computational approaches to determine the consequences of missense variants in proteins. We focus on the role of changes in protein stability as a driver for disease, and how experiments, biophysical models, and computation are providing a framework for understanding and predicting how changes in protein sequence affect cellular protein stability.

Keywords: computational biophysics; deep mutational scanning; genomics; protein quality control; protein stability; variant classification.

Figures

Figure 1.
Using high-throughput experiments and computational methods to classify protein variants. (a) Genomic variation across the human population gives rise to phenotypic variation, some of which is caused by ‘protein variants’ in which the encoded proteins differ in sequence at a single amino acid position. A key problem is to determine whether such variation has little biological consequence (‘benign’) or increases the risk for a certain disease (‘pathogenic’). (b) High-throughput experiments such as deep mutational scanning (DMS) are one strategy to probe the effect of virtually all possible missense variants in a single experiment, and the results may be summarised by a heat-map that shows whether a variant causes severe loss of function or of another property (red), or whether the variant protein behaves similarly to the wild type (blue). (c) Alternatively, or as a supplement to experiments, computational methods can be used to predict whether a variant is likely to perturb, for example, activity, stability, or other properties important for function. Such prediction methods may, for example, be based on sequence conservation through evolution or on specific biophysical models, or be trained through machine learning to capture experimental observations. (d) The experimental data and computational results are then used, sometimes in concert, to help predict the phenotypic consequences of genomic variation, for use in, for example, patient diagnosis and gene discovery, or to provide mechanistic models of the origins of disease.
Figure 2.
Mechanisms for cellular protein quality control and degradation, and effects of sequence variation on the folding energy landscape. (a) In a folded protein (left), the degradation signals (degrons, orange) are generally buried inside the protein. Upon local and partial unfolding (bottom route) or full unfolding (top route), one or more degrons may become exposed. The cellular protein quality control (PQC) components (magnifying glass), such as molecular chaperones and E3 ubiquitin-protein ligases, scan the cell for such degradation signals and target the substrates for degradation (right). Variants may affect any of these steps, for example by increasing the populations of unfolded or partially unfolded states, or by creating or removing degron sequences. (b) A globally destabilising variant brings the free energy of the folded conformation closer to that of the fully unfolded state, increasing the population of the unfolded state and making the protein more easily targeted for degradation. (c) Because local unfolding involves smaller free energy differences, amino acid changes with more modestly destabilising effects may still cause a substantial increase in the population of locally unfolded states, and possibly expose degrons. In this way, such variants can have a stronger effect in the cell than one would expect from the predicted thermodynamic change in global stability.
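The argument in panels (b) and (c) can be illustrated with a simple two-state Boltzmann model. The sketch below is not taken from the review: the function name, the temperature, and all free energy values are hypothetical, and it assumes that the same destabilisation applies to both the global and the local unfolding equilibrium.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def unfolded_fraction(dg_unfold, temp_k=310.0):
    """Two-state population of the (locally or globally) unfolded state.

    dg_unfold is the free energy of unfolding, G(unfolded) - G(folded),
    in kcal/mol; larger values mean a more stable folded state.
    """
    return 1.0 / (1.0 + math.exp(dg_unfold / (R_KCAL * temp_k)))


# Hypothetical values, chosen only for illustration (kcal/mol).
DG_GLOBAL_WT = 8.0  # assumed global unfolding free energy of the wild type
DG_LOCAL_WT = 3.0   # assumed free energy of a local unfolding event
DDG_VARIANT = 2.0   # assumed destabilisation caused by a variant

global_wt = unfolded_fraction(DG_GLOBAL_WT)
global_var = unfolded_fraction(DG_GLOBAL_WT - DDG_VARIANT)
local_wt = unfolded_fraction(DG_LOCAL_WT)
local_var = unfolded_fraction(DG_LOCAL_WT - DDG_VARIANT)

print(f"globally unfolded: {global_wt:.2e} -> {global_var:.2e}")
print(f"locally unfolded:  {local_wt:.2e} -> {local_var:.2e}")
```

Under these assumed numbers the globally unfolded population stays very small even after destabilisation, whereas the locally unfolded population rises to a substantial fraction, which is the qualitative point of panel (c).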
Figure 3.
Deep mutational scanning for protein stability and variant abundance. Panels (a)–(c) outline the variant abundance by massively parallel sequencing (VAMP-seq) method [45]: (a) generation of a large library of variants, typically all 19 possible amino acid substitutions at each site, and fusion to GFP; (b) the abundance of the respective variant fusion construct determines each cell’s fluorescence; (c) fluorescence-activated cell sorting (FACS), followed by sequencing and data analysis, allows quantification of the abundance of each variant. (d) Distribution of VAMP-seq scores for missense variants in the protein PTEN, normalized such that unity corresponds to the wild-type protein sequence and zero to the average of the 1% lowest-scoring variants [45]. Green lines indicate the 5th and 95th percentiles for synonymous variants; 56% of the missense variants fall within this range. (e) Accurate biophysical measurements of the change in protein stability upon amino acid changes have been collected over many years [46], but are dominated by substitutions to alanine and a few other chemically, structurally, or biophysically motivated substitutions [82] (left). In contrast, a single VAMP-seq experiment provides data for a comparable number of variants, but is less chemically biased (right).
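The normalization in panel (d) can be sketched as follows. This is a minimal illustration of the rescaling described in the caption, not the published VAMP-seq analysis pipeline: the function names, the raw-score inputs, the nearest-rank percentile definition, and all data are hypothetical.

```python
import math


def normalize_scores(raw, wt_raw, low_fraction=0.01):
    """Rescale raw scores so the wild type maps to 1 and the mean of the
    lowest-scoring `low_fraction` of variants maps to 0."""
    values = sorted(raw.values())
    n_low = max(1, int(len(values) * low_fraction))
    floor = sum(values[:n_low]) / n_low
    return {name: (score - floor) / (wt_raw - floor)
            for name, score in raw.items()}


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def fraction_within_synonymous_range(missense, synonymous):
    """Fraction of missense scores between the 5th and 95th percentiles
    of the synonymous-variant scores."""
    low, high = percentile(synonymous, 5), percentile(synonymous, 95)
    return sum(low <= s <= high for s in missense) / len(missense)


# Hypothetical raw scores for 200 variants; the wild type is assumed to score 150.
raw_scores = {f"v{i}": float(i) for i in range(200)}
normalized = normalize_scores(raw_scores, wt_raw=150.0)

# Hypothetical normalized scores for synonymous and missense variants.
synonymous_scores = [0.8 + 0.004 * i for i in range(101)]
missense_scores = [0.1, 0.5, 0.9, 1.0, 1.1, 1.3]
wt_like = fraction_within_synonymous_range(missense_scores, synonymous_scores)
print(f"fraction of missense variants in the synonymous range: {wt_like:.2f}")
```

In this toy data set half of the missense variants fall inside the synonymous range; the caption reports 56% for the PTEN data.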
Figure 4.
Three paradigms for predicting the consequences of amino acid changes. We illustrate the utility of (top) stability predictions, (middle) evolutionary analyses, and (bottom) a regression model trained on deep mutational scanning data for predicting the consequences of pathogenic and benign MSH2 variants from the ClinVar database [68]. (a) The allele frequencies in the gnomAD database of genome sequences (gnomad.broadinstitute.org) are plotted against the predicted score of the variant. The variant scores are ordered so that detrimental variants are shown at the top, and stability prediction scores were truncated at 15 kcal mol⁻¹. Red and blue points are those reported as (likely) pathogenic and benign, respectively, in ClinVar. The left-most ‘column’ of points (labelled ‘not reported in gnomAD’) contains variants reported in ClinVar but not observed in gnomAD; they mostly correspond to known pathogenic variants expected to be found at very low allele frequencies. (b) Raincloud plots [83] illustrating the predicted score distributions of pathogenic (red), population (grey), and benign (blue) variants. For all three prediction methods there is a clear, yet imperfect, separation between pathogenic and benign variants. (c) Cumulative distribution functions showing the fraction of variants above or below any given score threshold. The red curve shows the fraction of pathogenic variants below the threshold (false negatives) and the blue curve the fraction of benign variants above the threshold (false positives). The horizontal dashed lines indicate the respective thresholds for 25% false negative predictions, and the dotted lines are the thresholds for no false positives. Solid lines indicate the respective predictor’s value for the wild type. Overall, the plots illustrate that all three predictors correctly identify many of the pathogenic variants as detrimental and most of the benign variants as tolerated. The ‘area under the curve’ (AUC) in a receiver operating characteristic (ROC) analysis is 0.91, 0.90, and 0.91 for the three methods, respectively. To address the imbalance between the sizes of the pathogenic and benign datasets, the pathogenic dataset was split in three; these AUCs are averages over the three ROC analyses.
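The averaged AUC described in the caption can be sketched as below. The caption does not say how the pathogenic set was divided, so the interleaved three-way split used here is an assumption, as are the function names, the convention that higher scores mean more detrimental, and the toy scores.

```python
def roc_auc(positives, negatives):
    """ROC AUC computed as the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 1/2)."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def split_averaged_auc(pathogenic, benign, n_splits=3):
    """Average the AUC over subsets of the larger pathogenic set, each
    compared against the full benign set.

    The interleaved split is an assumption made for this sketch.
    """
    chunks = [pathogenic[i::n_splits] for i in range(n_splits)]
    aucs = [roc_auc(chunk, benign) for chunk in chunks if chunk]
    return sum(aucs) / len(aucs)


# Hypothetical predictor scores (higher = predicted more detrimental).
pathogenic_scores = [9, 8, 7, 6, 5, 4]
benign_scores = [1, 2, 3, 5]
mean_auc = split_averaged_auc(pathogenic_scores, benign_scores)
print(f"mean AUC over three splits: {mean_auc:.4f}")
```

Comparing each pathogenic subset against the full benign set keeps the two classes closer in size within each ROC analysis, which is the stated motivation for the split.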
Figure 5.
Rescuing protein stability as a strategy for therapy. The cellular levels of a destabilized protein variant may be increased by blocking the protein quality control (PQC) system (magnifying glass; middle) or the degradation machinery (trashcan; right). Alternatively, a small molecule (star) that associates with the native form of the protein may act to stabilize the protein.

References

    1. Shendure J and Akey JM (2015) The origins, determinants, and consequences of human mutations. Science 349, 1478–1483
    2. Manolio TA et al. (2017) Bedside Back to Bench: Building Bridges between Basic and Clinical Genomic Research. Cell 169, 6–12
    3. Lek M et al. (2016) Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291
    4. Martin HC et al. (2018) Quantifying the contribution of recessive coding variation to developmental disorders. Science 362, 1161–1164
    5. Roscoe BP et al. (2013) Analyses of the Effects of All Ubiquitin Point Mutants on Yeast Growth Rate. J Mol Biol 425, 1363–1377
