High-throughput identification of protein mutant stability computed from a double mutant fitness landscape

Nicholas C Wu¹,²,³, C Anders Olson¹, Ren Sun¹

Affiliations

¹ Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, California, 90095.
² Molecular Biology Institute, University of California, Los Angeles, California, 90095.
³ Department of Integrative Structural and Computational Biology, the Scripps Research Institute, La Jolla, California, 92037.

PMID: 26540565
PMCID: PMC4815338
DOI: 10.1002/pro.2840

Nicholas C Wu et al. Protein Sci. 2016 Feb;25(2):530-9. doi: 10.1002/pro.2840. Epub 2015 Dec 8.

Abstract

The effect of a mutation on protein stability is traditionally measured by genetic construction, expression, purification, and physical analysis using low-throughput methods. This process is tedious and limits the number of mutants that can be examined in a single study. In contrast, functional fitness effects can be measured in a high-throughput manner with various deep mutational scanning tools. Using protein GB1, we recently demonstrated the feasibility of estimating the stability effect (ΔΔG) of single substitutions from the functional fitness profile of all double substitutions. The principle is to identify genetic backgrounds that have exhausted their stability margin. The functional effect of an additional substitution on such a background can then be used to compute the mutational ΔΔG from the biophysical relationship between functional fitness and thermodynamic stability. However, to identify such genetic backgrounds, the approach described in our previous study required a benchmark dataset, that is, a set of known mutational ΔΔG values. In this study, we develop a benchmark-independent approach. The genetic backgrounds of interest are identified by k-means clustering integrated with structural information. We further demonstrate that a reasonable approximation of ΔΔG can also be obtained without taking structural information into account. In summary, this study describes a novel method for computing ΔΔG from double-substitution functional fitness profiles alone, without relying on any known mutational ΔΔG as a benchmark.
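The fitness-to-stability conversion described in the abstract can be sketched as follows. This is a minimal sketch, assuming a two-state folding model in which normalized fitness equals the fraction of folded protein; the abstract does not give the exact formula used in the paper, and the temperature, the clamping constant `eps`, and the function names `fraction_folded_to_dg` and `ddg_from_fitness` are hypothetical choices for illustration.

```python
import math

# Gas constant in kcal/(mol*K). The 298 K temperature is an assumption.
R_KCAL = 1.987e-3
TEMP_K = 298.0


def fraction_folded_to_dg(f, eps=1e-6):
    """Convert a folded fraction into a folding free energy (kcal/mol).

    Assumes a two-state model, f = 1 / (1 + exp(dG / RT)), where
    dG = G_folded - G_unfolded is negative for a stable protein.
    """
    f = min(max(f, eps), 1.0 - eps)  # clamp so the log stays finite at 0 and 1
    return R_KCAL * TEMP_K * math.log((1.0 - f) / f)


def ddg_from_fitness(w_background, w_double):
    """Estimate ddG of an extra substitution placed on a genetic background.

    Hypothetical simplification: fitness is taken to equal the folded
    fraction. This is only informative when the background has exhausted
    its stability margin (0 < w_background < 1); on a fully folded
    background, fitness barely responds to a change in stability.
    A positive result means the extra substitution is destabilizing.
    """
    return fraction_folded_to_dg(w_double) - fraction_folded_to_dg(w_background)
```

For example, under these assumptions a background with W = 0.5 sits at its folding midpoint, and a further drop to W = 0.2 after an additional substitution maps to a destabilizing ΔΔG of roughly 0.82 kcal/mol. A background with W near 1 would be clamped near the fully folded limit, which is why backgrounds with an exhausted stability margin are needed.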

Keywords: fitness profiling; mutagenesis; mutant stability prediction; protein stability.

Figures

**Figure 1**
Conceptual basis for studying protein stability from functional measurements. (A, B) Schematic of the nonlinear relationship between mutant stability and protein folding on genetic backgrounds of different stability: (A) a destabilized genetic background, in which the protein is partially unfolded, and (B) a stable genetic background, in which the protein is fully folded (native state). Blue represents the genetic background, red a destabilizing substitution on that background, and green a stabilizing substitution on that background. (C) A double-substitution functional profile can be partitioned into individual single-substitution functional profiles, one for each genetic background. The double-substitution functional profile is shown as a symmetric matrix. The fitness of each mutant is indicated by $W_{i,j}$, where i and j denote the substitutions; when i equals j, the entry represents a single substitution. (D) Logical flow for computing ΔΔG from a double-substitution functional profile. The ΔΔG of each single substitution can be computed from the functional profile of a given genetic background. However, this computation relies on several assumptions, so only genetic backgrounds that satisfy them allow accurate calculation of ΔΔG from the functional profile.
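Panel (C) treats the symmetric double-substitution matrix as a collection of per-background single-substitution profiles. A minimal sketch of that partitioning follows, assuming the fitness data are stored as a mapping from an unordered set of substitution labels to $W_{i,j}$, with a one-element key holding the single-substitution value $W_{i,i}$. The function name `partition_profiles` and this data layout are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple


def partition_profiles(
    fitness: Dict[FrozenSet[str], float],
) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    """Split a symmetric double-substitution fitness map.

    Returns (singles, profiles): singles[i] is W_ii, and profiles[i][j]
    is W_ij, the fitness of substitution j on genetic background i.
    """
    singles: Dict[str, float] = {}
    profiles: Dict[str, Dict[str, float]] = {}
    for key, w in fitness.items():
        muts = sorted(key)
        if len(muts) == 1:
            singles[muts[0]] = w  # diagonal entry: a single substitution
        elif len(muts) == 2:
            i, j = muts
            # The matrix is symmetric (W_ij == W_ji), so one measurement
            # belongs to the profile of both genetic backgrounds.
            profiles.setdefault(i, {})[j] = w
            profiles.setdefault(j, {})[i] = w
        else:
            raise ValueError(f"expected 1 or 2 substitutions, got {key!r}")
    return singles, profiles
```

With hypothetical entries for K4D, E19A, and the double mutant K4D/E19A, the double-mutant fitness appears in both the K4D profile and the E19A profile, while the two single-mutant values become the background fitnesses.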

**Figure 2**
Properties of S_BG with high R_Literature. (A, B) S_BG with R_Literature > 0.85 are colored red and S_BG with R_Literature > 0.75 are colored blue. (A) Two-dimensional scatter plot in which each S_BG is represented by a data point. The y-axis represents RSA and the x-axis represents fitness (W). The only nonburied S_BG with high correlation, K4D, is labeled. (B) Spatial locations of the S_BG with R_Literature > 0.75, shown on the protein G structure (PDB: 1PGA).¹² (C) R_Literature, RSA, and W for the S_BG with R_Literature > 0.85.
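The caption does not define R_Literature. One plausible reading, consistent with the abstract's use of known ΔΔG as a benchmark, is the Pearson correlation between the ΔΔG profile computed on a given S_BG and literature ΔΔG values over the substitutions both share. The sketch below assumes that reading; `r_literature`, `above_cutoff`, and the dictionary layout are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def r_literature(computed, literature):
    """Correlate computed ddG (substitution -> value) with literature ddG.

    Only substitutions present in both dictionaries are compared.
    """
    shared = sorted(set(computed) & set(literature))
    return pearson([computed[m] for m in shared],
                   [literature[m] for m in shared])


def above_cutoff(r_by_background, cutoff):
    """Return the names of S_BG whose R_Literature exceeds `cutoff`."""
    return sorted(n for n, r in r_by_background.items() if r > cutoff)
```

Applying `above_cutoff` with 0.85 and 0.75 would reproduce the two highlighting thresholds used in panels (A) and (B), under the stated assumption about how R_Literature is defined.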

**Figure 3**
Hierarchical clustering of genetic backgrounds based on the similarity of their ΔΔG profiles. (A) Hierarchical clustering of individual S_BG based on the pairwise correlation of their ΔΔG profiles. The pairwise correlation between ΔΔG profiles is color-coded as indicated. (B–D) Distribution of R_Literature for individual S_BG within (B) group I, (C) group II, and (D) group III.
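Panel (A) groups S_BG by the pairwise correlation of their ΔΔG profiles. The following is a minimal sketch of agglomerative clustering on the distance 1 - r. It assumes average linkage (the caption does not state the linkage method) and ΔΔG profiles aligned over the same substitutions. `average_linkage` is a hypothetical name, and cutting the tree at a fixed number of clusters stands in for however groups such as I, II, and III were chosen.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of ddG profiles with distance 1 - r.

    profiles: background name -> list of ddG values, aligned over the
    same substitutions. Returns a list of clusters (lists of names).
    """
    names = sorted(profiles)
    dist = {}
    for a, b in combinations(names, 2):
        d = 1.0 - pearson(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d
    clusters = [[n] for n in names]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for x, y in combinations(range(len(clusters)), 2):
            pairs = [(a, b) for a in clusters[x] for b in clusters[y]]
            avg = sum(dist[p] for p in pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, x, y)
        _, x, y = best
        merged = clusters[x] + clusters[y]
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)]
        clusters.append(merged)
    return clusters
```

Using 1 - r as the distance means backgrounds whose ΔΔG profiles rise and fall together merge first, regardless of their absolute ΔΔG scale.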

**Figure 4**
Results from k-means clustering. k-means clustering was performed to group S_BG by the similarity of their ΔΔG profiles. For a given choice of k, 100 independent runs of k-means clustering were performed, yielding 100 × k groups of S_BG. This analysis was performed for S_BG with fitness within the indicated range: 678 S_BG with fitness 0–1 (orange), 249 S_BG with fitness 0.4–1 (cyan), 153 S_BG with fitness 0.4–0.8 (brown), and 582 S_BG with fitness 0–0.8 (blue). (A) R_Literature computed for the S_BG group with the lowest mean RSA. (B) Relationship between mean RSA and R_Literature for the S_BG groups produced by 100 runs of k-means clustering with k = 18 and an S_BG fitness range of 0.4–0.8. (C) R_Literature computed for the S_BG group with the lowest mean hydrophobicity score of the WT amino acids of S_BG. The gray dotted line represents R_Literature for ΔΔG predictions from the Rosetta software.¹³ Parameters were taken from row 16 of Table I in Kellogg *et al*.¹⁴
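The procedure in this caption (filter S_BG by fitness range, run k-means repeatedly, and keep the group with the lowest mean RSA from each run) can be sketched as below. This sketch assumes Euclidean distance between ΔΔG profile vectors and plain Lloyd iterations with random initial centers; the paper's exact distance, initialization, and tie handling are not stated here, and all function names are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's k-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def most_buried_groups(backgrounds, k, runs=100, w_range=(0.4, 0.8), seed=0):
    """Run k-means `runs` times and keep the lowest-mean-RSA group per run.

    backgrounds: name -> (fitness W, RSA, ddG profile as a list of floats).
    """
    lo, hi = w_range
    names = sorted(n for n, (w, _, _) in backgrounds.items() if lo <= w <= hi)
    points = [backgrounds[n][2] for n in names]
    rng = random.Random(seed)  # seeded so repeated calls are reproducible

    def mean_rsa(group):
        return sum(backgrounds[n][1] for n in group) / len(group)

    picked = []
    for _ in range(runs):
        groups = {}
        for name, lab in zip(names, kmeans(points, k, rng)):
            groups.setdefault(lab, []).append(name)
        picked.append(sorted(min(groups.values(), key=mean_rsa)))
    return picked
```

Each selected group would then be scored with R_Literature as in panel (A). Replacing `mean_rsa` with a mean hydrophobicity score of the WT residues would give the structure-free variant of panel (C), under the same assumptions.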

References

1. Magliery TJ, Lavinder JJ, Sullivan BJ (2011) Protein stability by number: high-throughput and statistical approaches to one of protein science's most difficult problems. Curr Opin Chem Biol 15:443–451.
2. Giver L, Gershenson A, Freskgard PO, Arnold FH (1998) Directed evolution of a thermostable esterase. Proc Natl Acad Sci USA 95:12809–12813.
3. Foit L, Morgan GJ, Kern MJ, Steimer LR, von Hacht AA, Titchmarsh J, Warriner SL, Radford SE, Bardwell JC (2009) Optimizing protein stability in vivo. Mol Cell 36:861–871.
4. Fowler DM, Fields S (2014) Deep mutational scanning: a new style of protein science. Nat Methods 11:801–807.
5. Bloom JD, Silberg JJ, Wilke CO, Drummond DA, Adami C, Arnold FH (2005) Thermodynamic prediction of protein neutrality. Proc Natl Acad Sci USA 102:606–611.
