Mol Syst Biol. 2017 Dec 21;13(12):957.

doi: 10.15252/msb.20177908.

A framework for exhaustively mapping functional missense variants

Jochen Weile^{1,2,3,4}, Song Sun^{1,2,3,4,5}, Atina G Cote^{1,2,3}, Jennifer Knapp^{1,2,3}, Marta Verby^{1,2,3}, Joseph C Mellor^{2,6}, Yingzhou Wu^{1,2,3,4}, Carles Pons⁷, Cassandra Wong^{1,2}, Natascha van Lieshout¹, Fan Yang^{1,2,3,4}, Murat Tasan^{1,2,3,4}, Guihong Tan^{2,3}, Shan Yang⁸, Douglas M Fowler⁹, Robert Nussbaum⁸, Jesse D Bloom¹⁰, Marc Vidal^{11,12}, David E Hill¹¹, Patrick Aloy^{7,13}, Frederick P Roth^{14,2,3,4,15}

Affiliations

¹ Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada.
² The Donnelly Centre, University of Toronto, Toronto, ON, Canada.
³ Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
⁴ Department of Computer Science, University of Toronto, Toronto, ON, Canada.
⁵ Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
⁶ SeqWell Inc, Boston, MA, USA.
⁷ Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, Barcelona, Catalonia, Spain.
⁸ Invitae Corp., San Francisco, CA, USA.
⁹ Department of Genome Sciences, University of Washington, Seattle, WA, USA.
¹⁰ Fred Hutchinson Research Center, Seattle, WA, USA.
¹¹ Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA.
¹² Department of Genetics, Harvard Medical School, Boston, MA, USA.
¹³ Institució Catalana de Recerca I Estudis Avançats (ICREA), Barcelona, Catalonia, Spain.
¹⁴ Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada. Correspondence: fritz.roth@utoronto.ca
¹⁵ Canadian Institute for Advanced Research, Toronto, ON, Canada.

PMID: 29269382
PMCID: PMC5740498
DOI: 10.15252/msb.20177908

Jochen Weile et al. Mol Syst Biol. 2017.

Abstract

Although we now routinely sequence human genomes, we can confidently identify only a fraction of the sequence variants that have a functional impact. Here, we developed a deep mutational scanning framework that produces exhaustive maps for human missense variants by combining random codon mutagenesis and multiplexed functional variation assays with computational imputation and refinement. We applied this framework to four proteins corresponding to six human genes: UBE2I (encoding SUMO E2 conjugase), SUMO1 (small ubiquitin-like modifier), TPK1 (thiamin pyrophosphokinase), and CALM1/2/3 (three genes encoding the protein calmodulin). The resulting maps recapitulate known protein features and confidently identify pathogenic variation. Assays potentially amenable to deep mutational scanning are already available for 57% of human disease genes, suggesting that DMS could ultimately map functional variation for all human disease genes.

Keywords: complementation; deep mutational scanning; genotype–phenotype; variants of uncertain significance.

Figures

**Figure 1. UBE2I screening and validation**
(A) Modular structure of the screening framework.
(B) Raw DMS‐BarSeq fitness scores in technical replicates (separately plated assays of the same pool) and biological replicates (separate sub‐strains in the pool carrying the same variants).
(C) Manual spotting assay validation of a representative set of variants. Each row represents a consecutive fivefold dilution. Marked in red: maximal dilution visible in empty vector control. Marked in green: maximal dilution with visible human wt control. Marked in yellow: dilution steps exceeding visible human wt control. Bar heights represent summary screen scores. Error bars show Bayesian regularized standard error based on three technical replicates and a prior based on pre‐selection counts and final score (see Materials and Methods for details).
(D) Variants grouped by evolutionary conservation (AMAS score) of their respective sites (top) and by structural context: within the protein core, within protein–protein interaction interfaces, or on the remaining protein surface (bottom). Boxes span the second and third quartiles, with the middle bar representing the median. Whiskers show the most extreme values within 1.5×IQR. As normality cannot be assumed for the distributions of fitness scores, one‐sided two‐sample Wilcoxon–Mann–Whitney tests were used. Low conservation (n = 60 clones) vs. medium conservation (n = 105 clones) W = 3789, *P = 0.015; medium conservation (n = 105 clones) vs. high conservation (n = 404 clones) W = 28043, *P = 1.8 × 10⁻⁷; core (n = 208 clones) vs. surface (n = 42 clones) W = 1649, *P = 1.01 × 10⁻¹⁰; interface (n = 215 clones) vs. surface (n = 42 clones) W = 2461, *P = 1.58 × 10⁻⁶.
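
The one‐sided two‐sample Wilcoxon–Mann–Whitney comparisons reported for this figure can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names and the toy scores are invented here, and the P value uses a plain normal approximation without tie or continuity correction, so it would not reproduce the published values exactly.

```python
import math


def midranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_less(x, y):
    """One-sided test of whether x tends to be smaller than y.

    Returns (W, p): W is the rank sum of x minus n_x * (n_x + 1) / 2,
    and p is a normal-approximation estimate of P(W <= observed).
    """
    nx, ny = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:nx]) - nx * (nx + 1) / 2
    mean = nx * ny / 2
    sd = math.sqrt(nx * ny * (nx + ny + 1) / 12)
    z = (w - mean) / sd
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z
    return w, p


# Hypothetical fitness scores, for illustration only.
conserved_sites = [0.1, 0.2, 0.3]
variable_sites = [0.8, 0.9, 1.0, 1.1]
w_stat, p_value = mann_whitney_less(conserved_sites, variable_sites)
```

A rank‐based test of this kind makes no normality assumption, which is the reason the caption gives for choosing it.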

**Figure 2. Validation of machine‐learning imputation for UBE2I**
(A) Cross‐validation evaluation: Joint scores from DMS‐BarSeq and DMS‐TileSeq compared to machine‐learning prediction in 10× cross‐validation. The agreement is comparable to that between biological replicates in the screen itself (compare to Fig 1B).
(B) Error map, showing cross‐validation results for each data point sorted by amino acid position and mutant residue.
(C) Comparison of imputation predictions with individual spotting assays. Each row represents a consecutive fivefold dilution. Marked in red: maximal dilution visible in empty vector control. Marked in green: maximal dilution with visible human wt control. Marked in yellow: dilution steps exceeding visible human wt control.
(D) Most informative features in the Random Forest imputation, as measured in % increase in mean squared deviation upon randomization of a given feature.
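
The feature‐importance measure described for this figure (% increase in mean squared deviation when a feature is randomized) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: a hypothetical hand‐written predictor stands in for the Random Forest, and the feature names and values are invented.

```python
import random


def mean_squared_error(model, rows, targets):
    """Mean squared deviation between model predictions and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, seed=0):
    """Percent increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for f in range(n_features):
        column = [r[f] for r in rows]
        rng.shuffle(column)  # break the link between feature f and the target
        shuffled = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, column)]
        permuted = mean_squared_error(model, shuffled, targets)
        importances.append(100.0 * (permuted - baseline) / baseline)
    return importances


# Hypothetical data: feature 0 is a conservation-like value that drives the
# score, feature 1 is an unrelated value that the stand-in model ignores.
rows = [[k / 10, float((3 * k) % 8)] for k in range(8)]
noise = [0.01 if k % 2 == 0 else -0.01 for k in range(8)]
targets = [1.0 - r[0] + e for r, e in zip(rows, noise)]


def stand_in_model(row):
    """Hypothetical predictor: fitness falls as conservation rises."""
    return 1.0 - row[0]


importances = permutation_importance(stand_in_model, rows, targets, n_features=2)
```

In this toy setup, shuffling the ignored feature leaves the error unchanged, while shuffling the informative one inflates it, which is the contrast the importance ranking relies on.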

**Figure 3. A complete functional map of UBE2I**
(A) A complete functional map of UBE2I, resulting from the combination of the complementation screen with machine‐learning imputation and refinement. An impact score of 0 (blue) corresponds to a fitness equivalent to the empty vector control. A score of 1 (white) corresponds to a fitness equivalent to the wild‐type control. A score > 1 (red) corresponds to fitness above wild‐type levels. Shown above for comparison are sequence conservation, secondary structure, solvent accessibility, and burial of the respective amino acid in protein–protein interaction interfaces with covalently and non‐covalently bound SUMO, the E1 UBA2, the sumoylation target RanGAP1, the E3 RanBP2, and UBE2I itself. Hydrogen bonds or salt bridges between residues and the respective interaction partner are marked with red asterisks. Residues buried in both the covalent SUMO and client interfaces are framed with dotted lines, marking the core members of the active site.
(B) UBE2I crystal structure with residues colored according to the median mutant fitness. Colors as in (A). The interacting substrate's ΨKxE motif is shown as a green stick model, covalently bound SUMO as a red cartoon model, and non‐covalently bound SUMO as a brown cartoon model. The structures shown were obtained by alignment of PDB entries 3UIP and 2PE6.
(C) UBE2I crystal structure as in (B), with residues colored according to maximum mutant fitness.
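
The score scale used in these maps (0 for empty‐vector‐like fitness, 1 for wild‐type‐like fitness, above 1 for fitness exceeding wild type) is consistent with a linear rescaling between the two controls. The sketch below shows one such rescaling; it is an assumption for illustration, with hypothetical function and parameter names, and may differ from the normalization the authors actually applied.

```python
def impact_score(fitness, empty_vector_fitness, wild_type_fitness):
    """Rescale a raw fitness so the empty-vector control maps to 0
    and the wild-type control maps to 1 (hypothetical normalization)."""
    span = wild_type_fitness - empty_vector_fitness
    if span == 0:
        raise ValueError("controls must differ to define a scale")
    return (fitness - empty_vector_fitness) / span


def describe(score, tolerance=0.05):
    """Map a rescaled score onto the three regimes named in the caption."""
    if score > 1 + tolerance:
        return "above wild type"
    if abs(score - 1) <= tolerance:
        return "wild-type-like"
    if abs(score) <= tolerance:
        return "null-like"
    return "intermediate"


# Hypothetical raw values, for illustration only.
example = impact_score(0.8, empty_vector_fitness=0.2, wild_type_fitness=1.4)
```

The `tolerance` band is likewise an invented convenience for labeling; the figure itself uses a continuous color scale.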

**Figure 4. Functional maps of SUMO1, TPK1, and calmodulin (CALM1/2/3)**
Layout and colors as in Fig 3.

**Figure 5. DMS functional maps reflect clinical phenotypes**
(A) Comparison of (refined) functional scores between rare polymorphisms (GnomAD) and somatic tumor mutations (COSMIC) in UBE2I and SUMO1. Bars show median and quartiles. As normality cannot be assumed for the distributions of fitness scores, a one‐sided two‐sample Wilcoxon–Mann–Whitney test was used: n = {26,31} variants, W = 570.5, P = 3.73 × 10⁻³.
(B) Impact score distributions in calmodulin overlaid with previously observed alleles in CALM1, CALM2, and CALM3: rare alleles from GnomAD are shown in green; ClinVar alleles classified as pathogenic are shown in red.
(C) Precision‐recall curves for our DMS atlas, PROVEAN, and PolyPhen‐2 with respect to distinguishing GnomAD variants from pathogenic alleles from ClinVar.
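
A precision‐recall comparison of the kind described for this figure can be sketched in a few lines of Python. The scores and labels below are invented, and the sketch assumes that lower scores indicate more damaging variants (in line with a score of 0 meaning null‐like fitness); it is not the authors' evaluation code.

```python
def precision_recall_points(scores, is_pathogenic):
    """Sweep a threshold over the observed scores.

    Variants scoring at or below the threshold are called damaging.
    Returns a list of (threshold, precision, recall) tuples.
    """
    total_pathogenic = sum(is_pathogenic)
    if total_pathogenic == 0:
        raise ValueError("need at least one pathogenic variant")
    points = []
    for threshold in sorted(set(scores)):
        called = [label for s, label in zip(scores, is_pathogenic) if s <= threshold]
        true_positives = sum(called)
        precision = true_positives / len(called)
        recall = true_positives / total_pathogenic
        points.append((threshold, precision, recall))
    return points


# Hypothetical variants: True marks a pathogenic allele, False a rare
# population allele treated as the negative class.
toy_scores = [0.05, 0.10, 0.40, 0.90, 1.00, 1.10]
toy_labels = [True, True, False, True, False, False]
curve = precision_recall_points(toy_scores, toy_labels)
```

Running the same function on the outputs of several predictors, each oriented so that lower means more damaging, would give curves that can be compared at matched recall.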
