Mol Syst Biol. 2017 Dec 21;13(12):957.
doi: 10.15252/msb.20177908.

A framework for exhaustively mapping functional missense variants

Jochen Weile et al. Mol Syst Biol.

Abstract

Although we now routinely sequence human genomes, we can confidently identify only a fraction of the sequence variants that have a functional impact. Here, we developed a deep mutational scanning (DMS) framework that produces exhaustive maps for human missense variants by combining random codon mutagenesis and multiplexed functional variation assays with computational imputation and refinement. We applied this framework to four proteins corresponding to six human genes: UBE2I (encoding SUMO E2 conjugase), SUMO1 (small ubiquitin-like modifier), TPK1 (thiamin pyrophosphokinase), and CALM1/2/3 (three genes encoding the protein calmodulin). The resulting maps recapitulate known protein features and confidently identify pathogenic variation. Assays potentially amenable to deep mutational scanning are already available for 57% of human disease genes, suggesting that DMS could ultimately map functional variation for all human disease genes.

Keywords: complementation; deep mutational scanning; genotype–phenotype; variants of uncertain significance.

Figures

Figure 1. UBE2I screening and validation
  A. Modular structure of the screening framework.

  B. Raw DMS‐BarSeq fitness scores in technical replicates (separately plated assays of the same pool) and biological replicates (separate sub‐strains in the pool carrying the same variants).

  C. Manual spotting assay validation of a representative set of variants. Each row represents a consecutive fivefold dilution. Marked in red: maximal dilution visible in empty vector control. Marked in green: maximal dilution with visible human wt control. Marked in yellow: dilution steps exceeding visible human wt control. Bar heights represent summary screen scores. Error bars show Bayesian regularized standard error based on three technical replicates and a prior based on pre‐selection counts and final score (see Materials and Methods for details).

  D. Variants grouped by evolutionary conservation (AMAS score) of their respective sites (top) and grouped by structural context within the protein core, within protein–protein interaction interfaces or on remaining protein surface (bottom). Boxes range across the second and third quartiles with the middle bar representing the median. Whiskers show the most extreme values within 1.5×IQR. As normality cannot be assumed for the distributions of fitness scores, one‐sided two‐sample Wilcoxon–Mann–Whitney tests were used. Low conservation (n = 60 clones) vs. medium conservation (n = 105 clones) W = 3789, *P = 0.015; medium conservation (n = 105 clones) vs. high conservation (n = 404 clones) W = 28043, *P = 1.8 × 10^−7; core (n = 208 clones) vs. surface (n = 42 clones) W = 1649, *P = 1.01 × 10^−10; interface (n = 215 clones) vs. surface (n = 42 clones) W = 2461, *P = 1.58 × 10^−6.
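The one‐sided two‐sample Wilcoxon–Mann–Whitney comparison named in the caption can be illustrated with a minimal sketch. The source does not say which software produced the reported W and P values, so the function below is only a generic textbook version (midranks for ties, normal approximation with continuity correction). The function name `mann_whitney_u_less` and the example scores are hypothetical and are not data from the paper.

```python
import math


def mann_whitney_u_less(x, y):
    """One-sided test of the alternative 'x tends to be smaller than y'.

    Returns (U statistic for x, approximate p-value). Uses midranks for
    ties and a normal approximation with continuity correction, so it is
    only a rough guide for very small samples.
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j.
        midrank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        # All values identical: no evidence in either direction.
        return u_x, 1.0
    z = (u_x - mean_u + 0.5) / math.sqrt(var_u)
    # Lower-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))
    return u_x, p_value


# Hypothetical fitness scores for two groups of variants.
conserved_sites = [0.10, 0.20, 0.30, 0.25]
variable_sites = [0.80, 0.90, 1.00, 1.10, 0.95]
u, p = mann_whitney_u_less(conserved_sites, variable_sites)
print(f"U = {u}, one-sided p = {p:.4f}")
```

A rank-based test is used here for the same reason the caption gives: it makes no normality assumption about the score distributions.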

Figure 2. Validation of machine‐learning imputation for UBE2I
  A. Cross‐validation evaluation: Joint scores from DMS‐BarSeq and DMS‐TileSeq compared to machine‐learning prediction in 10‐fold cross‐validation. The agreement is comparable to that between biological replicates in the screen itself (compare to Fig 1B).

  B. Error map, showing cross‐validation results for each data point sorted by amino acid position and mutant residue.

  C. Comparison of imputation predictions with individual spotting assays. Each row represents a consecutive fivefold dilution. Marked in red: maximal dilution visible in empty vector control. Marked in green: maximal dilution with visible human wt control. Marked in yellow: dilution steps exceeding visible human wt control.

  D. Most informative features in the Random Forest imputation, as measured in % increase in mean squared deviation upon randomization of a given feature.
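The importance measure in the last panel (percent increase in mean squared deviation when one feature is randomized) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it shuffles one feature column of a fixed dataset and compares the error before and after, whereas Random Forest implementations typically compute this on out‐of‐bag samples. The toy model, feature layout, and all names (`mean_squared_error`, `permutation_importance`) are assumptions made for the example.

```python
import random


def mean_squared_error(model, rows, targets):
    """Mean squared deviation between model predictions and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, feature, seed=0):
    """Percent increase in MSE after shuffling one feature column."""
    baseline = mean_squared_error(model, rows, targets)
    if baseline == 0:
        raise ValueError("baseline error is zero; percent increase is undefined")
    rng = random.Random(seed)
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    # Rebuild each row with the shuffled value in the chosen column.
    shuffled_rows = [
        r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)
    ]
    permuted = mean_squared_error(model, shuffled_rows, targets)
    return 100.0 * (permuted - baseline) / baseline


# Hypothetical data: feature 0 drives the target, feature 1 is irrelevant.
rows = [[i / 10, ((i * 7) % 10) / 10] for i in range(10)]
targets = [2 * r[0] + 0.05 * (-1) ** i for i, r in enumerate(rows)]


def toy_model(row):
    # Stand-in for a trained predictor; it only looks at feature 0.
    return 2 * row[0]


for feature in (0, 1):
    print(feature, permutation_importance(toy_model, rows, targets, feature))
```

Shuffling a feature the model ignores leaves the error unchanged, so its importance is zero, while shuffling an informative feature inflates the error.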

Figure 3. A complete functional map of UBE2I
  A. A complete functional map of UBE2I, resulting from the combination of the complementation screen with machine‐learning imputation and refinement. An impact score of 0 (blue) corresponds to a fitness equivalent to the empty vector control. A score of 1 (white) corresponds to a fitness equivalent to the wild‐type control. A score > 1 (red) corresponds to fitness above wild‐type levels. Shown above for comparison are sequence conservation, secondary structure, solvent accessibility, and burial of the respective amino acid in protein–protein interaction interfaces with covalently and non‐covalently bound SUMO, the E1 UBA2, the sumoylation target RanGAP1, the E3 RanBP2, and UBE2I itself. Hydrogen bonds or salt bridges between residues and the respective interaction partner are marked with red asterisks. Residues buried in both the covalent SUMO and client interfaces are framed with dotted lines, marking the core members of the active site.

  B. UBE2I crystal structure with residues colored according to the median mutant fitness. Colors as in (A). The interacting substrate's ΨKxE motif is shown as a green stick model, covalently bound SUMO as a red cartoon model, and non‐covalently bound SUMO as a brown cartoon model. The structures shown were obtained by alignment of PDB entries 3UIP and 2PE6.

  C. UBE2I crystal structure as in (B), with residues colored according to maximum mutant fitness.
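The score scale described in the caption fixes two anchor points: 0 for the empty vector control and 1 for the wild‐type control. A minimal sketch of such a rescaling, and of the per‐residue median and maximum used to color the structures, is given below. The caption does not state how raw measurements are mapped between the anchors, so the linear interpolation here is an assumption, and the function names and numbers are hypothetical.

```python
from statistics import median


def rescale_fitness(raw, null_ref, wt_ref):
    """Map a raw fitness value so that null_ref -> 0 and wt_ref -> 1.

    Linear interpolation between the two controls is an assumption of
    this sketch, not a documented detail of the published method.
    """
    if wt_ref == null_ref:
        raise ValueError("wild-type and null references must differ")
    return (raw - null_ref) / (wt_ref - null_ref)


def summarize_position(scores):
    """Median and maximum score across all substitutions at one residue."""
    return median(scores), max(scores)


# Hypothetical raw values (for example log ratios) for the two controls.
null_ref, wt_ref = -2.0, 0.0
print(rescale_fitness(-2.0, null_ref, wt_ref))  # empty-vector-like
print(rescale_fitness(0.0, null_ref, wt_ref))   # wild-type-like
print(rescale_fitness(0.5, null_ref, wt_ref))   # above wild type

# Hypothetical rescaled scores for three substitutions at one position.
print(summarize_position([0.1, 0.4, 1.2]))
```

The median indicates how tolerant a position is on average, while the maximum shows whether any substitution at that position retains function.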

Figure 4. Functional maps of SUMO1, TPK1, and calmodulin (CALM1/2/3)
Layout and colors as in Fig 3.
Figure 5. DMS functional maps reflect clinical phenotypes
  A. Comparison of (refined) functional scores between rare polymorphisms (GnomAD) and somatic tumor mutations (COSMIC) in UBE2I and SUMO1. Bars show median and quartiles. As normality cannot be assumed for the distributions of fitness scores, a one‐sided two‐sample Wilcoxon–Mann–Whitney test was used: n = {26,31} variants, W = 570.5, P = 3.73 × 10^−3.

  B. Impact score distributions in calmodulin overlaid with previously observed alleles in CALM1, CALM2, and CALM3: Rare alleles from GnomAD are shown in green; ClinVar alleles classified as pathogenic are shown in red.

  C. Precision‐recall curves for our DMS atlas, PROVEAN, and PolyPhen‐2 with respect to distinguishing GnomAD variants from pathogenic alleles from ClinVar.
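A precision‐recall curve of the kind shown in the last panel can be computed by sweeping a score threshold. The sketch below assumes that a low functional score indicates a damaging variant (consistent with the scale in Fig 3, where 0 corresponds to the empty vector control) and treats pathogenic alleles as the positive class. The scores and labels are hypothetical, and `precision_recall_points` is an illustrative name, not part of any published code.

```python
def precision_recall_points(scores, is_pathogenic):
    """Return (threshold, precision, recall) for each distinct score.

    A variant is called pathogenic when its score is at or below the
    threshold, so lower scores are treated as more damaging.
    """
    total_pathogenic = sum(is_pathogenic)
    if total_pathogenic == 0:
        raise ValueError("need at least one pathogenic variant")
    points = []
    for threshold in sorted(set(scores)):
        # Labels of all variants called pathogenic at this threshold.
        called = [label for s, label in zip(scores, is_pathogenic) if s <= threshold]
        true_positives = sum(called)
        precision = true_positives / len(called)
        recall = true_positives / total_pathogenic
        points.append((threshold, precision, recall))
    return points


# Hypothetical functional scores and labels (True = pathogenic).
scores = [0.1, 0.2, 0.6, 0.9, 1.0]
labels = [True, True, False, True, False]
for threshold, precision, recall in precision_recall_points(scores, labels):
    print(f"score <= {threshold}: precision {precision:.2f}, recall {recall:.2f}")
```

For a sequence‐based predictor where a high score means damaging, the comparison would be reversed (score at or above the threshold).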
