BMC Bioinformatics. 2015 May 20;16:168.
doi: 10.1186/s12859-015-0590-4.

Software for the analysis and visualization of deep mutational scanning data

Jesse D Bloom. BMC Bioinformatics.

Abstract

Background: Deep mutational scanning is a technique to estimate the impacts of mutations on a gene by using deep sequencing to count mutations in a library of variants before and after imposing a functional selection. The impacts of mutations must be inferred from changes in their counts after selection.

Results: I describe a software package, dms_tools, to infer the impacts of mutations from deep mutational scanning data using a likelihood-based treatment of the mutation counts. I show that dms_tools yields more accurate inferences on simulated data than simply calculating ratios of counts pre- and post-selection. Using dms_tools, one can infer the preference of each site for each amino acid given a single selection pressure, or assess the extent to which these preferences change under different selection pressures. The preferences and their changes can be intuitively visualized with sequence-logo-style plots created using an extension to weblogo.

Conclusions: dms_tools implements a statistically principled approach for the analysis and subsequent visualization of deep mutational scanning data.

Figures

Figure 1
A deep mutational scanning experiment. (A) A gene is mutagenized to create a library that contains all single codon mutations. The mutant library is introduced into cells or viruses and subjected to a functional selection that enriches beneficial mutations and depletes deleterious ones. Deep sequencing is used to count mutations in a sample of the variants present pre- and post-selection. Using dms_tools, the data can be analyzed to infer the “preference” of each site for each amino acid; in the visualization, letter heights are proportional to the preference for that amino acid. (B) The experiment can be extended by subjecting the library of functional variants to two different selection pressures, and using deep sequencing to assess which variants are favored in one condition versus the other. Using dms_tools, the data can be analyzed to infer the “differential preference” of each site for each amino acid in the alternative selection s2 versus the control selection s1; in the visualization, letter heights above or below the line are proportional to the differential preference for or against that amino acid.
Figure 2
Site-specific preferences from deep mutational scanning of a Tn5 transposon. Melnikov et al. [10] performed deep mutational scanning on a Tn5 transposon using kanamycin selection, and reported the counts of amino-acid mutations for two biological replicates of the experiment. Here I have used dms_tools to infer the preferences. (A) Visualization of the preferences averaged across the two replicates. (B) Correlation between the preferences inferred from each of the two replicates. Given files containing the mutation counts, the plots can be generated as logoplot.pdf and corr.pdf with dms_tools commands (the command listing is an image in the original figure legend and is not reproduced here).
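The two summaries in this caption, averaging preferences across replicates and correlating one replicate against another, can be illustrated with a minimal Python sketch. This is not the dms_tools implementation: the function names `average_preferences` and `pearson_r`, the `(site, character)` dictionary layout, and the toy numbers are all assumptions made for illustration, and Pearson correlation is assumed as the correlation measure.

```python
import math


def average_preferences(replicates):
    """Average preferences over replicates.

    Each replicate is a dict mapping (site, character) -> preference.
    Hypothetical data layout, chosen only for this illustration.
    """
    keys = replicates[0].keys()
    for rep in replicates[1:]:
        if rep.keys() != keys:
            raise ValueError("replicates must cover the same sites and characters")
    return {k: sum(rep[k] for rep in replicates) / len(replicates) for k in keys}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data: two sites, two characters, two replicates (made-up numbers).
rep1 = {(1, "A"): 0.6, (1, "C"): 0.4, (2, "A"): 0.1, (2, "C"): 0.9}
rep2 = {(1, "A"): 0.8, (1, "C"): 0.2, (2, "A"): 0.3, (2, "C"): 0.7}

mean_prefs = average_preferences([rep1, rep2])
order = sorted(rep1)  # fixed key order so the two vectors line up
r = pearson_r([rep1[k] for k in order], [rep2[k] for k in order])
```

Because each replicate's preferences sum to one at every site, the averaged preferences do as well, so they can be drawn directly as letter heights in a logo plot.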
Figure 3
Site-specific preferences from deep mutational scanning of influenza hemagglutinin. Thyagarajan and Bloom [11] performed deep mutational scanning on influenza hemagglutinin, and reported the counts of codon mutations for three biological replicates of the experiment. Here I have used dms_tools to infer the preferences. (A) Visualization of the preferences averaged across the three replicates. (B) Correlations between the preferences from each pair of replicates. Given files containing the mutation counts, the plots can be generated as logoplot.pdf, corr_1_2.pdf, corr_1_3.pdf, and corr_2_3.pdf with dms_tools commands (the command listing is an image in the original figure legend and is not reproduced here).
Figure 4
Accuracy of preference inference on simulated data. Deep mutational scanning counts were simulated using the preferences in Figure 2A and realistic mutation and error rates that were uneven across sites and characters, as in actual experiments. The simulations were done (A) without or (B) with sequencing errors quantified by control libraries. Plots show the correlation between the actual and inferred preferences as a function of $N\bar{\mu}$, the product of the sequencing depth $N$ and the average per-site mutation rate $\bar{\mu}$; real experiments typically have $N\bar{\mu} \approx 1000$ to $2000$, depending on the sequencing depth and gene length. Preferences are inferred using the full algorithm in dms_tools (top panels) or by simply calculating ratios of counts (bottom panels) using Equation 4 and its logical extension to include errors, both with a pseudocount of one. The dms_tools inferences are more accurate than the simple ratio estimation, with both methods converging to the actual values with increasing $N\bar{\mu}$. Given files with the mutation counts, the plots in this figure can be generated as prefs_corr.pdf and ratio_corr.pdf with dms_tools commands (the command listing is an image in the original figure legend and is not reproduced here).
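The simple ratio estimation used as the baseline here can be sketched in a few lines of Python. Equation 4 is not reproduced on this page, so the form below is an assumption: each character's enrichment is taken as its post-selection count relative to wildtype divided by its pre-selection count relative to wildtype, a pseudocount is added to every count, and the enrichments at a site are normalized to sum to one. The function name `ratio_preferences` and the toy counts are hypothetical, and the sketch ignores the error-control extension mentioned in the caption.

```python
def ratio_preferences(pre_counts, post_counts, wildtype, pseudocount=1.0):
    """Ratio-based preference estimate at one site (assumed form, see lead-in).

    pre_counts / post_counts map each character to its count before / after
    selection; `wildtype` is the wildtype character at this site.
    """
    if set(pre_counts) != set(post_counts):
        raise ValueError("pre and post counts must cover the same characters")
    if wildtype not in pre_counts:
        raise ValueError("wildtype character missing from counts")

    pre_wt = pre_counts[wildtype] + pseudocount
    post_wt = post_counts[wildtype] + pseudocount

    # Enrichment of each character relative to wildtype (wildtype itself is 1).
    ratios = {}
    for char in pre_counts:
        pre_x = pre_counts[char] + pseudocount
        post_x = post_counts[char] + pseudocount
        ratios[char] = (post_x / post_wt) / (pre_x / pre_wt)

    # Normalize so the preferences at this site sum to one.
    total = sum(ratios.values())
    return {char: ratio / total for char, ratio in ratios.items()}


# Toy nucleotide-level counts at a single site with wildtype "A" (made-up numbers).
pre = {"A": 999, "C": 9, "G": 9, "T": 9}
post = {"A": 999, "C": 19, "G": 4, "T": 0}
prefs = ratio_preferences(pre, post, wildtype="A")
```

In this toy example the pseudocount keeps the estimate for "T" positive even though it was never observed after selection. Ratios of small counts are noisy, which is consistent with the caption's observation that the ratio estimate is less accurate than the likelihood-based inference at low $N\bar{\mu}$.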
Figure 5
Inference of differential preferences on simulated data. To illustrate and test the inference of differential preferences, the experiment in Figure 1B was simulated at the codon level starting with the post-selection library that yielded the preferences in Figure 2. In the simulations, 20% of sites had different preferences between the control and alternative selection. (A) dms_tools was used to infer the differential preferences from the data simulated at $N = 10^7$, and the resulting inferences were visualized. The overlay bars indicate which sites had non-zero differential preferences in the simulation. (B) The correlations between the inferred and actual differential preferences as a function of $N\bar{\mu}$ show that the inferred values converge to the true ones. Given files with the mutation counts, the plots in this figure can be generated as logoplot.pdf and corr.pdf with dms_tools commands (the command listing is an image in the original figure legend and is not reproduced here).
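To make the notion of a differential preference concrete, the following Python sketch treats it as the per-character difference between the preferences under the alternative selection s2 and the control selection s1, then splits the result into the stacks drawn above and below the line in the logo plot. This is a simplified illustration under that assumption, not the likelihood-based inference that dms_tools performs; the function names and the toy preferences are hypothetical.

```python
def differential_preferences(prefs_s1, prefs_s2):
    """Per-character difference in preference at one site (s2 minus s1)."""
    if set(prefs_s1) != set(prefs_s2):
        raise ValueError("both selections must cover the same characters")
    return {char: prefs_s2[char] - prefs_s1[char] for char in prefs_s1}


def logo_stacks(diffprefs):
    """Split differential preferences into stacks above and below the line.

    Returns two lists of (character, height) pairs with positive heights,
    each sorted from the smallest to the largest letter.
    """
    above = sorted(((c, d) for c, d in diffprefs.items() if d > 0),
                   key=lambda pair: pair[1])
    below = sorted(((c, -d) for c, d in diffprefs.items() if d < 0),
                   key=lambda pair: pair[1])
    return above, below


# Toy preferences at one site under each selection (made-up numbers).
s1 = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
s2 = {"A": 0.4, "C": 0.4, "G": 0.1, "T": 0.1}

delta = differential_preferences(s1, s2)
above, below = logo_stacks(delta)
```

Because both sets of preferences sum to one at the site, the differences sum to zero, so the stack above the line has the same total height as the stack below it.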
Figure 6
Differential preferences following selection of influenza NS1 in the presence or absence of interferon. Wu et al. [13] generated libraries of influenza viruses carrying nucleotide mutations in the NS segment. They passaged these viruses in the presence or absence of interferon pre-treatment. Here, dms_tools was used to analyze and visualize the data to identify sites where different nucleotides are preferred in the presence versus the absence of interferon. Because the mutations were made at the nucleotide level, the data must also be analyzed at that level (unlike in Figures 2, 3, and 5, where codon mutagenesis means that the data can be analyzed at the amino-acid level). The plot can be generated as logoplot.pdf with dms_tools commands (the command listing is an image in the original figure legend and is not reproduced here).

References

    1. Fowler DM, Fields S. Deep mutational scanning: a new style of protein science. Nat Methods. 2014;11(8):801–7. doi: 10.1038/nmeth.3027.
    2. Fowler DM, Araya CL, Fleishman SJ, Kellogg EH, Stephany JJ, Baker D, et al. High-resolution mapping of protein sequence-function relationships. Nat Methods. 2010;7(9):741–6. doi: 10.1038/nmeth.1492.
    3. Traxlmayr MW, Hasenhindl C, Hackl M, Stadlmayr G, Rybka JD, Borth N, et al. Construction of a stability landscape of the CH3 domain of human IgG1 by combining directed evolution with high throughput sequencing. J Mol Biol. 2012;423:397–412. doi: 10.1016/j.jmb.2012.07.017.
    4. McLaughlin Jr RN, Poelwijk FJ, Raman A, Gosal WS, Ranganathan R. The spatial architecture of protein function and adaptation. Nature. 2012;491(7422):138. doi: 10.1038/nature11500.
    5. Starita LM, Pruneda JN, Lo RS, Fowler DM, Kim HJ, Hiatt JB, et al. Activity-enhancing mutations in an E3 ubiquitin ligase identified by high-throughput mutagenesis. Proc Natl Acad Sci USA. 2013;110(14):1263–72. doi: 10.1073/pnas.1303309110.