Nat Commun. 2020 Nov 26;11(1):6023.
doi: 10.1038/s41467-020-19687-9.

UMI-linked consensus sequencing enables phylogenetic analysis of directed evolution

Paul Jannis Zurek et al. Nat Commun.

Abstract

The success of protein evolution campaigns is strongly dependent on the sequence context in which mutations are introduced, stemming from pervasive non-additive interactions between a protein's amino acids ('intra-gene epistasis'). Our limited understanding of such epistasis hinders the correct prediction of the functional contributions and adaptive potential of mutations. Here we present a straightforward unique molecular identifier (UMI)-linked consensus sequencing workflow (UMIC-seq) that simplifies mapping of evolutionary trajectories based on full-length sequences. Attaching UMIs to gene variants allows accurate consensus generation for closely related genes with nanopore sequencing. We exemplify the utility of this approach by reconstructing the artificial phylogeny emerging in three rounds of directed evolution of an amine dehydrogenase biocatalyst via ultrahigh throughput droplet screening. Uniquely, we are able to identify lineages and their founding variant, as well as non-additive interactions between mutations within a full gene showing sign epistasis. Access to deep and accurate long reads will facilitate prediction of key beneficial mutations and adaptive potential based on in silico analysis of large sequence datasets.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Sequencing strategy for amplicon libraries with UMI-linked nanopore long reads (UMIC-seq).
An amplicon library with similar members, such as a plasmid pool of improved enzyme variants, an immune repertoire or metagenomic 16S sequences (1) is used as input to UMI-tagging and barcoding by PCR (2). The resulting products have unique sequence identifiers (UMI) for clustering and barcodes for multiplexing, as well as homology overhangs used for Gibson assembly. Gibson assembly provides plasmids for transformation, which enables the selection of a subset of the library, obtaining selectively amplified UMI-variant combinations (3). This sample is then subjected to nanopore sequencing (4) and data processing via barcode- and UMI-based clustering followed by variant calling, yielding the desired variant sequences with high accuracy (5).
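
The clustering and consensus step (5) can be illustrated with a minimal Python sketch. This is a toy version under strong assumptions, not the authors' pipeline: UMIs are matched exactly, all reads in a cluster are assumed to have the same length so that no alignment is needed, and the function name `consensus_by_umi`, the `min_reads` threshold and the example reads are hypothetical. Real nanopore reads contain insertions and deletions, so a practical workflow would need error-tolerant UMI clustering and alignment-based consensus calling.

```python
from collections import Counter, defaultdict


def consensus_by_umi(reads, min_reads=3):
    """Group (umi, sequence) pairs by UMI and majority-vote each position.

    Toy assumptions (not from the source): exact UMI matching and
    equal-length reads within a cluster, so columns line up without alignment.
    """
    clusters = defaultdict(list)
    for umi, seq in reads:
        clusters[umi].append(seq)

    consensus = {}
    for umi, seqs in clusters.items():
        if len(seqs) < min_reads:
            continue  # too few reads to outvote random sequencing errors
        # Majority vote per column across all reads sharing this UMI.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Hypothetical reads: (UMI, gene sequence)
reads = [
    ("AACGT", "ATGGCC"),
    ("AACGT", "ATGGCC"),
    ("AACGT", "ATGTCC"),  # one read carrying a sequencing error
    ("TTGCA", "ATGACC"),
    ("TTGCA", "ATGACC"),
    ("TTGCA", "ATGACC"),
    ("GGGGG", "ATGGCC"),  # singleton UMI, dropped by min_reads
]
result = consensus_by_umi(reads)
print(result)
```

Because each UMI marks one original molecule, errors that occur independently in individual reads are outvoted within a cluster, while true differences between closely related variants are preserved across clusters.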
Fig. 2. Ultrahigh-throughput droplet microfluidic evolution of an amine dehydrogenase (AmDH) biocatalyst.
a Coupled reaction to measure AmDH activity in droplets. The enzyme AmDH catalyzes the deamination of (R)-1-methyl-3-phenylpropylamine to 4-phenyl-2-butanone, thereby reducing NAD+ to NADH and releasing NH3. NADH is regenerated upon reduction of WST-1, mediated by 1-methoxy-5-methylphenazinium methyl sulfate (mPMS). Reduced WST-1 exhibits absorbance at 455 nm, which is detectable by absorbance-activated droplet sorting (AADS) on chip. b A cycle of directed evolution using absorbance-activated droplet sorting (AADS). Generation of diversity by error-prone PCR was followed by transformation and compartmentalization of cells in droplets. Cell lysis upon droplet formation liberates AmDH and reaction progress was determined based on the formation of colored downstream product in a coupled reaction. The best ~0.5% of variants in the library were selected by AADS and the respective plasmid pool was isolated.
Fig. 3. Phylogenetic analysis of directed evolution in sequence similarity networks.
a Full graph color coded by the round of evolution in which mutant sequences were first recorded. A graphical representation (with t-Distributed Stochastic Neighbour Embedding, tSNE) of variants identified in droplet screening illustrates possible trajectories in a fitness landscape: spots represent all unique variants identified within the pools of hits. Distinct clusters emerge over the course of evolution, as indicated by the color code showing the round of first emergence of each sequence. Size of wild type increased for visualization. b Analysis of founder variants. Part of the sequence similarity network is shown with the color indicating the corresponding core set of mutations that they share with the ‘founder variant’, so that these mutations are present in all variants within the cluster. Spot sizes correspond to total sequence count of the respective variant. The clusters are defined by multiple mutations and thus can only be inferred based on long sequence reads that inform phylogeny of possible trajectories during experimental directed evolution.
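
The founder analysis in panel b rests on a simple idea: the core set of a cluster is the set of mutations present in every member. A minimal Python sketch of that idea follows, assuming each variant is represented as a set of mutation labels obtained from full-length consensus sequences. The function name `core_mutations` and the mutations K10R and T200A are hypothetical and used only for illustration; only A64E, R102S and E323V appear in the source.

```python
def core_mutations(cluster):
    """Return the mutations shared by every variant in a cluster."""
    if not cluster:
        return set()
    return set.intersection(*(set(variant) for variant in cluster))


# Hypothetical cluster; K10R and T200A are invented for illustration.
cluster = [
    {"A64E", "R102S", "E323V"},
    {"A64E", "R102S", "E323V", "K10R"},
    {"A64E", "R102S", "E323V", "K10R", "T200A"},
]

core = core_mutations(cluster)

# A candidate founder is a variant carrying exactly the core set.
founders = [variant for variant in cluster if variant == core]
print(sorted(core), founders)
```

This only works when all mutations of a variant are observed together on one read, which is why short reads covering a single position cannot define such clusters.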
Fig. 4. Analysis of epistasis in variant A64E R102S E323V.
a Hot-spot analysis. Positional enrichments mapped onto the structure of a homologous AmDH (PDB ID: 1C1D). If the frequency of mutation at a certain position increased over the rounds of directed evolution, its enrichment factor is color coded on the structure from green (low) to yellow to red (high). Enrichment was calculated from round 1 to round 2 for variants that persist into round 3. Positions at which mutations are enriched can be identified over the full range of the enzyme, with many enriched positions e.g. in the loop covering the substrate binding pocket. Positions exhibiting epistatic interaction (variant A64E R102S E323V) are highlighted. b Sign epistasis in variant A64E R102S E323V detected by long-read sequencing. Graph showing lysate deamination activities (initial rates) of constituent variants relative to the non-mutated parent. Sign epistasis can be seen in mutations of one founder variant (A64E R102S E323V). The mutation E323V individually decreases activity drastically (16% of the parental AmDH (WT) lysate deamination activity, Supplementary Fig. 5), while it has a beneficial impact when introduced into the variant A64E R102S (154% of A64E R102S lysate deamination activity). With conventional short-read sequencing for hot-spot analysis, E323V would have wrongly been determined as an activating mutation; yet E323V is only beneficial in the context of other mutations, which was correctly detected by long-read sequencing.
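
The sign epistasis described in panel b can be checked with a short calculation. The sketch below is a hedged illustration, not the authors' analysis: it takes the two relative activities quoted in the caption (E323V at 16% of parental activity, and 154% of the A64E R102S activity when E323V is added to that background) and tests whether the effect of the mutation changes sign between backgrounds. The function names are hypothetical. The activity of A64E R102S relative to the parent is not given here, so the code does not compute it.

```python
import math


def log2_effect(fold_change):
    """Effect of a mutation as log2(mutant activity / background activity)."""
    return math.log2(fold_change)


def is_sign_epistasis(fold_in_background_1, fold_in_background_2):
    """True if a mutation is deleterious in one background and beneficial in the other."""
    return log2_effect(fold_in_background_1) * log2_effect(fold_in_background_2) < 0


# Values from the figure caption, expressed as fold changes.
e323v_in_parent = 0.16      # E323V alone: 16% of parental (WT) activity
e323v_in_a64e_r102s = 1.54  # E323V added to A64E R102S: 154% of that variant

sign_flip = is_sign_epistasis(e323v_in_parent, e323v_in_a64e_r102s)
print(f"Sign epistasis for E323V: {sign_flip}")
```

A negative log2 effect in the parent and a positive one in the A64E R102S background is the signature of sign epistasis: the same mutation is deleterious in one context and beneficial in another.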
