Cell. 2016 Jan 28;164(3):476-86.
doi: 10.1016/j.cell.2015.12.024. Epub 2016 Jan 21.

De Novo Evolutionary Emergence of a Symmetrical Protein Is Shaped by Folding Constraints

Robert G Smock et al. Cell. 2016.

Abstract

Molecular evolution has focused on the divergence of molecular functions, yet we know little about how structurally distinct protein folds emerge de novo. We characterized the evolutionary trajectories and selection forces underlying emergence of β-propeller proteins, a globular and symmetric fold group with diverse functions. The identification of short propeller-like motifs (<50 amino acids) in natural genomes indicated that they expanded via tandem duplications to form extant propellers. We phylogenetically reconstructed 47-residue ancestral motifs that form five-bladed lectin propellers via oligomeric assembly. We demonstrate a functional trajectory of tandem duplications of these motifs leading to monomeric lectins. Foldability, i.e., higher efficiency of folding, was the main parameter leading to improved functionality along the entire evolutionary trajectory. However, folding constraints changed along the trajectory: initially, conflicts between monomer folding and oligomer assembly dominated, whereas subsequently, upon tandem duplication, tradeoffs between monomer stability and foldability took precedence.

Figures

Graphical abstract
Figure 1
The β-Propeller Fold and Ancestral Motif Reconstruction (A) T. tridentatus tachylectin-2 is an extant lectin with the exemplary features of the propeller fold (PDB: 1TL2). It is composed of five sequence motifs, colored by order from blue (N terminus) to red (C terminus), comprising five structural motifs, or blades (in dotted outline). The five saccharide (GlcNAc in black sticks) binding sites are at the interfaces between the blades. Each motif and blade comprises four β strands (represented as arrows; numbered I–IV). However, one β strand is permuted so that the C-terminal strand IV completes the N-terminal blade to give the Velcro closure. (B) The genetic mechanism of topological permutation. Motif expansion followed by new start and stop codons allows interconversion between the Velcro closure and the intact blade topologies. The boundaries of blades (structural motifs) are shown in black and white. (C) Putative topologies of single motif proteins in the Velcro topology of extant tachylectin-2 and with permutation of strand IV to the topology of an intact structural blade. (D) Crystal structure of the homologous lectin, N. vectensis tachylectin-2 (determined at 1.9 Å; blue) superimposed with T. tridentatus tachylectin-2 (gray, PDB: 1TL2 with GlcNAc in spheres). The nearly identical structures share 44% sequence identity. (E) Reconstruction of the ancestral motif that gave rise to tachylectin-2 was performed by phylogenetic modeling of the 10 T. tridentatus and N. vectensis motifs. FastML gave a probabilistic model of amino acid states for the single ancestral motif (top). The intact blade permutation frame is also shown (middle). The inference did not significantly differ when based on the motifs of T. tridentatus alone (bottom). See also Figure S1.
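The permutation mechanism in panel (B) can be pictured as reading a different window out of a tandem repeat. The following is a minimal sketch, assuming identical motif copies and a toy sequence; the function name `permutation_frames` and the example motif are hypothetical, and the sketch does not model which frame corresponds to the Velcro or intact-blade topology, since that depends on the real strand boundaries.

```python
from typing import List


def permutation_frames(motif: str, copies: int = 2) -> List[str]:
    """Return every circular permutation of `motif`, read as windows of a tandem repeat.

    Duplicating the motif and then choosing a new start position (with a stop
    one motif length later) yields the same residues in a rotated order.
    """
    if copies < 2:
        raise ValueError("at least two tandem copies are needed to read shifted frames")
    tandem = motif * copies          # duplication and fusion of identical motifs
    n = len(motif)
    # Frame k starts k residues into the first copy and ends k residues into the second.
    return [tandem[k:k + n] for k in range(n)]


# Toy motif: each letter is a placeholder, not a real residue or strand assignment.
for offset, frame in enumerate(permutation_frames("ABCD")):
    print(offset, frame)
```

Each frame contains the same residues, so interconverting frames changes only where the chain begins and ends relative to the blade boundaries.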
Figure 2
De Novo Emergence and Maturation of Ancestral Motifs A basic evolutionary trajectory leading to tachylectin-2 is illustrated. The emergence of WT tachylectin-2 from single ancestral motifs is likely promoted by genetic mechanisms that include gene duplication, fusion, diversification, and topological permutation. Key experimental constructs are annotated in this scheme. For simplicity, alternative intermediates such as fusions of two to four motifs and their diversification and fusion to non-propeller domains are not depicted. WT stands for the sequence of the extant, wild-type tachylectin-2. Anc stands for reconstructed ancestral variants, as single motifs (Anc1) or duplicated tandem fusions (Anc5); V subscripts indicate the extant Velcro topology, whereas B subscripts represent the alternative intact blade topology.
Figure 3
Ancestral Single Motifs Are Functional as Oligomers and in Tandem Fusions The total binding capacity of single motif lectins was measured by the mucin-binding signal of cell lysates in ELISA. The null sign (Ø) indicates no detectable binding. (A) The early steps of the presumed trajectory depicted in Figure 2. Functional single motifs were isolated from ancestral substitution libraries in the permutation frame of an intact blade (Anc1B library). WT-derived motifs (WT41V, WT41B), a single MPA motif (Anc1B MPA), and ancestral libraries in the Velcro frame (Anc1V library) were non-functional. Identical tandem fusion of the functional single motifs (e.g., AncA1B to AncA5B and AncB1B to AncB5B) enhanced total binding capacity with no other change in sequence. (B) The crystal structure of AncB1B bound to GlcNAc (1.7 Å) revealed a pentamer (five colored subunits) in a permuted topology that is nearly identical to the WT tachylectin-2 monomer (gray, PDB: 1TL2). (C) Crystal structure of the single, oligomerizing motif AncB1B. Mutations that converged in the selection of single motifs localized along subunit interfaces and near the GlcNAc binding site (spheres: N12D and K42L from the first ancestral library; T14M and S31N from the second ancestral library; F23L from error-prone PCR). (D) In later steps of the possible trajectory, topological permutations of tandem fusions led to the extant Velcro frame with relatively little consequence in total binding capacity (e.g., Anc5V derived from Anc5B). The putative intermediates enabling these permutations (Anc6B) were also viable. All sequences except WTV and WTB comprise internally identical motif repeats. (E) The total binding capacity of an identical tandem fusion of the WT fourth motif (WT45V) was improved by selective diversification following random mutagenesis. Internal sequence identity is shown in parentheses. See also Figures S2 and S3 and Table S1.
Figure 4
Folding Efficiency Was the Primary Trait under Optimization (A) In addition to total binding capacity (Figure 3), five different parameters were measured for representative variants along the evolutionary trajectory: the levels of soluble protein in crude cell lysates, the level of insoluble aggregates, binding affinity to GlcNAc, thermal stability of the native state, and in vitro folding efficiency. Correlations between the six datasets, as indicated by groups 1 and 2, were detected by principal components analysis applied without prior bias. (B) As indicated in (A), folding efficiency in vitro and soluble expression in vivo were highly correlated with each other and with total binding capacity in cell lysates. The R-squared linear regression statistic is shown on each x axis. (C) Native stability, binding affinity, and the levels of insoluble aggregates were poorly correlated with total binding capacity overall, but did show a consistent trend by which identical fusions (AncA5B and Anc5V) showed the highest values for each of these properties. (D) Native stability, binding affinity, and the levels of insoluble aggregates were highly correlated with each other. See also Table S2.
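The R-squared statistic mentioned in panel (B) can be computed for any pair of measured properties as sketched below. This is an illustrative example only: the function name `r_squared` is hypothetical, and the numbers are made-up placeholders, not values from the paper.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined R-squared")
    # For a one-predictor linear fit, R-squared equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


# Placeholder values for illustration only (NOT data from the paper).
folding_efficiency = [0.10, 0.25, 0.40, 0.70, 0.90]
binding_capacity = [0.12, 0.20, 0.45, 0.65, 0.95]
print(round(r_squared(folding_efficiency, binding_capacity), 3))
```

A value near 1 would correspond to the "highly correlated" pairs in panel (B), and a value near 0 to the "poorly correlated" pairs in panel (C).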
Figure 5
The Changing Roles of Intermolecular Assembly and Stability along the Evolutionary Trajectory (A) Intermolecular interactions were reshaped through the trajectory. Folding efficiency was measured at various concentrations and normalized to the maximum value for each construct (the absolute folding yields are shown in Figure 4B). Single motifs (blue) showed a bell-shaped concentration-dependence, thus indicating a tradeoff between native pentamerization and non-native intermolecular interaction leading to misfolding. In contrast, the monomeric tandem motif fusions folded most efficiently at low concentration where intermolecular interaction is minimized (green; WT in black). (B) When arranged by their evolutionary progression, a changing trend in native-state stability was observed: single-motif pentamers were moderately stable, monomeric identical fusions were hyperstable, and the selective diversification of identical fusions (WT45VA and WT45VB) returned to the moderate stability of WTV. Thermal unfolding was measured by CD. (C) Unfolding equilibria measurements at dilute protein concentration (0.5 μM) revealed a stable folding intermediate upon identical motif fusion. Pentameric single motifs were fit to a two-state folding model (blue), and the other constructs were fit to a three-state model (identical tandem fusions, green; WT, black). The stability associated with a folding intermediate between inflections (ΔCm) was 0.8 M GdmCl for AncA5p and 1.2 M GdmCl for Anc5 versus 0.3 M GdmCl for WT. (D) The folding intermediates of identical tandem fusions were more populated than that of WT. The relative fractions of intermediates were extracted from the model fitting in (C), with SDs shown by shaded regions. (E) Identical tandem fusions formed misfolded, denaturation-resistant multimers more readily than WT. Natively folded propellers were incubated at high concentration (100 μM) and analyzed by SDS-PAGE. See also Figures S3 and S4 and Table S3.
Figure 6
Single-Motif Proteins Are Found in Genomes and Follow a Mechanism of Duplication, Fusion, and Divergence (A) Single motifs (42 amino acids, one green bar) were identified in C. watsonii open reading frames with 37%–65% sequence identity to six-motif β-propeller proteins (average identity for all propeller motifs) within the same genome. Proteins are labeled with their UniProt accession codes. (B) Molecular clock divergence posits that “young” propellers that are less diverged from ancestral states should closely resemble an originating single motif and preserve high internal sequence identity. In a Frankia sp. strain, this correlation was observed among a large collection of five- to seven-motif propellers and a single motif protein without non-propeller domains (Pearson’s r = 0.91, p < 0.001). See also Figures S5 and S6, Table S6, and Data S1.
Figure S1
Ancestral Reconstruction of the Lectin Propeller Motif, Related to Figure 1 (A) A phylogenetic tree of tachylectin-2 sequence motifs was constructed from an alignment of ten individual motifs of the T. tridentatus and N. vectensis tachylectin-2 sequences. Excluding N. vectensis motif 5, the motifs are monophyletic with respect to the source protein. The tree topology could not be improved by inclusion of an X. laevis outgroup sequence. (B) A probabilistic ancestral motif was reconstructed from the root node of the motif tree in (A). The posterior probabilities of amino acids given by FastML are plotted as sequence logos in the WT Velcro frame and in five additional permutation frames corresponding to an intact structural blade that were tested experimentally. An asterisk (*) indicates the frame from which functional single motifs were obtained.
Figure S2
Correspondence of ELISA Signal to Binding Activity and the Variety of Functional Single Motifs, Related to Figure 3 (A) To calibrate the non-linear response of ELISA to the functional level of tachylectin-2, raw ELISA signals were measured from a purified WT tachylectin-2 concentration gradient. Data were fit to the sigmoidal function $y = 0.843 / \left(1 + (5.00/x)^{0.483}\right)$, where y is the measured absorbance and x is the tachylectin-2 concentration. (B) Single motifs containing the most probable ancestral substitutions (second round library, p > 0.1) were constructed as a mixture of plasmids ($\sim 10^{4}$ single motif sequences). This library was transformed into E. coli and sampled from the cell lysates of 500 randomly chosen clones. The background absorbance of an empty plasmid’s expression lysate was subtracted from the raw ELISA absorbance.
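The calibration in panel (A) can be used in both directions: predicting absorbance from concentration, and recovering a functional concentration from a measured absorbance. The sketch below assumes the trailing 0.483 in the fitted function is an exponent (the usual form of such a sigmoid); the function and constant names are hypothetical, and the concentration units are whatever the original fit used, which the caption does not state.

```python
A_MAX = 0.843    # plateau absorbance from the fitted function
X_HALF = 5.00    # concentration giving half-maximal absorbance (units as in the source fit)
EXPONENT = 0.483 # steepness of the sigmoid (assumed to be an exponent)


def elisa_absorbance(concentration):
    """Predicted absorbance: y = A_MAX / (1 + (X_HALF / x) ** EXPONENT)."""
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    return A_MAX / (1.0 + (X_HALF / concentration) ** EXPONENT)


def concentration_from_absorbance(absorbance):
    """Invert the calibration: x = X_HALF / (A_MAX / y - 1) ** (1 / EXPONENT)."""
    if not 0.0 < absorbance < A_MAX:
        raise ValueError("absorbance must lie strictly between 0 and the plateau")
    return X_HALF / (A_MAX / absorbance - 1.0) ** (1.0 / EXPONENT)


# Round trip at an arbitrary concentration to check the inversion.
y = elisa_absorbance(2.0)
print(y, concentration_from_absorbance(y))
```

Because the exponent is below 1, the curve is shallow, so small absorbance differences near the plateau map to large differences in inferred concentration.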
Figure S3
Native Propeller Formation and Stability, Related to Figures 3 and 5 (A) Size exclusion chromatography revealed monodisperse and overlapping elution profiles for tandem fusions and single motif constructs, indicating stable pentamerization of the latter even at low concentration. The upper AncB1B trace is at high concentration (12 μM propeller) and lower traces are at low concentrations (0.8 μM propeller). (B) Proteins showed similar propeller-like signatures by circular dichroism (CD). Shown are native CD spectra measured at 30°C. (C) Thermal unfolding was measured by CD (202 nm). In cases of unresolved baselines at high temperature, normalization to zero was assisted by measuring the residual function in ELISA.
Figure S4
Unfolding Parameters by Two-State Fitting, Related to Figure 5 Unfolding equilibria of constructs were again measured by incubating purified protein in GdmCl and measuring fluorescence spectra, as before (Figure 5C). However, in order to determine any dependence on the parameterization of curve fitting, the data in this case were first transformed to two states. At each denaturant concentration, fluorescence intensities at wavelengths corresponding to native and denatured states were taken as a ratio ($I_{\lambda_N}:I_{\lambda_D}$). Two-state unfolding models were fit for single motifs (A) AncA1B and (B) AncB1B, identically fused constructs (C) AncA5B and (D) Anc5V and (E) WTV, in the presence (filled circles) and absence (empty circles) of GlcNAc. Intermediate populations observed for identical fusions and WT (Figures 5C and 5D) manifest here as a less cooperative native-denatured unfolding transition with milder slope (lower pseudo-m value; Table S3).
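The link between a lower pseudo-m value and a milder slope can be illustrated with a generic two-state model. The sketch below assumes the common linear-extrapolation form, in which the unfolding free energy falls linearly with denaturant concentration; the caption does not state the exact equation used, so the function name, parameter names, and example values are all hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_unfolded(denaturant_molar, c_m, m_value, temp_kelvin=298.15):
    """Two-state fraction unfolded, assuming dG = m_value * (c_m - [denaturant]).

    c_m is the transition midpoint (M) and m_value the cooperativity
    parameter (kcal/mol/M). Both are illustrative, not fitted values.
    """
    delta_g = m_value * (c_m - denaturant_molar)
    return 1.0 / (1.0 + math.exp(delta_g / (R_KCAL * temp_kelvin)))


# Same midpoint, two different m values: the lower m gives a broader transition.
for m in (3.0, 1.0):
    curve = [round(fraction_unfolded(d, 2.0, m), 3) for d in (1.0, 1.5, 2.0, 2.5, 3.0)]
    print(m, curve)
```

With the midpoint held fixed, the lower m value spreads the transition over a wider denaturant range, which is how a populated intermediate can appear when the data are forced into a two-state fit.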
Figure S5
Homology-Modeled Structural Domains of Genomic Sequences, Related to Figure 6 Sequences containing propeller motifs from (A–D) C. watsonii and (E–G) Frankia sp. strain EAN1pec were modeled with SWISS-MODEL using homologs from the PDB. For each panel, the overlap with the PDB sequence is shown by a gray box with the corresponding homology model shown below. (A and B) Single-propeller motifs are flanked by domains that do not form propellers, and accordingly, the homology model indicates non-propeller folds. One C. watsonii gene is addressed per row. The homology model for the single motif (denoted in green in the sequence diagram and typically modeled as a four-stranded β sheet) is shown, alongside the predicted models for the flanking domains. (C and D) Genes composed of six propeller motifs are predicted to form six of the seven blades of a distant propeller homolog, suggesting that they close the radially arranged blades to form intact propellers. (E) An identified Frankia gene containing a single propeller motif. (F and G) Genes encoding multi-motif propellers. For all panels, QMEAN4 scores (Benkert et al., 2009) given by SWISS-MODEL (Biasini et al., 2014) reflect physical features of model quality and indicate that the predicted structures are most reliable in core structural regions, as shown by residue coloring using a QMEAN4 heat map (blue = low to red = high). UniProt accession codes are labeled.
Figure S6
Directionality of Emergence in Frankia Propeller Motifs, Related to Figure 6 In a scenario where a single motif was duplicated, fused, and gave a propeller, all propeller repeats would be equally diverged with respect to the single motif and with respect to one another. In the reverse scenario, however, in which a single motif from an existing propeller was duplicated and inserted into another protein, the propeller repeats would be equally diverged with respect to one another, but the single motif would resemble one repeat more than the others. To test this idea, average identities (black dots, as in Figure 6B) were compared against maximum identities (red dots) in a molecular clock plot. Overall, in agreement with the first scenario, the maximum and average identities hold the same trends, indicating that the single motif does not disproportionately resemble one repeat more than the others.
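The comparison of average and maximum identities can be sketched as follows. This is a minimal illustration assuming pre-aligned, equal-length sequences without gaps; the function names and the toy sequences are hypothetical and are not taken from the Frankia data.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions that carry the same residue."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def average_and_max_identity(single_motif, propeller_repeats):
    """Identity of a single motif to each repeat, summarized as (average, maximum)."""
    if not propeller_repeats:
        raise ValueError("at least one repeat is required")
    identities = [percent_identity(single_motif, r) for r in propeller_repeats]
    return sum(identities) / len(identities), max(identities)


# Toy sequences for illustration only.
avg, best = average_and_max_identity("ACDE", ["ACDE", "ACDF", "AAAA"])
print(avg, best)
```

In the reverse scenario described in the caption, the maximum would sit well above the average for the donor propeller, because one repeat would be the direct source of the single motif.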

References

    1. Afriat-Jurnou L., Jackson C.J., Tawfik D.S. Reconstructing a missing link in the evolution of a recently diverged phosphotriesterase by active-site loop remodeling. Biochemistry. 2012;51:6047–6055.
    2. Ashkenazy H., Penn O., Doron-Faigenboim A., Cohen O., Cannarozzi G., Zomer O., Pupko T. FastML: a web server for probabilistic reconstruction of ancestral sequences. Nucleic Acids Res. 2012;40:W580–W584.
    3. Balaji S. Internal symmetry in protein structures: prevalence, functional relevance and evolution. Curr. Opin. Struct. Biol. 2015;32:156–166.
    4. Bar-Rogovsky H., Stern A., Penn O., Kobl I., Pupko T., Tawfik D.S. Assessing the prediction fidelity of ancestral reconstruction by a library approach. Protein Eng. Des. Sel. 2015;28:507–518.
    5. Bershtein S., Mu W., Serohijos A.W., Zhou J., Shakhnovich E.I. Protein quality control acts on folding intermediates to shape the effects of mutations on organismal fitness. Mol. Cell. 2013;49:133–144.