[Preprint]. 2023 Jul 1:2023.06.29.547063.
doi: 10.1101/2023.06.29.547063.

The importance of input sequence set to consensus-derived proteins and their relationship to reconstructed ancestral proteins

Charlotte Nixon et al. bioRxiv.

Abstract

A protein sequence encodes its energy landscape: all the accessible conformations, energetics, and dynamics. The evolutionary relationship between sequence and landscape can be probed phylogenetically by compiling a multiple sequence alignment (MSA) of homologous sequences and generating either common ancestors via Ancestral Sequence Reconstruction or a consensus protein containing the most common amino acid at each position. Both ancestral and consensus proteins are often more stable than their extant homologs, which raises the question of how the two approaches differ and suggests that both can serve as general methods to engineer thermostability. We used the Ribonuclease H family to compare these approaches and to evaluate how the evolutionary relationship of the input sequences affects the properties of the resulting consensus protein. While the overall consensus protein is structured and active, it neither shows properties of a well-folded protein nor has enhanced stability. In contrast, the consensus protein derived from a phylogenetically restricted region is significantly more stable and cooperatively folded, suggesting that cooperativity may be encoded by different mechanisms in separate clades and lost when too many diverse clades are combined to generate a consensus protein. To explore this, we compared pairwise covariance scores using a Potts formalism as well as higher-order couplings using singular value decomposition (SVD). We find that the SVD coordinates of a stable consensus sequence are close to the coordinates of the analogous ancestor sequence and its descendants, whereas the unstable consensus sequences are outliers in SVD space.
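
The consensus construction described in the abstract (the most common amino acid at each alignment position) can be illustrated with a minimal sketch. The function name, the toy alignment, the gap handling, and the alphabetical tie-breaking rule are assumptions made for illustration only; they are not taken from the preprint, whose actual pipeline may differ.

```python
from collections import Counter


def consensus_sequence(aligned_seqs, gap="-"):
    """Return the most common non-gap residue at each alignment column.

    Assumptions (not from the preprint): gaps are ignored when counting,
    all-gap columns are dropped, and ties are broken alphabetically.
    """
    if not aligned_seqs:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same aligned length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(res for res in column if res != gap)
        if not counts:
            continue  # column contains only gaps
        # Highest count wins; alphabetical order breaks ties deterministically.
        best = min(counts, key=lambda r: (-counts[r], r))
        consensus.append(best)
    return "".join(consensus)


# Hypothetical toy alignment, not real RNase H data.
toy_msa = ["MKV-L", "MRVAL", "MKIAL", "MKV-I"]
print(consensus_sequence(toy_msa))
```

Restricting `aligned_seqs` to the descendants of a single ancestor, rather than the whole family, is the kind of change in input set whose effect the abstract examines.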

Keywords: Ancestral Sequence Reconstruction; Consensus Design; Protein Folding; Protein Stability; Singular Value Decomposition.

Figures

Figure 1. Sequence comparisons of extant, ancestral, and consensus RNases H.
(A) MSA of the different consensus, ancestral, and extant RNases H. Alignments were generated with Clustal Omega and shaded by similarity. (B) RNase H phylogenetic tree with characterized ancestors highlighted as stars. Designed consensus proteins are displayed as brackets showing the input extant sequence sets, with the number of input sequences in parentheses. (C) Percent identity calculated by Clustal Omega between consensus and selected ancestral and extant RNases H (DispersedCons* is abbreviated DispCons*).
Figure 2. Consensus proteins adopt the RNase H fold and are active.
(A) CD spectra and (B) RNase H activity assays for consensus RNase H constructs compared to extant and ancestral proteins. ecRNH D10A* is a catalytically-inactive RNase H variant.
Figure 3. Thermal and chemical denaturation of RNase H variants.
(A) Thermal denaturation of WholeCons* and MegaCons* compared to ecRNH, ttRNH, and Anc123. (B) Urea denaturation of WholeCons* and MegaCons* compared to ecRNH*, ttRNH*, and Anc1*. (C) Thermal denaturation of Anc1cons, AncAcons, AncCcons*, and DispersedCons* compared to ecRNH, ttRNH, and Anc1. (D) Urea denaturation of AncCcons* and DispersedCons* compared to ecRNH*, ttRNH*, and Anc1*. All unfolding transitions are monitored by CD spectroscopy at 222 nm.
Figure 4. WholeCons* shows non-coincident denaturation curves.
CD at 222 nm (filled circles) and tryptophan fluorescence (open circles) (pH 5.5, 25 °C).
Figure 5. Refolding of WholeCons*.
(A) Stopped-flow refolding of WholeCons, where almost the entire signal change takes place in the dead time. (B) E. coli RNase H structure (PDB ID: 2RN2) adapted from Hu et al., PNAS, 2013, colored by core (blue, yellow, green) and periphery (red). (C) Pulse-labeling hydrogen-deuterium exchange refolding studies of WholeCons* monitored by mass spectrometry (HXMS). Fraction deuterated is plotted for each residue at different refolding times. Residues in black are site-resolved, residues in grey are not resolved from their neighbors, and residues marked as “x” had insufficient peptide coverage to derive a fraction deuterated. The secondary structural elements of the ecRNH core (blue, yellow, green) and periphery (red) are indicated above.
Figure 6. Summary of melting temperatures for consensus proteins compared to ancestral and extant RNases H.
Figure 7. Kinetics of AncCcons* folding.
(A) Refolding of AncCcons* at 4 M urea monitored by CD at 222 nm, showing single exponential decay. (B) Chevron plot of AncCcons* compared to fitted curves for AncC* and ecRNH*.
Figure 8. Potts analysis of RNase H sequences.
Intrinsic and pairwise coupling coefficients were determined from an MSA with 11,300 sequences, and were used to calculate total intrinsic (Hseq, A) and pairwise coupling scores (Jseq, B) for consensus (red), ancestral (black), and extant (blue) sequences. Neither score correlates strongly with overall stability, as represented by Tm values. (C, D) Histograms of Hseq and Jseq scores for the 11,300 extant sequences in the alignment, along with the values from the consensus derived from those 11,300 sequences (red lines).
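
The two scores named in the Figure 8 caption can be sketched as sums over fitted Potts parameters: Hseq as the sum of single-site (intrinsic) terms for the residues in a sequence, and Jseq as the sum of pairwise coupling terms over all position pairs. The array layout (`h[i, a]`, `J[i, j, a, b]`), the function name, the sign convention, and the toy parameter values are assumptions for illustration; the caption does not specify how the authors' parameters were fitted or stored.

```python
import numpy as np


def potts_scores(seq_idx, h, J):
    """Return (Hseq, Jseq) for one integer-encoded sequence.

    seq_idx : residue index at each of L positions
    h       : array of shape (L, q), assumed single-site terms h[i, a]
    J       : array of shape (L, L, q, q), assumed couplings J[i, j, a, b];
              only entries with i < j are summed
    """
    L = len(seq_idx)
    h_seq = sum(h[i, seq_idx[i]] for i in range(L))
    j_seq = sum(
        J[i, j, seq_idx[i], seq_idx[j]]
        for i in range(L)
        for j in range(i + 1, L)
    )
    return float(h_seq), float(j_seq)


# Hypothetical toy model: 3 positions, 2-letter alphabet (not fitted values).
h = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
J = np.zeros((3, 3, 2, 2))
J[0, 1, 0, 1] = 1.0
J[0, 2, 0, 0] = -0.5
J[1, 2, 1, 0] = 2.0

h_seq, j_seq = potts_scores([0, 1, 0], h, J)
print(h_seq, j_seq)
```

Scoring a consensus, an ancestor, and each extant sequence with the same `h` and `J` would give the kind of per-sequence values plotted against Tm in panels A and B.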
Figure 9. RNase H sequences in SVD space.
409 aligned RNase H sequences from Hart et al. were represented as a binary matrix, which was transformed into orthogonal sequence and residue matrices using singular value decomposition. Plots show sequences (small points) plotted in the first three dimensions of SVD space (left), and in dimensions 2 and 3 (right). The first dimension (σ1ui(1)), shown only in the three-dimensional plots, reflects overall conservation, whereas the second and third dimensions reflect covariance patterns among sequences. (A) MSA sequences used in the SVD are colored red, black, blue, green, and yellow based on k-means clustering into five groups. Ancestral sequences projected into SVD space are colored violet, and consensus sequences created from ancestral descendant sequences are colored brown. (B-D) Descendants of the main ancestors investigated here (Anc1, AncA, and AncC, respectively) are plotted as large grey spheres. Descendants of AncC are also descendants of AncA (and Anc1), and those of AncA are also descendants of Anc1, resulting in considerable overlap. (E) Non-overlapping descendants of major ancestral branchpoints, plotted as large colored spheres. The Python scripts that generated the binary encoding, SVD, and plots are available as a Jupyter notebook at https://github.com/barricklab-at-jhu/SVD-of-MSAs/tree/main/RNaseH.
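
The procedure in the Figure 9 caption (binary encoding of the MSA, SVD, and projection of additional sequences into the resulting space) can be sketched with NumPy. This is not the authors' notebook: the alphabet ordering, function names, and toy alignment below are assumptions for illustration, and the real analysis used 409 aligned sequences.

```python
import numpy as np

# Assumed alphabet: 20 amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot_msa(aligned_seqs, alphabet=ALPHABET):
    """Encode an MSA as a binary matrix of shape (n_seqs, L * q)."""
    index = {aa: k for k, aa in enumerate(alphabet)}
    n, L, q = len(aligned_seqs), len(aligned_seqs[0]), len(alphabet)
    X = np.zeros((n, L * q))
    for s, seq in enumerate(aligned_seqs):
        for i, aa in enumerate(seq):
            X[s, i * q + index[aa]] = 1.0  # KeyError on unknown symbols
    return X


def svd_coordinates(X, n_dims=3):
    """Return sequence coordinates (sigma_k * u_i^(k)) and residue axes."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_dims] * S[:n_dims], Vt[:n_dims]


def project(X_new, Vt):
    """Project binary-encoded sequences onto the residue axes."""
    return X_new @ Vt.T


# Hypothetical toy alignment, not real RNase H data.
toy_msa = ["MKV-L", "MRVAL", "MKIAL", "MKV-I"]
X = one_hot_msa(toy_msa)
coords, Vt = svd_coordinates(X, n_dims=3)

# A sequence outside the MSA (for example an ancestor or a consensus)
# is placed in the same space by projecting its encoding.
new_coords = project(one_hot_msa(["MKVAL"]), Vt)
print(coords.shape, new_coords.shape)
```

Because X = UΣVᵀ, projecting the original rows onto the residue axes reproduces their own coordinates, so projected ancestors and consensus sequences share a scale with the MSA sequences. Clustering `coords` (the caption mentions k-means with five groups) would be a separate step not shown here.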
