mBio. 2023 Apr 25;14(2):e0040823.
doi: 10.1128/mbio.00408-23. Epub 2023 Apr 5.

Exaptation of Inactivated Host Enzymes for Structural Roles in Orthopoxviruses and Novel Folds of Virus Proteins Revealed by Protein Structure Modeling

Pascal Mutz et al. mBio. 2023.

Abstract

Viruses with large, double-stranded DNA genomes captured the majority of their genes from their hosts at different stages of evolution. The origins of many virus genes are readily detected through significant sequence similarity with cellular homologs. In particular, this is the case for virus enzymes, such as DNA and RNA polymerases or nucleotide kinases, that retain their catalytic activity after capture by an ancestral virus. However, a large fraction of virus genes have no readily detectable cellular homologs, meaning that their origins remain enigmatic. We explored the potential origins of such proteins that are encoded in the genomes of orthopoxviruses, a thoroughly studied virus genus that includes major human pathogens. To this end, we used AlphaFold2 to predict the structures of all 214 proteins that are encoded by orthopoxviruses. Among the proteins of unknown provenance, structure prediction yielded clear indications of origin for 14 of them and validated several inferences that were previously made via sequence analysis. A notable emerging trend is the exaptation of enzymes from cellular organisms for nonenzymatic, structural roles in virus reproduction that is accompanied by the disruption of catalytic sites and by an overall drastic divergence that precludes homology detection at the sequence level. Among the 16 orthopoxvirus proteins that were found to be inactivated enzyme derivatives are the poxvirus replication processivity factor A20, which is an inactivated NAD-dependent DNA ligase; the major core protein A3, which is an inactivated deubiquitinase; F11, which is an inactivated prolyl hydroxylase; and more similar cases. For nearly one-third of the orthopoxvirus virion proteins, no significantly similar structures were identified, suggesting exaptation with subsequent major structural rearrangement that yielded unique protein folds.

IMPORTANCE Protein structures are more strongly conserved in evolution than are amino acid sequences. Comparative structural analysis is particularly important for inferring the origins of viral proteins that typically evolve at high rates. We used a powerful protein structure modeling method, namely, AlphaFold2, to model the structures of all orthopoxvirus proteins and compared them to all available protein structures. Multiple cases of recruitment of host enzymes for structural roles in viruses, accompanied by the disruption of catalytic sites, were discovered. However, many viral proteins appear to have evolved unique structural folds.

Keywords: AlphaFold2; exaptation; orthopoxviruses; protein structure analysis; virus evolution.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

FIG 1
Structural modeling validates cases of enzyme exaptation discovered through sequence similarity. (A) OPG56 (F12L) (blue, aa 205 to 634) and the best Dali hit, a DNA polymerase type-B from yeast (AF-P09804-F, green, aa 338 to 916). (B) Structural alignment of prototype OPG56 (Q, query), seven OPG members from diverse chordopoxviruses (blue, from top to bottom: VARV, MPXV Zaire 96-I-16, VACV, SFV, MyxV, SORPV, and MCV subtype 1), and three top hits found by Dali (green, DNA polymerases type-B from Kluyveromyces lactis [af2-db P09804], Claviceps purpurea [af2-db P22373], and Bacillus virus phi29 [2py5-B] [83]). Alignment parts corresponding to the ExoI motif, polymerase motif C, and KxY motif are highlighted by red boxes. The numbers indicate positions in the structural alignment. (C) OPG61 (F16L) (blue, aa 1 to 118 [of 231]) and the best cellular Dali hit, namely, the catalytic domain of a serine recombinase from Sulfolobus sp. L00 11 (archaea) (pdb 6dgc [84], green, aa 65 to 164 [of 211]). The exemplified catalytic subdomain DRLxR (aa 139 to 143) in the serine recombinase (magenta) and the mutated stretch KQISI (aa 73 to 77) in OPG61 (cyan) are highlighted. H, alpha-helix; E, beta sheet; L, loop. (D) Structural alignment of prototype OPG61 (top), seven OPG members from diverse chordopoxviruses (blue, ORPV species as indicated), and three Dali hits (green, an integrase from Lactococcus phage TP901-1 [3bvp-B] [85], an IS607-like serine recombinase from Sulfolobus sp. L00 11 [6dgc-D] [84], and a resolvase family site-specific recombinase from Streptococcus pneumoniae SP19-BS75 [3guv-A] [86]). Red bars highlight the catalytic centers (DRLxR motifs) of the serine recombinases.
FIG 2
Newly identified cases of enzyme exaptation for structural roles in poxviruses that are accompanied by disruption of the catalytic sites. In each panel (A–F), the left subpanel shows the superposition of the AlphaFold2 model of an OPG (blue) with a structurally similar cellular enzyme (green). Residues that are important for substrate binding and/or the catalytic activity of the cellular enzyme are highlighted in magenta (cellular enzyme), and the corresponding residues within the OPG are shown in gray. The right subpanel shows the structural alignment of the respective query OPG (Q, top), seven OPG members from diverse chordopoxviruses (blue), and three structural homologs found by Dali (green). The proteins are listed from top to bottom. The catalytic and binding amino acid residues are highlighted in red. The numbers on top of the alignment refer to amino acid positions in that alignment. (A) Left panel: OPG55 (F11L) (aa 34 to 220) and human lysyl hydroxylase LH3 (6tex [doi:10.2210/pdb6tex/pdb], aa 545 to 738). Highlights: Residues that are known to bind Fe2+ and to be essential for the catalytic activity within 2-OG dioxygenase enzyme members (H667, D669, and H719) and the corresponding residues in OPG55 (L136, L138, and V184). Right panel: Structural alignment of OPG55 (Q), OPG55 from CMLV, VARV, MPXV Zaire-96-I-16, VACV, SwPV, SORPV ELK, and LSDV NI-2490, and Dali hits (prolyl hydroxylase from Paramecium bursaria Chlorella virus 1 [5c5t-A] [87], PKHD-type hydroxylase from Psychrobacter sp. [af2-db A5WFM3], and human lysyl hydroxylase LH3 [6tex-A] [doi:10.2210/pdb6tex/pdb]). (B) Left panel: OPG181 (A51R) (aa 1 to 166) and Burkholderia pseudomallei oxidoreductase (6n1f [doi:10.2210/pdb6n1f/pdb]). Highlights: Residues that are known to bind Fe2+ and to be essential for the catalytic activity within 2-OG dioxygenase enzyme members (H134, D136, and H188) and the corresponding residues in OPG181 (N100, F102, and F150). Right panel: Structural alignment of prototype OPG181 (Q), OPG181 from VARV, CMLV, MPXV Zaire 96-I-16, YLDV, SwPV, and LSDV NI-2490, and Dali hits (oxidoreductase from Burkholderia pseudomallei [6n1f-B] [doi:10.2210/pdb6n1f/pdb], Fe2OG dioxygenase domain-containing protein from Dictyostelium discoideum [af2-db Q54K28], and procollagen-proline 4-dioxygenase from Onchocerca volvulus [af2-db A0A2K6VMM0]). (C) Left panel: OPG148 (A20R) (aa 28 to 284) and a DNA ligase B from Klebsiella pneumoniae (af2-db B5XTF0, aa 61 to 406). Highlights: The key amino acids of motifs I (KxDG), IV (DG), and V (K) within the ligase adenylation domain appear in the structure from left to right. Right panel: Structural alignment of prototype OPG148 (Q), OPG148 from VACV, MPXV Zaire-96-I-16, VARV, MyxV, Orf virus, MCV subtype 1, and CRV, and Dali hits (DNA ligases from Klebsiella pneumoniae [af2-db B5XTF0], E. coli [af2-db B7M4D2], and Streptococcus pneumoniae [af2-db B1IBQ3]). (D) Left panel: OPG129 (A3L) and human CYLD USP domain (2vhf [88], aa 583 to 955), a deubiquitinating enzyme. Highlights: Residues of the catalytic triad within the USP domain (C601, H871, and D889); OPG129 (L136, L138, and V184). Right panel: Structural alignment of prototype OPG129 (Q), OPG129 from VACV, MyxV, VARV, MPXV Zaire-96-I-16, CRV, Orf virus, and SGVP, and Dali hits (all CYLD USP domains, found in Danio rerio [af2-db E7F1X5], Homo sapiens [2vhf-B] [88], and Sporothrix schenckii [af2-db U7Q4Z6]). (E) Left panel: OPG115 (D3R) and a kinesin motor ATPase from S. cerevisiae (1f9u [89], aa 385 to 722). Highlights: The P-loop (Walker A motif GxxxxGK[S/T]), Switch1 (SSRSH), and Switch2 (DLAGSE) motifs within the ATPase. Right panel: Structural alignment of prototype OPG115 (Q), OPG115 from VACV, VARV, MPXV Zaire-96-I-16, MyxV, SFV, MCV subtype 1, and SOPV ELK, and Dali hits (all kinesins, from S. cerevisiae [1f9u-A] [89], Homo sapiens [5lt4-D] [90], and Drosophila melanogaster [5hnz-K] [91]). (F) Left panel: OPG98 (L4R) and the best cellular Dali hit, namely, Cholix toxin, an ADP-ribosyltransferase of V. cholerae (2q5t [92], aa 415 to 630). Highlights: The Cholix catalytic cluster (H460, Y493, Y504, E574, and E581). Right panel: Structural alignment of prototype OPG98 (Q), OPG98 from VACV, VARV, MPXV Zaire-96-I-16, Orf virus, MyxV, MCV subtype 1, and CRV, and Dali hits (all ADP-ribosyltransferase toxins, from P. aeruginosa [af2-db P11439] and V. cholerae [2q5t-A] [92] and [3ki7-A] [doi:10.2210/pdb3ki7/pdb]). The residues of the catalytic cluster are highlighted in red. P11439 contains an additional site that is highlighted in blue (S474).
FIG 3
Inactivated kinases and pseudokinases in orthopoxviruses. (A) Left panel: OPG97 (L3L) (blue, aa 66 to 350) and Haspin, an atypical Ser/Thr kinase (green, 6g37 [93], aa 472 to 798). The (mutated) ATP binding site, helix αC glutamate, and active site are highlighted (K511, E535, and D649 in Haspin, magenta; K93, E99, and E177 in OPG97, gray). Right panel: Structural alignment of prototype OPG97, seven OPG members from diverse chordopoxviruses, and three Dali hits, all kinases. The Haspin-specific ATP-binding motif DYT is highlighted in red. PDB structure: 6g37 (93). (B) Left panel: OPG198 (B12R) (blue) and human vaccinia-related kinase (VRK, 6cqh [doi:10.2210/pdb6cqh/pdb], green, aa 22 to 341). The ATP binding site and the active site are highlighted (K71 and D171 in VRK, magenta; K45 and K139 in OPG198). Right panel: Structural alignment of prototype OPG198 (Q), seven OPG members from diverse chordopoxviruses, and three Dali hits, all vaccinia-related kinases. PDB structure: 6cqh [doi:10.2210/pdb6cqh/pdb]. (C) Left panel: OPG64 (E2L) (blue, aa 444 to 737) and the best cellular hit, namely, SidJ, a glutamylation protein with a pseudokinase fold from Legionella pneumophila (7mis [33], green, aa 336 to 758). Key amino acids of the SidJ nucleotide-binding pocket (H492, R500, Y506, R522, N733 [orange]) and SidJ kinase-like active site (R352, K367, E373, E381, Y452, Y532, N534, D542 [magenta]) are shown. Right panel: Structural alignment of prototype OPG64 (Q), seven OPG members from diverse chordopoxviruses (blue), and three Dali hits (green). The residues that are important within SidJ for nucleotide binding (R522, orange) and kinase-like activity (Y532, red) are highlighted. PDB structures: 7mis (33), 7pqe (32), and 6oqq (94). (D) Superposition of OPG64 (purple) and OPG74 (O1L) (yellow). (E) Left panel: Pseudokinase domains of OPG74 (blue, aa 380 to 666) and SidJ (7mis [33], green, aa 336 to 758). The sites are highlighted as in panel C. Right panel: Structural alignment of prototype OPG74 (Q), seven OPG members from diverse chordopoxviruses (blue), and the same Dali hits as for OPG64 (green). The residues are highlighted as in panel C.
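The captions of Fig. 2A, 2B, 3A, and 3B each list the catalytic or binding residues of a cellular enzyme together with the residues found in the corresponding OPG model. As a minimal illustrative sketch (not the authors' pipeline), those residue lists can be tabulated and the substituted positions counted. The names `PAIRS`, `substituted_sites`, and `summarize` are hypothetical, and pairing the residues in the order in which the captions list them is an assumption of this sketch. A substituted residue is only a hint of lost activity, not proof of it.

```python
# Residue lists transcribed from the captions of Fig. 2A, 2B, 3A, and 3B.
# ASSUMPTION: residues are paired in the order the captions list them.
PAIRS = {
    "OPG55 vs LH3": [("H667", "L136"), ("D669", "L138"), ("H719", "V184")],
    "OPG181 vs 6n1f oxidoreductase": [("H134", "N100"), ("D136", "F102"), ("H188", "F150")],
    "OPG97 vs Haspin": [("K511", "K93"), ("E535", "E99"), ("D649", "E177")],
    "OPG198 vs VRK": [("K71", "K45"), ("D171", "K139")],
}


def substituted_sites(pairs):
    """Return the (enzyme, OPG) pairs whose one-letter amino acid codes differ."""
    return [(enzyme, opg) for enzyme, opg in pairs if enzyme[0] != opg[0]]


def summarize(table):
    """Map each comparison to (number of substituted sites, number of sites)."""
    return {name: (len(substituted_sites(pairs)), len(pairs))
            for name, pairs in table.items()}


if __name__ == "__main__":
    for name, (changed, total) in summarize(PAIRS).items():
        print(f"{name}: {changed}/{total} listed sites substituted")
```

Under these assumptions, all listed metal-binding residues differ in the two dioxygenase-like proteins, whereas in the two kinase-like proteins the lysine of the ATP binding site is retained and only the active-site aspartate is replaced.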
FIG 4
Newly identified cases of exaptation of nonenzymatic proteins for structural roles in poxviruses. Superimposition of OPG models (blue) over the best structural match (green), as identified by Dali. (A) OPG77 (I1L) and the SWIB domain of BRG1-associated factor 60a from Mus musculus (1uhr [doi:10.2210/pdb1uhr/pdb]). The putative SWIB domain in OPG77 (amino acid positions 138 to 222) is rendered in gray. The mouse SWIB domain contains 2 small antiparallel beta-sheets. (B) OPG82 (I6L) and ribosomal protein S6 (2j5a [75], Aquifex aeolicus). (C) OPG127 (A2L) and the C-terminal region of transcription factor IIB (5wh1 [76], Homo sapiens). (D) OPG134 (A8R) and the C-terminal domain of transcription factor IIB (3h4c [78]). (E) OPG150 (A23R) and the TATA-binding protein (1mp9 [43], Sulfolobus acidocaldarius). (F) OPG185 (A56R) and nectin-1 (3u83 [95], Homo sapiens).
FIG 5
Inferred routes of evolution of orthopoxvirus proteins. The numbers of OPGs assigned to the different classes of virus proteins, with respect to the degree of functional change from the respective cellular ancestors, are shown. Black, virus hallmark proteins; blue, direct functional recruitment; blue-gray, “conservative” exaptation; opal, “radical” exaptation; shades of purple, unknown provenance. OPGs of unknown provenance were classified into disordered and generic (those that were predicted to adopt a globular fold but had no convincing match, with only generic matches [for example, to various β-sandwiches]) and PIE domains (predicted globular proteins with no match [mostly short proteins]) (see Table S1 for details).
FIG 6
Predicted unique folds of orthopoxvirus proteins. Predicted globular structures of ORPV proteins with no homologs detected outside poxviruses are shown. The coloring is according to the AlphaFold2 pLDDT score, as shown in panel A. The experimentally resolved structures of the respective OPGs are shown in green. Weakly supported C-terminal domains are not shown for OPG95 (L1R) and OPG153 (A26L). (A) OPG27 (C7L) and VACV C7L (5cyw). (B) OPG95 (L1R) (aa 1 to 176 [of 250]) and VACV L1R (1ypy). (C) OPG153 (A26L) (aa 1 to 359 [of 518]). (D) OPG114 (D2L). (E) OPG112 (H7R) and VACV H7 (4w60). (F) OPG70 (E8R). (G) OPG132 (A6L) and VACV A6L (N-term 6cb6, C-term 6br9). (H) OPG163 (A35R). OPG27, 95, and 153 have homologs among other poxvirus OPGs (Fig. S4).
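The models in Fig. 6 are colored by per-residue confidence. AlphaFold2 writes its per-residue confidence score into the B-factor field of the PDB files it outputs, so the score can be read back with a few lines of code. The following is a minimal sketch under stated assumptions: the function names and the toy record builder `make_ca_line` are hypothetical, the four bands are the ones commonly used by the AlphaFold Protein Structure Database rather than thresholds stated in this article, and the handling of values that fall exactly on a boundary is a choice made here.

```python
def make_ca_line(res_seq, plddt):
    """Build a toy fixed-column PDB ATOM record for a CA atom (demo data only)."""
    return ("ATOM  {:5d} {:<4s}{:1s}{:3s} {:1s}{:4d}{:1s}   "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}").format(
        res_seq, " CA", " ", "ALA", "A", res_seq, " ",
        0.0, 0.0, 0.0, 1.0, plddt)


def plddt_per_residue(pdb_text):
    """Read one score per residue from the B-factor field (columns 61 to 66)
    of the CA ATOM records."""
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def confidence_band(plddt):
    """ASSUMPTION: bands as used by the AlphaFold Protein Structure Database."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def band_fractions(scores):
    """Fraction of residues falling into each confidence band."""
    counts = {}
    for score in scores:
        band = confidence_band(score)
        counts[band] = counts.get(band, 0) + 1
    return {band: n / len(scores) for band, n in counts.items()}


if __name__ == "__main__":
    toy_model = "\n".join(make_ca_line(i + 1, s)
                          for i, s in enumerate([95.0, 72.5, 40.0, 91.2]))
    print(band_fractions(plddt_per_residue(toy_model)))
```

A summary of this kind is one way to decide which regions of a model, such as the weakly supported C-terminal domains mentioned in the caption, are reliable enough to display or compare.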

