Virology. 2022 May;570:123-133.
doi: 10.1016/j.virol.2022.03.005. Epub 2022 Mar 30.

Genomic evolution of the Coronaviridae family

Christian M Zmasek et al. Virology. 2022 May.

Abstract

The current outbreak of coronavirus disease-2019 (COVID-19) caused by SARS-CoV-2 poses unparalleled challenges to global public health. SARS-CoV-2 is a Betacoronavirus, one of four genera belonging to the Coronaviridae subfamily Orthocoronavirinae. Coronaviridae, in turn, are members of the order Nidovirales, a group of enveloped, positive-stranded RNA viruses. Here we present a systematic phylogenetic and evolutionary study based on protein domain architecture, encompassing the entire proteomes of all Orthocoronavirinae, as well as other Nidovirales. This analysis has revealed that the genomic evolution of Nidovirales is associated with extensive gains and losses of protein domains. In Orthocoronavirinae, the sections of the genomes that show the largest divergence in protein domains are found in the proteins encoded in the amino-terminal end of the polyprotein (PP1ab), the spike protein (S), and many of the accessory proteins. The diversity among the accessory proteins is particularly striking, as each subgenus possesses a set of accessory proteins that is almost entirely specific to that subgenus. The only notable exception is ORF3b, which is present and orthologous across all Alphacoronaviruses. In contrast, the membrane protein (M), envelope small membrane protein (E), nucleoprotein (N), as well as proteins encoded in the central and carboxy-terminal end of PP1ab (such as the 3C-like protease, RNA-dependent RNA polymerase, and Helicase) show stable domain architectures across all Orthocoronavirinae. This comprehensive analysis of the Coronaviridae domain architecture has important implications for efforts to develop broadly cross-protective coronavirus vaccines.

Keywords: Coronaviridae; Evolution; Genome; Hidden Markov models; Nidovirales; Orthocoronavirinae; Phylogenetics; Phylogenomics; Protein domains.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
Nidovirales taxonomy. This figure is based on the taxonomy established by the International Committee on Taxonomy of Viruses (ICTV) and currently used by the U.S. National Center for Biotechnology Information (NCBI) and the Universal Protein Resource (UniProt) databases. Viruses which infect humans are listed in blue (Alphacoronaviruses) and red (Betacoronaviruses). Their taxonomic level is indicated in square brackets. For some viruses, no taxonomic level has been established as of this writing. An example of this is Human coronavirus OC43.
Fig. 2
Coronaviridae genome organization. SARS-CoV-2 genome organization. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome is shown as an example of the Orthocoronavirinae genome organization. The abbreviations used are: pp: polyprotein, PL-pro: Papain-like protease, 3CL-pro: Cysteine protease, RdRP: RNA dependent RNA polymerase, Hel: Helicase, S: Spike protein, E: Envelope protein, M: Membrane protein, N: Nucleocapsid protein. Genome organization of human Orthocoronavirinae accessory proteins. The ORF-based names are shown, together with additional names and the corresponding Pfam domain. Note that the ORF-based names do not always match across taxonomic groups. For example, ORF5a in OC43 appears to be an ortholog of ORF4 in HKU1 given their conserved Pfam domain architecture. For two of the Merbecovirus accessory proteins for which no Pfam model exists, new hidden Markov models were developed (see Methods). These are labelled in italic fonts.
Fig. 3
Domain gains and losses during Nidovirales evolution. Gained Pfam domains are shown in green, whereas lost domains are shown in red, as inferred by Dollo parsimony. For members of suborder Tornidovirineae only select examples are shown (due to limited genome and Pfam HMM data availability). Data for subgenera are not shown. Detailed lists of gained and lost domains are available in Supplementary Tables 1 and 2.
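As an illustration of how Dollo parsimony assigns gains and losses, the following is a minimal sketch and not the authors' pipeline. It assumes that each domain is gained exactly once (at the last common ancestor of all leaves carrying it) and may be lost any number of times afterwards. The tree, the taxon names (`root`, `X`, `A`, `B`, `C`), the domain names (`d1` to `d3`), and the function `dollo_gains_losses` are hypothetical and exist only for this example.

```python
from typing import Dict, List, Set, Tuple

Tree = Dict[str, List[str]]  # maps a node to its child nodes; leaves are absent or empty


def dollo_gains_losses(
    children: Tree, root: str, leaf_domains: Dict[str, Set[str]]
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Infer per-node domain gains and losses under Dollo parsimony."""
    below: Dict[str, Set[str]] = {}  # domains seen in any leaf at or under a node

    def collect(node: str) -> Set[str]:
        kids = children.get(node, [])
        if kids:
            found: Set[str] = set()
            for kid in kids:
                found |= collect(kid)
        else:
            found = set(leaf_domains.get(node, set()))
        below[node] = found
        return found

    collect(root)
    gains: Dict[str, Set[str]] = {node: set() for node in below}
    losses: Dict[str, Set[str]] = {node: set() for node in below}

    def assign(node: str, inherited: Set[str]) -> None:
        kids = children.get(node, [])
        # A domain held by the parent but by no leaf under this node was lost here.
        losses[node] = inherited - below[node]
        for domain in below[node] - inherited:
            carriers = sum(1 for kid in kids if domain in below[kid])
            # Single gain: at a leaf, or at the node where two or more
            # child lineages carrying the domain meet.
            if not kids or carriers >= 2:
                gains[node].add(domain)
        present = (inherited & below[node]) | gains[node]
        for kid in kids:
            assign(kid, present)

    assign(root, set())
    return gains, losses


# Toy example (hypothetical taxa and domains).
toy_tree: Tree = {"root": ["A", "X"], "X": ["B", "C"]}
toy_leaves = {"A": {"d1", "d2"}, "B": {"d1", "d3"}, "C": {"d1", "d2", "d3"}}
gains, losses = dollo_gains_losses(toy_tree, "root", toy_leaves)
```

In this toy tree, `d1` and `d2` are placed as gains at `root`, `d3` as a gain at `X`, and `d2` as a loss on the branch leading to `B`.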
Fig. 4
Phylogeny and domain organization of Coronaviridae spike glycoproteins. The phylogeny on the left side was calculated using a maximum likelihood approach applied to a MAFFT alignment of the CoV_S2 domains. Spike protein domain architecture for each genus is shown in the middle; for a description of the Pfam domains see Table 1. Host cell receptors and likely additional receptors are shown on the right side (Graham et al., 2013). The following abbreviations are used: ACE2, angiotensin converting enzyme 2; APN, aminopeptidase N; CEACAM1a, carcinoembryonic cell adhesion molecule 1a; DC-SIGN, dendritic cell-specific ICAM-grabbing non-integrin; DC-SIGNR, DC-SIGN-related protein; DPP4, dipeptidyl peptidase 4; LSECtin, liver and lymph node sinusoidal C-type lectin.
Fig. 5
Arrangement of protein domains in polyproteins. Domains matching with an E-value of less than 0.001 are shown. Domains for which the E-values are larger than 0.001, or which are not present in all genomes of a given subgenus, are labelled in grey. In order to align corresponding domains, we introduced artificial insertions, shown with dashed lines. Domains making up the Papain-like peptidase are marked with a light grey box. Domains are not drawn to scale. For details of domains see Table 2.
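The labelling rule described in this caption can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: a domain counts as confidently present in a subgenus only if every genome has a hit below the 0.001 E-value cutoff, and everything else is treated as "grey". The genome names, domain names, E-values, and the function `classify_domains` are hypothetical, and treating a hit at exactly 0.001 as grey is an assumption, since the caption does not specify that case.

```python
from typing import Dict, List, Set, Tuple

E_VALUE_CUTOFF = 1e-3  # threshold stated in the caption


def classify_domains(
    hits_by_genome: Dict[str, List[Tuple[str, float]]]
) -> Dict[str, str]:
    """Label each domain of one subgenus as 'confident' or 'grey'."""
    all_domains: Set[str] = set()
    confident_sets: List[Set[str]] = []
    for hits in hits_by_genome.values():
        all_domains |= {name for name, _ in hits}
        # Hits at exactly the cutoff are treated as grey (assumption).
        confident_sets.append({name for name, e in hits if e < E_VALUE_CUTOFF})
    # A domain must pass the cutoff in every genome of the subgenus.
    core = set.intersection(*confident_sets) if confident_sets else set()
    return {d: ("confident" if d in core else "grey") for d in sorted(all_domains)}


# Toy input (hypothetical genomes, domains, and E-values).
toy_hits = {
    "genome_1": [("domA", 1e-30), ("domB", 5e-2), ("domC", 1e-10)],
    "genome_2": [("domA", 1e-25), ("domB", 1e-8)],
}
labels = classify_domains(toy_hits)
```

Here `domA` is labelled confident, `domB` is grey because one hit exceeds the cutoff, and `domC` is grey because it is missing from `genome_2`.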

