Review
Exp Mol Med. 2021 Apr;53(4):537-547.
doi: 10.1038/s12276-021-00604-z. Epub 2021 Apr 16.

On the origin and evolution of SARS-CoV-2

Devika Singh et al. Exp Mol Med. 2021 Apr.

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the ongoing global outbreak of a coronavirus disease (herein referred to as COVID-19). Other viruses in the same phylogenetic group have been responsible for previous regional outbreaks, including SARS and MERS. SARS-CoV-2 has a zoonotic origin, similar to the causative viruses of these previous outbreaks. The repetitive introduction of animal viruses into human populations resulting in disease outbreaks suggests that similar future epidemics are inevitable. Therefore, understanding the molecular origin and ongoing evolution of SARS-CoV-2 will provide critical insights for preparing for and preventing future outbreaks. A key feature of SARS-CoV-2 is its propensity for genetic recombination across host species boundaries. Consequently, the genome of SARS-CoV-2 harbors signatures of multiple recombination events, likely encompassing multiple species and broad geographic regions. Other regions of the SARS-CoV-2 genome show the impact of purifying selection. The spike (S) protein of SARS-CoV-2, which enables the virus to enter host cells, exhibits signatures of both purifying selection and ancestral recombination events, leading to an effective S protein capable of infecting human and many other mammalian cells. The global spread and explosive growth of the SARS-CoV-2 population (within human hosts) has contributed additional mutational variability into this genome, increasing opportunities for future recombination.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Phylogenetic background and genomic structure of SARS-CoV-2.
a Schematic depiction of the four genera of coronaviruses, their evolutionary relationship, and their animal hosts. The phylogenetic relationships established by Woo et al. were used to draw the figure. b Genomic distribution of all open reading frames (ORFs) across the 29,903 bp SARS-CoV-2 genome. The nucleocapsid (N), spike (S), membrane (M), and envelope (E) proteins are color-coded according to the image of the virus. All other ORFs correspond to nonstructural proteins. The yellow panel shows an enhanced view of an 8,340 bp region encompassing 9 ORFs and the three-prime untranslated region (3′-UTR).
Fig. 2. Recombination events in the history of SARS-CoV-2.
a Variations in the sequence relatedness of different regions of SARS-CoV-2 in comparison with alternative strains of coronaviruses from pangolins (Pangolin Guangdong 2019), bats (RaTG13, Bat-SL-CoV, Rs3367), palm civets (PC4-13), and humans (Tor2). In region 1 (R1), region 2 (R2), and region 4 (R4), SARS-CoV-2 is most similar to the corresponding regions of the bat coronavirus RaTG13. In region 3 (R3), SARS-CoV-2 and the pangolin strain of coronavirus (Pangolin Guangdong 2019) are more closely related. The pangolin strain consistently clusters within bat coronavirus clades. For regions 1, 2, and 3, phylogenetic relationships were obtained from Lam et al.; for region 4, phylogenetic relationships were obtained from Boni et al. Regions are colored based on their genomic position in the SARS-CoV-2 genome model (top panel). b Two scenarios hypothesizing the evolutionary timing of the recombination event that may have introduced the pangolin coronavirus sequence (region 3/R3) into SARS-CoV-2. In scenario I, after the divergence of SARS-CoV-2 and RaTG13, recombination between SARS-CoV-2 and Pangolin Guangdong 2019 resulted in the acquisition of the new sequence. In scenario II, recombination occurred between the common ancestral lineage of SARS-CoV-2 and RaTG13 and the pangolin (Pangolin Guangdong 2019) lineage, followed by the accumulation of mutations in the RaTG13 lineage.
Fig. 3. Structural comparison of coronavirus spike proteins.
a Long-axis trimer, closed conformation view of the cryo-EM spike protein structure from the pangolin coronavirus (PDB ID: 7CN8, left panel), human SARS-CoV-2 (PDB ID: 6ZB5, middle panel), and bat RaTG13 coronavirus (PDB ID: 7CN4, right panel). Models are rainbow colored from the N-terminus (blue) to the C-terminus (red). b The left panel depicts human SARS-CoV-2 (colored by chain: purple, yellow, and green) and the ACE2 complex (colored red) in bound conformation (PDB ID: 6ACG). The right panel shows a magnified region encompassing 8 amino acids (positions shown in blue, green, and red) detected as targets of positive selection in three previous studies. The positions in blue were identified as positively selected in one of the three cited studies, while the positions depicted in red (493 and 494) were identified in two studies. The green position 483 of the S protein was identified as positively selected in all three studies. For a and b, all models were visualized by SWISS-MODEL.
Fig. 4. Demographics of SARS-CoV-2.
a The top panel shows a phylogenetic tree of 3852 SARS-CoV-2 genomes sampled globally between December 2019 and March 2021. The bottom panel shows the geographic distribution of the major clades of SARS-CoV-2. Clades were defined using Nextstrain nomenclature based on global frequency, variation from parent clade, and year of emergence. The relative global frequency of b all major SARS-CoV-2 clades, c the amino acid variant at position 614 of the spike protein (D: aspartic acid and G: glycine), d the amino acid variant at position 452 of the spike protein (L: leucine and R: arginine), and e the amino acid variant at position 501 of the spike protein (N: asparagine, Y: tyrosine and T: threonine). For a and b, clades were named according to Nextstrain nomenclature, which distinguishes clades based on global frequency, year of emergence and a unique letter. For a–e, data visualization was performed by nextstrain.org with data provided by GISAID.

References

    1. Dudas G, Carvalho LM, Rambaut A, Bedford T. MERS-CoV spillover at the camel-human interface. Elife. 2018;7:e31257.
    2. Corman VM, Muth D, Niemeyer D, Drosten C. Hosts and sources of endemic human coronaviruses. Adv. Virus Res. 2018;100:163–188.
    3. Wertheim JO, Chu DKW, Peiris JSM, Kosakovsky Pond SL, Poon LLM. A case for the ancient origin of coronaviruses. J. Virol. 2013;87:7039.
    4. Woo PCY, et al. Discovery of seven novel mammalian and avian coronaviruses in the genus Deltacoronavirus supports bat coronaviruses as the gene source of Alphacoronavirus, Betacoronavirus and Avian Coronaviruses as the gene source of Gammacoronavirus, Deltacoronavirus. J. Virol. 2012;86:3995.
    5. Jonassen CM, et al. Molecular identification and characterization of novel coronaviruses infecting graylag geese (Anser anser), feral pigeons (Columbia livia) and mallards (Anas platyrhynchos). J. Gen. Virol. 2005;86:1597–1607.
