Review
Environ Chem Lett. 2021;19(2):769-785.
doi: 10.1007/s10311-020-01151-1. Epub 2021 Feb 4.

Tracing the origins of SARS-COV-2 in coronavirus phylogenies: a review

Erwan Sallard et al. Environ Chem Lett. 2021.

Abstract

SARS-CoV-2 is a new human coronavirus (CoV) that emerged in China in late 2019 and is responsible for the global COVID-19 pandemic, which caused more than 97 million infections and 2 million deaths in 12 months. Understanding the origin of this virus is an important issue, and the mechanisms of viral dissemination must be determined in order to contain future epidemics. Based on phylogenetic inferences, sequence analysis and structure-function relationships of coronavirus proteins, informed by the knowledge currently available on the virus, we discuss the different scenarios for the origin (natural or synthetic) of the virus. The data currently available are not sufficient to firmly assert whether SARS-CoV-2 results from a zoonotic emergence or from an accidental escape of a laboratory strain. This question needs to be resolved because it has important consequences for the risk/benefit balance of our interactions with ecosystems, for the intensive breeding of wild and domestic animals, for some laboratory practices, and for scientific policy and biosafety regulations. Regardless of the origin of COVID-19, studying the evolution of the molecular mechanisms involved in the emergence of pandemic viruses is essential to develop therapeutic and vaccine strategies and to prevent future zoonoses. This article is a translation and update of a French article published in Médecine/Sciences, August/September 2020 (10.1051/medsci/2020123).

Supplementary information: The online version of this article (10.1007/s10311-020-01151-1) contains supplementary material, which is available to authorized users.

Keywords: Bioinformatics; Biosafety; Coronavirus; Covid-19; Furin; Gain of function; Genome analysis; Pandemic; Phylogeny; SARS-CoV-2; Spike protein; Virology; Zoonosis.

Figures

Fig. 1
Phylogeny and emergence of coronaviruses. a Tree inferred from complete coronavirus genomes by multiple alignment (ClustalW) followed by maximum likelihood inference (PhyML). Genomes assembled from metagenomic data are marked with a star. The prefixes of virus names indicate the host species: Bt (bat), Hu (human), Pn (pangolin), Cv (civet), Cm (camel), Pi (pig). Note that the distances between HuCoV2 and the closest viral strains (BtYuRmYN02, BtRaTG13) are greater than those observed for SARS-CoV (human–civet) or MERS-CoV (human–camel). b–d Hypotheses of transmission from the animal reservoir (bats) to humans, based on the molecular phylogeny of viral genomes. b SARS-CoV epidemic of 2003: the civet has been proposed as an intermediate host, and direct bat-to-human transmission is also under consideration. c MERS-CoV epidemic of 2012, with the camel as an intermediate host; several direct transmission events have been documented. d COVID-19 pandemic: several scenarios have been proposed for the last host before transmission to humans.
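To make the idea of "distance between strains" concrete, the sketch below computes the uncorrected p-distance between two aligned sequences and its Jukes-Cantor (JC69) correction. This is an illustrative simplification, not the method behind panel a: PhyML estimates branch lengths by maximum likelihood under a fitted substitution model. The function names and the toy sequences are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences, skipping gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence carries a gap
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / len(pairs)


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction (saturation)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Hypothetical toy alignment (invented sequences, not real coronavirus genomes)
strain_x = "ATGTTTGTTTTTCTTGTTTTA"
strain_y = "ATGTTCGTTTTTCTAGTTTTG"
p = p_distance(strain_x, strain_y)
print(f"p = {p:.3f}, JC69 d = {jc69_distance(p):.3f}")
```

The correction always returns a value at least as large as p, because it accounts for multiple substitutions at the same site that a simple mismatch count cannot see.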
Fig. 2
Profiles of percent identical positions (PIP) between SARS-CoV-2 and other coronavirus genomic sequences. a Genome-wide PIP profile (800 bp sliding windows). b PIP profile along the S gene (200 bp sliding windows). c–e Impact of recombination on the topology of phylogenetic trees inferred from different genomic regions: ORF1ab (c), S1 (d) and RBD (e)
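A sliding-window PIP profile such as those in panels a and b can be sketched as follows. The sketch assumes the two sequences are already aligned and that the denominator is every alignment column in the window; the exact PIP definition and the window step used for the figure are not stated in the caption, so both are assumptions, and the function names are hypothetical.

```python
from typing import List, Tuple


def percent_identical_positions(ref: str, other: str) -> float:
    """Percentage of aligned columns where both residues are identical (gap columns never count)."""
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    if not ref:
        return 0.0
    identical = sum(1 for a, b in zip(ref.upper(), other.upper()) if a == b and a != "-")
    # Assumption: denominator = all columns in the window, including gapped ones
    return 100.0 * identical / len(ref)


def pip_profile(ref: str, other: str, window: int, step: int) -> List[Tuple[int, float]]:
    """Slide a window along the alignment and return (window start, PIP) pairs."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    profile = []
    for start in range(0, max(len(ref) - window, 0) + 1, step):
        end = start + window
        profile.append((start, percent_identical_positions(ref[start:end], other[start:end])))
    return profile


# Toy example; for the genome-wide profile one would use window=800 on real alignments
print(pip_profile("ACGTACGTAC", "ACGTTCGTAA", window=4, step=2))
```

Dips in such a profile mark regions where the two genomes diverge locally, which is why panels c–e then compare trees built from separate genomic regions.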
Fig. 3
Structure and function of the spike protein (S protein). a The SARS-CoV-2 S protein specifically recognizes the ACE2 receptor of host cells, thereby initiating the infection cycle. b The S protein undergoes two maturation steps by proteolytic cleavage (catalyzed by furin and TMPRSS2, respectively), which are required to activate the protein and expose the fusion peptide. c Structure of the viral S protein bound to the host ACE2 receptor. The SARS-CoV-2 S protein structure (beige) was modeled with SWISS-MODEL using the SARS-CoV homolog (Protein Data Bank entry 6acc) as template, and aligned onto the structure of an RBD (orange) interacting with ACE2 (gray) from PDB entry 6m0j. The SARS-CoV-2 insertions are highlighted in color, with a color scale reflecting the taxonomic scope of each insertion, from red (found only in human SARS-CoV-2) through yellow, green and blue to purple (found in most sarbecoviruses)
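The first cleavage in panel b is performed by furin. As an illustration only, the sketch below scans a protein sequence for the minimal furin recognition motif R-X-X-R (cleavage after the last R). A motif match is a candidate site, not proof of cleavage, and the function name and the 10-residue test peptide (taken around the SARS-CoV-2 S1/S2 junction, PRRAR|SV) are chosen for illustration.

```python
import re
from typing import List, Tuple

# Minimal furin motif R-X-X-R; the lookahead reports overlapping candidates too.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def furin_candidate_sites(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based start, motif) for every R-X-X-R match in a protein sequence."""
    return [(m.start(), m.group(1)) for m in FURIN_MINIMAL.finditer(protein.upper())]


# Short peptide spanning the S1/S2 junction of SARS-CoV-2 S (PRRAR|SV)
peptide = "NSPRRARSVA"
print(furin_candidate_sites(peptide))  # [(3, 'RRAR')]
```

Using a lookahead rather than a plain pattern matters when arginines cluster, since adjacent R-X-X-R windows can overlap.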
Fig. 4
Conservation of ACE2 proteins and interactions with the viral S protein. a Interactions between ACE2 and S and conservation of the key residues [adapted from Wang et al. (2020), Yan et al. (2020)] in different viral strains and animal species. The key interactions between S and ACE2 residues are denoted by solid lines, and weaker interactions by dotted lines. b Number of differences between human ACE2 and its ortholog in several animal species for the key residues involved in the interactions with the S protein. [adapted from Yan et al. (2020)]
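The count shown in panel b amounts to comparing aligned orthologs at a fixed set of key positions. A minimal sketch, assuming the sequences are already aligned and that positions are 1-based alignment coordinates; the sequences, species labels and key positions below are invented and do not reproduce the residues or results of the figure.

```python
from typing import Dict, Sequence


def differences_at_key_residues(reference: str, ortholog: str, positions: Sequence[int]) -> int:
    """Count key positions (1-based alignment coordinates) where the ortholog differs from the reference."""
    if len(reference) != len(ortholog):
        raise ValueError("sequences must be aligned to the same length")
    count = 0
    for pos in positions:
        if not 1 <= pos <= len(reference):
            raise ValueError(f"position {pos} is outside the alignment")
        if reference[pos - 1] != ortholog[pos - 1]:
            count += 1
    return count


# Hypothetical toy alignment and key positions, for illustration only
reference_ace2 = "QDEYQLMYK"
orthologs: Dict[str, str] = {
    "species_A": "QDEYQLMYK",
    "species_B": "QEEYQLTYK",
    "species_C": "KDEHQLMYR",
}
key_positions = [1, 2, 4, 7, 9]
for name, seq in orthologs.items():
    print(name, differences_at_key_residues(reference_ace2, seq, key_positions))
```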
Fig. 5
Taxonomic coverage of the insertions observed in the SARS-CoV-2 S protein. Each panel shows multiple alignments of amino acid sequences around the insertion (left) and the likely position of the evolutionary event on the phylogenetic tree inferred from the amino acid sequences surrounding the insertion (right). The insertions cover positions 153–158 (a), 245–251 (b), 445–449 (c) and 680–683 (d) of the SARS-CoV-2 S protein, respectively. The schema at the top of the panels indicates the respective positions of the four insertions. Except for insertion i3b, the sequences sharing the same insertion are grouped together in the phylogenetic tree, suggesting a distinct origin for each insertion. The marked differences between tree topologies indicate that these insertion regions have different evolutionary histories. The values at the bifurcations denote bootstrap scores (on a scale from 0 to 100), which indicate the robustness of the corresponding branching; a low bootstrap value (< 50) means that the branching is unreliable. Note that low values are often attached to BtYuRmYN02, which results from the metagenomic assembly of a large number of samples from various sources. Accordingly, this metagenome is strongly inconsistent across the different aligned fragments, which calls its biological relevance into question
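The bootstrap scores on these trees come from resampling alignment columns. The sketch below shows the two ingredients of Felsenstein's nonparametric bootstrap: drawing a replicate alignment by sampling columns with replacement, and scoring a clade as the percentage of replicate trees that contain it. The tree-building step on each replicate (PhyML in this article) is omitted, so replicate trees are represented here by hypothetical sets of clades; taxon names and function names are illustrative.

```python
import random
from typing import FrozenSet, List, Sequence, Set


def bootstrap_alignment(alignment: Sequence[str], rng: random.Random) -> List[str]:
    """Resample alignment columns with replacement to build one bootstrap replicate."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows must have the same length")
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[c] for c in cols) for row in alignment]


def bootstrap_support(clade: FrozenSet[str], replicate_clades: Sequence[Set[FrozenSet[str]]]) -> float:
    """Percentage (0-100) of replicate trees whose clade set contains `clade`."""
    if not replicate_clades:
        raise ValueError("need at least one replicate")
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


rng = random.Random(1)
alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]
replicate = bootstrap_alignment(alignment, rng)

# Hypothetical clades recovered from four replicate trees
clade = frozenset({"taxon1", "taxon2"})
replicate_clades = [
    {clade},
    {clade, frozenset({"taxon3", "taxon4"})},
    {frozenset({"taxon1", "taxon3"})},
    set(),
]
print(replicate, bootstrap_support(clade, replicate_clades))
```

With four replicates and the clade present in two of them, the support is 50, right at the threshold below which the caption treats a branching as unreliable.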
Fig. 6
Matches between the S gene and the HIV genome. a Top-ranking alignment between the S gene and the HIV genome. b Top-ranking alignment between a randomized query sequence (shuffled nucleotides) and the HIV genome. Note the expect value (E-value), which indicates the number of matches of equal or better score expected by chance (false positives). The comparison shows that the alignment between the coding sequence of the S protein and the HIV genome is not significant, since its E-value is higher than 1, and even higher for the actual gene than for a random sequence. The alignments were performed on the NCBI BLAST server (https://blast.ncbi.nlm.nih.gov/Blast.cgi)
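The randomized control in panel b can be produced by permuting the query's nucleotides, which keeps its length and base composition but destroys any real sequence signal. A minimal sketch with a hypothetical function name and toy sequence; submitting the shuffled query to BLAST is not shown. Note that this simple shuffle preserves single-nucleotide composition only, not dinucleotide frequencies.

```python
import random


def shuffle_sequence(seq: str, seed: int = 0) -> str:
    """Return a random permutation of the nucleotides in `seq` (same length and composition)."""
    rng = random.Random(seed)
    return "".join(rng.sample(seq, len(seq)))


query = "ATGTTTGTTTTTCTTGTTTTA"  # toy fragment, not the full S gene
shuffled = shuffle_sequence(query, seed=42)
print(shuffled)
```

If a real query scores no better against the HIV genome than its shuffled counterpart, as the caption reports, the match carries no evidence of shared ancestry.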

References

    1. Andersen KG, Rambaut A, Lipkin WI, Holmes EC, Garry RF. The proximal origin of SARS-CoV-2. Nat Med. 2020;26(4):450–452. doi: 10.1038/s41591-020-0820-9.
    2. Belouzard S, Chu VC, Whittaker GR. Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites. Proc Natl Acad Sci USA. 2009;106(14):5871–5876. doi: 10.1073/pnas.0809524106.
    3. Burki T. Ban on gain-of-function studies ends. Lancet Infect Dis. 2018;18(2):148–149. doi: 10.1016/S1473-3099(18)30006-9.
    4. Calisher C, Carroll D, Colwell R, Corley RB, Daszak P, Drosten C, Enjuanes L, et al. Statement in support of the scientists, public health professionals, and medical professionals of China combatting COVID-19. Lancet. 2020;395(10226):e42–e43. doi: 10.1016/S0140-6736(20)30418-9.
    5. Casane D, Policarpo M, Laurenti P. Pourquoi le taux de mutation n’est-il jamais égal à zéro ? [Why is the mutation rate never equal to zero?] Médecine/Sciences. 2019;35(3):245–251. doi: 10.1051/medsci/2019030.
