Genome Biol Evol. 2022 Feb 4;14(2):evac018.
doi: 10.1093/gbe/evac018.

Exploring the Natural Origins of SARS-CoV-2 in the Light of Recombination

Spyros Lytras et al. Genome Biol Evol. 2022.

Abstract

The lack of an identifiable intermediate host species for the proximal animal ancestor of SARS-CoV-2, and the large geographical distance between Wuhan and where the closest evolutionarily related coronaviruses circulating in horseshoe bats (members of the Sarbecovirus subgenus) have been identified, are fueling speculation on the natural origins of SARS-CoV-2. We performed a comprehensive phylogenetic study on SARS-CoV-2 and all the related bat and pangolin sarbecoviruses sampled so far. Determining the likely recombination events reveals a highly reticulate evolutionary history within this group of coronaviruses. Distribution of the inferred recombination events is nonrandom with evidence that Spike, the main target for humoral immunity, is beside a recombination hotspot likely driving antigenic shift events in the ancestry of bat sarbecoviruses. Coupled with the geographic ranges of their hosts and the sampling locations, across southern China, and into Southeast Asia, we confirm that horseshoe bats, Rhinolophus, are the likely reservoir species for the SARS-CoV-2 progenitor. By tracing the recombinant sequence patterns, we conclude that there has been relatively recent geographic movement and cocirculation of these viruses' ancestors, extending across their bat host ranges in China and Southeast Asia over the last 100 years. We confirm that a direct proximal ancestor to SARS-CoV-2 has not yet been sampled, since the closest known relatives collected in Yunnan shared a common ancestor with SARS-CoV-2 approximately 40 years ago. Our analysis highlights the need for dramatically more wildlife sampling to: 1) pinpoint the exact origins of SARS-CoV-2's animal progenitor, 2) identify the intermediate species that facilitated transmission from bats to humans (if there is one), and 3) survey the extent of the diversity in the related sarbecoviruses' phylogeny that present high risk for future spillovers.

Keywords: Rhinolophus; Sarbecoviruses; COVID-19; SARS-CoV-2; bats; coronaviruses; host range; origin; pangolins; recombination.

Figures

Fig. 1
Recombination-minimized phylogeny and recombination hot-/coldspots. Maximum likelihood phylogeny inferred from a recombination-free whole-genome alignment of the 78 Sarbecoviruses (A), see Materials and Methods. The non-nCoV/SARS-CoV clade is collapsed for clarity. All nodes presented have bootstrap confidence values above 90%. Distribution of recombination hot- and coldspots across the alignment based on the RRT (B) and the BDT (C) methods. For both plots, light and dark gray represent 95% and 99% confidence intervals of expected recombination breakpoint clustering under random recombination. Peaks above the shaded area represent recombination hotspots and drops below represent coldspots, annotated on the corresponding ORF genome schematic above each plot by vertical red and blue lines, respectively. All ORF names and the NTD and RBD encoding regions of Spike are also annotated on the schematics.
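The hot-/coldspot idea in this caption can be illustrated with a simplified permutation test. The sketch below is not the RRT or BDT method used for the figure; it assumes that, under the null, breakpoints fall uniformly at random along the alignment, and every function name, window size, and breakpoint position is hypothetical.

```python
import random

def window_counts(breakpoints, genome_len, window, step):
    """Count breakpoints falling in each sliding window [start, start + window)."""
    starts = range(0, genome_len - window + 1, step)
    return [sum(s <= b < s + window for b in breakpoints) for s in starts]

def null_bounds(n_breakpoints, genome_len, window, step,
                n_perm=1000, level=0.95, seed=0):
    """Per-window (lower, upper) count bounds under uniform random placement."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_perm):
        fake = [rng.randrange(genome_len) for _ in range(n_breakpoints)]
        sims.append(window_counts(fake, genome_len, window, step))
    lo_idx = round((1 - level) / 2 * n_perm)
    hi_idx = round((1 + level) / 2 * n_perm) - 1
    bounds = []
    for column in zip(*sims):  # one column of simulated counts per window
        ordered = sorted(column)
        bounds.append((ordered[lo_idx], ordered[hi_idx]))
    return bounds

def classify(breakpoints, genome_len, window, step, **kwargs):
    """Label each window as 'hotspot', 'coldspot', or 'expected'."""
    observed = window_counts(breakpoints, genome_len, window, step)
    bounds = null_bounds(len(breakpoints), genome_len, window, step, **kwargs)
    labels = []
    for obs, (lo, hi) in zip(observed, bounds):
        if obs > hi:
            labels.append("hotspot")
        elif obs < lo:
            labels.append("coldspot")
        else:
            labels.append("expected")
    return labels

# Toy data (hypothetical): 10 evenly scattered breakpoints plus 40 packed
# into the window starting at position 21000.
toy = list(range(500, 30000, 3000)) + [21000 + 20 * i for i in range(40)]
labels = classify(toy, genome_len=30000, window=1000, step=1000)
print(labels[21])  # prints "hotspot"
```

Passing `level=0.99` gives the stricter band that corresponds to the darker shading described in the caption.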
Fig. 2
Nonrecombinant topologies of SARS-CoV-2 relatives. Zoomed in regions of selected RBP region maximum likelihood phylogenies (A). Branches within the nCoV clade are colored in red and outside the nCoV clade in green. Genome schematics of close SARS-CoV-2 relatives with recombinant Spike regions (B). RBP regions 15 and 16 are highlighted and the non-nCoV subclades of the maximum likelihood phylogenies containing the relevant viruses are presented. The coloring of nonrecombinant segments indicates patristic distance to SARS-CoV-2 (see fig. 3 legend). Nodes with bootstrap confidence values below 80% have been collapsed.
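Patristic distance, used here to color the nonrecombinant segments, is the sum of branch lengths along the path connecting two tips of a tree. A minimal sketch follows, assuming a rooted tree stored as a child-to-(parent, branch length) mapping; the data structure, function names, tip labels, and branch lengths are all hypothetical and not taken from the paper.

```python
def path_to_root(tree, tip):
    """Map each ancestor of `tip` (and the tip itself) to its distance from the tip."""
    distances = {tip: 0.0}
    node, total = tip, 0.0
    while node in tree:  # the root has no entry in `tree`
        parent, length = tree[node]
        total += length
        distances[parent] = total
        node = parent
    return distances

def patristic_distance(tree, tip_a, tip_b):
    """Sum of branch lengths on the path between two tips via their common ancestor."""
    up_from_a = path_to_root(tree, tip_a)
    node, total = tip_b, 0.0
    while node not in up_from_a:  # climb from tip_b until the paths meet
        parent, length = tree[node]
        total += length
        node = parent
    return total + up_from_a[node]

# Toy tree (hypothetical): ((A:0.1, B:0.2)n1:0.05, C:0.4)root
toy_tree = {
    "A": ("n1", 0.1),
    "B": ("n1", 0.2),
    "n1": ("root", 0.05),
    "C": ("root", 0.4),
}
print(patristic_distance(toy_tree, "A", "B"))  # path A -> n1 -> B
print(patristic_distance(toy_tree, "A", "C"))  # path A -> n1 -> root -> C
```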
Fig. 3
Recombination analysis and geographic distribution of Sarbecoviruses. Maximum clade credibility (MCC) dated phylogeny of RBP region 5 of 78 Sarbecoviruses (A). All tips are annotated with the geographic region the viruses have been sampled in and notable viruses are annotated with genome schematics separated into the 22 inferred RBP regions, each colored based on phylogenetic distance from SARS-CoV-2 (see scale and Materials and Methods). RBP region 21 has been removed from the schematic due to limited phylogenetic information in the alignment. The GX cluster annotated with an asterisk contains the five pangolin coronaviruses collected in Guangxi. Map of East Asia with geographic regions (provinces within China, countries outside China) colored based on Sarbecoviruses sampling (B): blue for regions with only non-nCoV clade samples, pink for regions where nCoV viruses have been sampled. Shading in the nCoV regions corresponds to phylogenetic distance from SARS-CoV-2 (see scale). Notable nCoV viruses and pangolin trafficking routes (adapted from Xu et al. [2016]) are annotated onto the map.
Fig. 4
Molecular dating and Rhinolophus host geographic distributions. Tip-dated Bayesian phylogeny of RBP region 5 showing the nine closest relatives to SARS-CoV-2 (A). Tree nodes have been adjusted to the mean age estimates and posterior distributions are shown for each node with mean age estimate and 95% HPD confidence intervals presented to their left. Tips are annotated with the host species they were sampled in, bat silhouette colors correspond to panel (B). Geographic ranges of Rhinolophus species the SARS-CoV-2 closest relatives have been sampled in (B). Maps are restricted to East Asia and separated into province-level within China and country-level outside China.
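The 95% HPD (highest posterior density) intervals reported for node ages can be understood as the narrowest interval that contains 95% of the posterior samples. The sketch below shows one common way to compute such an interval from a list of sampled values; it is an illustration under that assumption, not the dating software used in the paper, and the function name and toy samples are hypothetical.

```python
import math

def hpd_interval(samples, mass=0.95):
    """Return the narrowest (low, high) interval covering `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]

# Toy posterior sample of a node age in years (hypothetical values):
# a tight cluster plus a long right tail.
toy_ages = [1, 2, 3, 4, 5, 20, 30, 40, 50, 60]
print(hpd_interval(toy_ages, mass=0.5))  # prints (1, 5)
```

Unlike an equal-tailed interval, this interval shifts toward the densest part of a skewed posterior, which is why it is often preferred for node age estimates.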

References

    1. Akaike H. 1998. Information theory and an extension of the maximum likelihood principle. In: Parzen E, Tanabe K, Kitagawa G, editors. Selected Papers of Hirotugu Akaike. Springer Series in Statistics (Perspectives in Statistics). New York: Springer. p. 199–213.
    2. Bates P, Bumrungsri S, Csorba G, Soisook P. 2019. Rhinolophus malayanus. IUCN Red List Threat. Species 2019. [Internet]:e.T19551A21978424. International Union for Conservation of Nature and Natural Resources (IUCN). Available from: https://www.iucnredlist.org/species/19551/21978424.
    3. Bobay LM, O’Donnell AC, Ochman H. 2020. Recombination events are concentrated in the spike protein region of Betacoronaviruses. PLoS Genet. 16(12):e1009272.
    4. Boni MF, et al. 2020. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nat Microbiol. 5(11):1408–1417.
    5. Boni MF, Posada D, Feldman MW. 2007. An exact nonparametric method for inferring mosaic structure in sequence triplets. Genetics 176(2):1035–1047.
