Genome Biol Evol. 2022 Dec 8;14(12):evac161.
doi: 10.1093/gbe/evac161.

Extensive Recombination-driven Coronavirus Diversification Expands the Pool of Potential Pandemic Pathogens

Stephen A Goldstein et al. Genome Biol Evol.

Abstract

SARS-CoV-2, the cause of the ongoing pandemic, is the third zoonotic coronavirus identified in the last 20 years. Enzootic and epizootic coronaviruses of diverse lineages also pose a significant threat to livestock, as most recently observed for virulent strains of porcine epidemic diarrhea virus (PEDV) and swine acute diarrhea-associated coronavirus (SADS-CoV). Unique among RNA viruses, coronaviruses encode a proofreading exonuclease (ExoN) that lowers point mutation rates to increase the viability of large RNA virus genomes, at the cost of limiting virus adaptation via point mutation. This limitation can be overcome by high rates of recombination that facilitate rapid genetic diversification. To compare the dynamics of recombination between related sequences, we developed an open-source computational workflow (IDPlot) that bundles nucleotide identity, recombination, and phylogenetic analysis into a single pipeline. We analyzed recombination dynamics among three groups of coronaviruses with noteworthy impacts on human health and agriculture: SARSr-CoV, Betacoronavirus-1, and SADSr-CoV. We found that all three groups undergo recombination with highly diverged viruses from undersampled or unsampled lineages, including in typically highly conserved regions of the genome. In several cases, no parental origin of recombinant regions could be found in genetic databases, demonstrating our shallow characterization of coronavirus diversity and expanding the genetic pool that may contribute to future zoonotic events. Our results also illustrate the limitations of current sampling approaches for anticipating zoonotic threats to human and animal health.

Keywords: coronaviruses; evolution; virology.

Figures

Fig. 1.
AlphaCoV and BetaCoV phylogenetic relationships are genome region-dependent. (A) Basic coronavirus genome organization with the 5′ ∼20 kb comprising the replicase gene, whose polyprotein products are proteolytically processed into up to 16 individual proteins. The 3′ ∼10 kb comprises structural and genus-specific accessory genes. (B) Maximum-likelihood (ML) phylogenetic tree of the full-length RNA-dependent RNA polymerase-encoding region of Orf1ab from alpha- and betaCoVs. (C) ML phylogenetic tree of full-length spike genes from viruses in the species Betacoronavirus 1 (red) rooted with the distantly related betacoronavirus mouse hepatitis virus. (D) ML phylogenetic tree of spike genes of SARSr-CoVs, with SARS-CoV-2-like viruses further analyzed in the paper highlighted in blue. (E) ML phylogenetic tree of spike genes from SADSr-CoVs (magenta) rooted with the distantly related alphacoronavirus RnCov/Lucheng-19.
Fig. 2.
IDPlot workflow. (A) Reference and query sequences are aligned using MAFFT. (B) Breakpoint detection is performed using GARD, capturing breakpoints across iterative refinements. (C) Phylogenetic trees based on breakpoints from each iteration are created using FastTree 2. (D) Improvement in ΔAIC-c is plotted against the iteration. (E) Phylogenetic trees associated with the selected GARD iteration are displayed.
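The nucleotide identity component of a workflow like this can be illustrated with a sliding-window comparison of two rows from a multiple sequence alignment. The following is a minimal sketch only, not IDPlot's actual code: the function name `window_identity`, the default window and step sizes, and the gap-handling rule (columns gapped in both sequences are skipped, a gap against a base counts as a mismatch) are all assumptions made for illustration.

```python
from typing import List, Tuple

GAP = "-"


def window_identity(
    ref: str, query: str, window: int = 500, step: int = 100
) -> List[Tuple[int, float]]:
    """Return (window_start, fraction_identical) pairs for two aligned sequences.

    Hypothetical helper: both inputs are assumed to be rows of the same
    alignment (for example, MAFFT output), so they must have equal length.
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")

    results: List[Tuple[int, float]] = []
    last_start = max(len(ref) - window, 0)
    for start in range(0, last_start + 1, step):
        ref_win = ref[start:start + window].upper()
        query_win = query[start:start + window].upper()
        # Assumption: ignore columns where both sequences carry a gap.
        pairs = [
            (r, q)
            for r, q in zip(ref_win, query_win)
            if not (r == GAP and q == GAP)
        ]
        if not pairs:
            continue  # window consists only of shared gaps
        matches = sum(1 for r, q in pairs if r == q)
        results.append((start, matches / len(pairs)))
    return results


if __name__ == "__main__":
    # Toy aligned sequences; real input would be full-length genomes.
    for start, identity in window_identity("ACGTACGT", "ACGTTCGT", window=4, step=4):
        print(start, identity)
```

A sharp drop in identity over a contiguous run of windows is the kind of signal that motivates follow-up breakpoint detection and region-specific trees; the identity values alone do not establish recombination.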
Fig. 3.
SARSr-CoV IDPlot analysis. (A) IDPlot analysis of SARS-CoV-2-like SARSr-CoVs with color-coded dashed lines defining divergent regions arising from recombination events with ancestral viruses. (B) ML tree of the RdRp-encoding region of SARS-CoV-2-like and other SARSr-CoVs showing the close relationship between the SARS-CoV-2-like viruses. (C) ML tree of PangolinCoV/GD19 RR1 (which overlaps with BtCoV/RmYN02 RR1) showing a different topology from the RdRp tree. (D) Schematic of spike proteins indicating divergent regions and nucleotide identity to the reference sequence and closest related sequence in GenBank. (E) ML tree of Orf8 showing that RmYN02 Orf8 is a divergent member of the SARS-CoV-like Orf8 branch.
Fig. 4.
Recombination analysis of Betacoronavirus-1. (A) Nucleotide identity plot and multiple sequence alignment of BetaCoV-1 viruses. Orange dashed lines indicate divergent regions of the ECoV-NC99 genome, while black dashed lines indicate regions with high identity to the reference sequence, bovine coronavirus (BCoV). (B) ML tree of the nsp2-encoding region of Orf1ab, which falls within the divergent ECoV-NC99 Region 2. (C) ML tree of the RdRp-encoding region of Orf1ab. (D) Schematic depicting the spike gene diversity of BetaCoV-1, demonstrating the divergence of ECoV-NC99 and PHEV. Top BLAST hits in bolded red indicate no GenBank entries with >80% nucleotide identity.
Fig. 5.
SADSr-CoV IDPlot analysis. (A) IDPlot nucleotide identity and multiple sequence alignment of eight SADSr-CoVs. Color-coded dashed lines indicate divergent regions in corresponding viruses owing to recombination events. (B) Schematic of spike genes of SADSr-CoVs along with nucleotide identity to the reference sequence and closest related sequences in GenBank for S1 and S2 domains. (C) Schematic of Orf7a diversity with nucleotide identity to the reference sequence and closest related sequences in GenBank. (D) Phylogenetic tree of SADSr-CoVs based on 3CLpro sequence illustrating the history of inferred recombination events indicated by arrowheads.
