J Comput Biol. 2023 Jan;30(1):3-20.
doi: 10.1089/cmb.2021.0507. Epub 2022 Sep 20.

virDTL: Viral Recombination Analysis Through Phylogenetic Reconciliation and Its Application to Sarbecoviruses and SARS-CoV-2

Sumaira Zaman et al. J Comput Biol. 2023 Jan.

Abstract

An accurate understanding of the evolutionary history of rapidly-evolving viruses like SARS-CoV-2, responsible for the COVID-19 pandemic, is crucial to tracking and preventing the spread of emerging pathogens. However, viruses undergo frequent recombination, which makes it difficult to trace their evolutionary history using traditional phylogenetic methods. In this study, we present a phylogenetic workflow, virDTL, for analyzing viral evolution in the presence of recombination. Our approach leverages reconciliation methods developed for inferring horizontal gene transfer in prokaryotes and, compared to existing tools, is uniquely able to identify ancestral recombinations while accounting for several sources of inference uncertainty, including in the construction of a strain tree, estimation and rooting of gene family trees, and reconciliation itself. We apply this workflow to the Sarbecovirus subgenus and demonstrate how a principled analysis of predicted recombination gives insight into the evolution of SARS-CoV-2. In addition to providing confirming evidence for the horseshoe bat as its zoonotic origin, we identify several ancestral recombination events that merit further study.

Keywords: SARS-CoV-2; Sarbecovirus evolution; phylogenetic reconciliation; viral recombination.

Conflict of interest statement

The authors declare they have no conflicting financial interests.

Figures

FIG. 1.
virDTL enables inference of ancestral recombination. The figure shows a cartoon example of the virDTL pipeline applied to a toy dataset containing viruses from three civet cats, two pangolins, two bats, and one human. (a) Commonly used tools such as Simplot and RDP are well-suited to inferring recent recombinations between strains of interest, where the recombination signal is clear in the sequence similarity profile. (b) However, when recombination has occurred between ancestral strains, and multiple recombinations have occurred in a single lineage, it becomes significantly more difficult to disentangle the sequence similarity signal and infer all recombinations. (c) Our model-based computational protocol, virDTL, takes into account the entire evolutionary history of a gene family, including several sources of inference uncertainty. A credible strain tree is estimated using nonrecombinant regions of the genome, and multiple gene tree candidates are inferred, error-corrected, and reconciled against the strain tree to infer HGTs. In addition to accounting for gene tree topological and rooting uncertainty, we reconcile the same gene tree and species tree multiple times to capture the full landscape of uncertainty in inferring recombination.
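The idea in panel (c) of reconciling many times and aggregating the results can be illustrated with a minimal sketch. It assumes, purely for illustration, that each reconciliation has already been reduced to a list of (donor, recipient) pairs; the function name `tally_transfer_support` and the toy strain labels are hypothetical and are not part of virDTL.

```python
from collections import Counter

def tally_transfer_support(reconciliations):
    """Count how many reconciliations contain each (donor, recipient) transfer."""
    support = Counter()
    for transfers in reconciliations:
        # A transfer counts once per reconciliation, even if it is listed twice.
        support.update(set(transfers))
    return support

# Hypothetical toy input: three reconciliations of one gene family.
runs = [
    [("A", "B"), ("C", "D")],
    [("A", "B")],
    [("A", "B"), ("B", "A")],
]
support = tally_transfer_support(runs)
print(support[("A", "B")])  # number of runs that contain the transfer A -> B
```

Under this simplified counting, a transfer recovered in most runs accumulates a high count, while one that appears only under a particular gene tree or rooting stays low.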
FIG. 2.
Overview of Sarbecovirus genome evolution. (a) We reconstructed three candidate strain trees from the whole genome and two putative nonrecombinant regions, A (NRR-A; 13,000–18,000 base pairs) and B (NRR-B; 4000–9000 base pairs). Their topologies differ substantially, especially in the SARS-CoV-2 lineage, which suggests that the evolution of SARS-CoV-2 was impacted by recombination. We define four clades, Zhejiang (green), SC2-RaTG (orange), Pangolin (purple), and HKU (blue), and show the tree inferred using each region of the genome. (b) The Sarbecovirus genome comprises four well-characterized structural genes, which encode the viral spike, envelope, membrane, and nucleocapsid proteins, as well as several open reading frames that encode accessory factors. The spike and nucleocapsid genes are highlighted in red and pink, respectively, as they appear in several ancestral recombinations (Fig. 5). (c) Sequence similarity along the genome using SimPlot. Using Zhejiang clade sequences as the query, we compare with the SC2-RaTG and HKU clades. For the majority of the genome, SC2-RaTG is more similar to Zhejiang. Between 11,857 and 20,677 base pairs, HKU is more similar. (d) We find evidence of an HGT from the immediate ancestor of the Zhejiang clade to an ancestor of the HKU clade in ORF1ab. This recombination (light gray) explains the signal shown in the NRR-A tree (a) and SimPlot (c) but is not consistent with the dating of the phylogeny (Supplementary Fig. S2). However, it is not uncommon for inferred HGTs to be off by a single branch due to inference uncertainty. A time-consistent HGT to the ancestor of the three HKU strains (darker gray) similarly explains the signal.
FIG. 3.
HGTs involving the SARS-CoV-2 lineage. (a) We inferred 8 highly supported (with a support of at least 500) HGTs which involve an ancestor of SARS-CoV-2 [Wuhan-Hu-1]. Support values are shown for the OptRoot-rooted gene trees (solid lines) or MAD-rooted gene trees (dashed lines), with one transfer (spike) inferred using both rootings. Smaller arrow heads indicate that there exists an HGT with at least 100 support in the reverse direction using gene trees rooted with either method, suggesting directional uncertainty. (b) We found a strong correlation between the number of leaves in a clade and the number of HGTs identified in that clade (Pearson's R² = 0.99). However, for every ancestral strain in the SARS-CoV-2 lineage and related clades (highlighted by larger colored points), the number of HGTs in that clade is much lower than would be expected for its size. This paucity of HGTs is likely due to sampling effects, as these strains are more distantly related to the rest of the Sarbecovirus strains in the analysis. MAD, Minimum Ancestor Deviation.
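The two thresholds in panel (a), at least 500 support for a highly supported HGT and at least 100 reverse support as a sign of directional uncertainty, can be sketched as a simple filter. This is an illustrative sketch only: the function `highly_supported` and the dictionary layout are assumptions, the separate OptRoot and MAD rootings are collapsed into a single support value, and the toy values are borrowed from the captions of Figs. 4 and 6.

```python
def highly_supported(support, min_support=500, reverse_min=100):
    """Return transfers at or above min_support, flagging directional uncertainty."""
    results = []
    for (donor, recipient), value in support.items():
        if value < min_support:
            continue
        # Support for the same pair of strains in the opposite direction.
        reverse = support.get((recipient, donor), 0)
        results.append({
            "donor": donor,
            "recipient": recipient,
            "support": value,
            "direction_uncertain": reverse >= reverse_min,
        })
    return sorted(results, key=lambda r: r["support"], reverse=True)

# Toy support values taken from the captions of Figs. 4 and 6.
example_support = {
    ("Rp3", "Rm1"): 1000,
    ("Anlong-103", "YN2013"): 503,
    ("YN2013", "Anlong-103"): 497,
    ("F46", "Anlong-103"): 407,
}
hits = highly_supported(example_support)
for hit in hits:
    print(hit)
```

With these inputs, only the Rp3 to Rm1 and Anlong-103 to YN2013 transfers pass the 500 cutoff, and the latter is flagged because its reverse direction has 497 support.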
FIG. 4.
Highly supported leaf-to-leaf HGTs are consistent with sequence similarity. We present a case study of a leaf-to-leaf HGT between the donor Rp3 (orange) and recipient Rm1 (green). (a) The inferred HGT from Rp3 to Rm1 in the spike gene has a support of 1000, shown on a subtree of the full species tree. (b) Sequence similarity of the donor Rp3 to its sibling GX2013 (purple) and the recipient Rm1. Rp3 and GX2013 are highly similar throughout the length of the genome, and Rm1 is more divergent throughout but equally similar in the spike region. (c) Sequence similarity of the recipient Rm1 to its sibling HuB2013 (blue) and the donor Rp3. Rm1 and HuB2013 are highly similar throughout the length of the genome except in the spike region, where Rm1 has received genetic material from Rp3. Thus, Rp3 and Rm1 are more similar in the spike region.
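The kind of windowed comparison shown in panels (b) and (c) can be approximated with a short sliding-window identity profile. This sketch computes raw percent identity over aligned sequences, which is a simplification and not the distance model SimPlot itself applies; the function name `window_identity`, the window and step sizes, and the toy sequences are all assumptions made for illustration.

```python
def window_identity(seq_a, seq_b, window=200, step=20):
    """Fraction of matching sites per window for two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(seq_a) - window + 1, step):
        matches = compared = 0
        for a, b in zip(seq_a[start:start + window], seq_b[start:start + window]):
            if a == "-" or b == "-":
                continue  # skip alignment gaps
            compared += 1
            matches += a == b
        identity = matches / compared if compared else float("nan")
        profile.append((start, identity))
    return profile

# Hypothetical toy alignment: identical first half, fully divergent second half.
query = "ACGTACGTAC" + "GTACGTACGT"
other = "ACGTACGTAC" + "CATGCATGCA"
profile = window_identity(query, other, window=10, step=10)
print(profile)
```

A drop or rise in identity confined to one region, as in the spike gene here, is the signal that the captions describe for the Rp3 and Rm1 comparison.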
FIG. 5.
Ancestral HGTs are consistent with sequence similarity but difficult to discover from direct sequence comparison alone. We present a case study of an ancestral recombination which is highly supported in both the spike and nucleocapsid gene families, from the immediate ancestor of the SC2-RaTG clade (orange) to the immediate ancestor of the Pangolin clade (purple). (a) For much of the genome, Zhejiang is more similar to the donor SC2-RaTG than the recipient Pangolin. (b) Our analysis infers HGTs from SC2-RaTG to Pangolin in the spike and nucleocapsid genes with supports of 640 and 813, respectively. (c) In the 3′ region of the genome, Pangolin is often more similar to SC2-RaTG, especially in the spike and parts of the nucleocapsid gene families. However, it is difficult to clearly determine from direct sequence comparison alone which gene families have been affected by recombination, especially in ancestral cases such as these where the closest reference relative is the same for both donor and recipient. For this analysis, sequences for ancestral strains were estimated through a majority consensus of their descendants.
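The caption states that ancestral sequences were estimated through a majority consensus of their descendants. One plausible reading of that step is sketched below; the tie and no-majority handling (emitting `N`), the strict-majority rule, and the function name `majority_consensus` are assumptions, since the caption does not give these details.

```python
from collections import Counter

def majority_consensus(aligned_seqs, ambiguous="N"):
    """Column-wise strict-majority consensus of equal-length aligned sequences."""
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_seqs):
        base, count = Counter(column).most_common(1)[0]
        # Keep the base only if more than half of the descendants carry it.
        consensus.append(base if count > len(column) / 2 else ambiguous)
    return "".join(consensus)

# Hypothetical descendants of one ancestral node.
ancestor = majority_consensus(["ACGT", "ACGA", "ATGA"])
print(ancestor)
```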
FIG. 6.
Pairs of strains with bidirectional HGTs suggest the presence of a third party donor. We present a case study of an inferred HGT between Anlong-103 (yellow) and YN2013 (blue) in the spike gene family, with support of (a) 503 in the forward direction and (b) 497 in the backward direction. Such bidirectional support strongly suggests that an HGT occurred and that a third party was involved, but leaves the direction of the HGT ambiguous. We found support for HGTs (a) from F46 to Anlong-103 (407 support) and (b) from F46 to YN2013 (393 support). (c, d) SimPlot analysis demonstrates that both Anlong-103 and YN2013 are significantly different from their respective siblings (RS7327, green, and Rs4084, magenta) in the spike gene. (e) SimPlot analysis shows that both Anlong-103 and YN2013 are equally similar to a putative third-party donor F46 (black).
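The third-party reasoning in this caption can be sketched as a search for strains that donate to both members of a bidirectional pair. The function `shared_donors`, the dictionary layout, and the cutoff of 100 are hypothetical choices for illustration; the support values are the ones quoted in the caption.

```python
def shared_donors(support, strain_a, strain_b, min_support=100):
    """List strains with supported transfers into both strain_a and strain_b."""
    def donors_into(recipient, exclude):
        return {
            donor
            for (donor, target), value in support.items()
            if target == recipient and value >= min_support and donor != exclude
        }
    # A candidate third party must donate to both strains of the pair.
    return sorted(donors_into(strain_a, strain_b) & donors_into(strain_b, strain_a))

# Support values quoted in the caption of Fig. 6 (spike gene family).
spike_support = {
    ("Anlong-103", "YN2013"): 503,
    ("YN2013", "Anlong-103"): 497,
    ("F46", "Anlong-103"): 407,
    ("F46", "YN2013"): 393,
}
candidates = shared_donors(spike_support, "Anlong-103", "YN2013")
print(candidates)
```

A shared donor found this way is only a candidate; the caption relies on the SimPlot comparisons in panels (c) to (e) to corroborate it.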

