[Preprint]. 2020 Sep 21:2020.09.21.300913.
doi: 10.1101/2020.09.21.300913.

Rapid detection of inter-clade recombination in SARS-CoV-2 with Bolotie

Ales Varabyou et al. bioRxiv.

Abstract

The ability to detect recombination in pathogen genomes is crucial to the accuracy of phylogenetic analysis and consequently to forecasting the spread of infectious diseases and to developing therapeutics and public health policies. However, previous methods for detecting recombination and reassortment events cannot handle the computational requirements of analyzing tens of thousands of genomes, a scenario that has now emerged in the effort to track the spread of the SARS-CoV-2 virus. Furthermore, the low divergence of near-identical genomes sequenced in short periods of time presents a statistical challenge not addressed by available methods. In this work we present Bolotie, an efficient method designed to detect recombination and reassortment events between clades of viral genomes. We applied our method to a large collection of SARS-CoV-2 genomes and discovered hundreds of isolates that are likely of a recombinant origin. In cases where raw sequencing data was available, we were able to rule out the possibility that these samples represented co-infections by analyzing the underlying sequence reads. Our findings further show that several recombinants appear to have persisted in the population.

Conflict of interest statement

Disclosure Declaration

The authors have no conflicts of interest to declare.

Figures

Figure 1.
An unrooted topological cladogram of 4,249 SARS-CoV-2 genomes including 225 recombinants labeled as red bars. Arcs link each recombinant to both inferred parental genomes. The color of the arc corresponds to the color of the clade to which a recombinant was clustered within the tree. Clades correspond to the GISAID clades GR (0), GH (1), G (2) and all minor lineages combined (4).
Figure 2.
Four examples of inferred recombinant sequences: A. EPI_ISL_439137; B. EPI_ISL_468407; C. EPI_ISL_509874; D. EPI_ISL_417420. The top section of each plot shows conditional probabilities of a clade given a nucleotide at each position. Bars are plotted for the two parent clades and the other clades are shown as dots of the corresponding color. Each peak >0.1 above the baseline (0.25) is labeled with the number of genomes it appears in. An average is reported whenever there are multiple variants in close proximity on the plot, listing the number of averaged variants in parentheses. The three lower panels of each plot show the frequency of variants at each position for parental clades (top and bottom rows) and variants observed on the recombinant genome (middle row).
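The per-position quantity plotted in the top section of each panel can be illustrated with a minimal sketch. It assumes a uniform prior over clades (which would give the 0.25 baseline when four clades are compared) and a small pseudocount so that unseen nucleotides do not produce zero probabilities. The function name `clade_posteriors`, the pseudocount, and the input layout are hypothetical choices for illustration and are not taken from Bolotie itself.

```python
from collections import Counter

ALPHABET = "ACGT"

def clade_posteriors(clade_columns, nucleotide, pseudocount=1.0):
    """Return P(clade | nucleotide) at one alignment position.

    clade_columns maps a clade label to the list of nucleotides observed
    at this position across that clade's genomes. A uniform prior over
    clades is assumed, so the posterior is the normalized likelihood.
    """
    likelihoods = {}
    for clade, column in clade_columns.items():
        counts = Counter(column)
        # Smoothed frequency of the nucleotide within this clade.
        total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        likelihoods[clade] = (counts[nucleotide] + pseudocount) / total
    norm = sum(likelihoods.values())
    return {clade: lik / norm for clade, lik in likelihoods.items()}

# Toy position: clade 2 carries T, the other three clades carry C.
columns = {0: ["C"] * 10, 1: ["C"] * 10, 2: ["T"] * 10, 3: ["C"] * 10}
posterior_t = clade_posteriors(columns, "T")
```

In this toy example a T at the position points strongly to clade 2, while a position at which all clades share the same nucleotide stays at the uninformative baseline of 0.25 per clade.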
Figure 3.
Effects of sequence composition on the topology of the phylogenetic trees for SARS-CoV-2. A tree obtained directly from NextStrain (A) is first compared to (B) the tree computed using Bolotie consensus sequences for the same set of isolates. (C) shows a tree computed for the same set of isolates with 210 additional recombinant sequences as identified by Bolotie. Leaf nodes that correspond to recombinant genomes are labeled with red dots.
Figure 4.
The maximum conditional probability for each nucleotide is highlighted in gray, while the path with the maximum likelihood is highlighted in bold. By penalizing switching of clades, insignificant differences in probabilities between clades as well as short windows representing a switch to a different clade are avoided. For clarity, transitions between nodes on non-optimal paths are indicated in gray without labeled probabilities.
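The idea in the Figure 4 caption, choosing the most likely sequence of clades along a genome while penalizing each change of clade, can be sketched as a Viterbi-style dynamic program. This is an illustrative sketch under stated assumptions and not the Bolotie implementation: the function name `best_clade_path`, the fixed additive log-space penalty `switch_penalty`, and the probability floor are all hypothetical.

```python
import math

def best_clade_path(probs, switch_penalty=2.0, floor=1e-12):
    """Return the clade label per position that maximizes the summed
    log-probability minus switch_penalty for every change of clade.

    probs is a list with one dict per genome position, mapping each
    clade label to P(clade | nucleotide) at that position.
    """
    clades = list(probs[0])
    log = lambda p: math.log(max(p, floor))  # guard against log(0)
    score = {c: log(probs[0][c]) for c in clades}
    back = []  # back[i][c] = best predecessor of clade c at position i + 1
    for column in probs[1:]:
        new_score, pointers = {}, {}
        for c in clades:
            # Staying in the same clade is free; switching pays the penalty.
            cost = lambda p: score[p] - (0.0 if p == c else switch_penalty)
            prev = max(clades, key=cost)
            new_score[c] = cost(prev) + log(column[c])
            pointers[c] = prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final clade.
    path = [max(clades, key=lambda c: score[c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# A single weakly discordant position should not trigger a switch.
noisy = [{"A": 0.7, "B": 0.3}] * 5 + [{"A": 0.4, "B": 0.6}] + [{"A": 0.7, "B": 0.3}] * 4
# A sustained change of signal should produce exactly one breakpoint.
recombinant = [{"A": 0.9, "B": 0.1}] * 10 + [{"A": 0.1, "B": 0.9}] * 10
noisy_path = best_clade_path(noisy)
recombinant_path = best_clade_path(recombinant)
```

With the penalty in place, the isolated discordant position in `noisy` is absorbed into clade A, whereas the sustained shift in `recombinant` yields a single switch from A to B at position 10. A larger penalty suppresses more short windows at the cost of missing short recombinant segments.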

