Virus Evol. 2022 Jun 16;8(1):veac048.
doi: 10.1093/ve/veac048. eCollection 2022.

Identifying SARS-CoV-2 regional introductions and transmission clusters in real time

Jakob McBroome et al. Virus Evol. 2022.

Abstract

The unprecedented severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) global sequencing effort has suffered from an analytical bottleneck. Many existing methods for phylogenetic analysis are designed for sparse, static datasets and are too computationally expensive to apply to densely sampled, rapidly expanding datasets when results are needed immediately to inform public health action. For example, public health is often concerned with identifying clusters of closely related samples, but the sheer scale of the data prevents manual inspection and the current computational models are often too expensive in time and resources. Even when results are available, intuitive data exploration tools are of critical importance to effective public health interpretation and action. To help address this need, we present a phylogenetic heuristic that quickly and efficiently identifies newly introduced strains in a region, resulting in clusters of infected individuals, and their putative geographic origins. We show that this approach performs well on simulated data and yields results largely congruent with more sophisticated Bayesian phylogeographic modeling approaches. We also introduce Cluster-Tracker (https://clustertracker.gi.ucsc.edu/), a novel interactive web-based tool to facilitate effective and intuitive SARS-CoV-2 geographic data exploration and visualization across the USA. Cluster-Tracker is updated daily and automatically identifies and highlights groups of closely related SARS-CoV-2 infections resulting from the transmission of the virus between two geographic areas by travelers, streamlining public health tracking of local viral diversity and emerging infection clusters. The site is open-source and designed to be easily configured to analyze any chosen region, making it a useful resource globally. 
The combination of these open-source tools will empower detailed investigations of the geographic origins and spread of SARS-CoV-2 and other densely sampled pathogens.

Keywords: COVID-19; Cluster-Tracker; SARS-CoV-2; genomic epidemiology; phylodynamics; phylogenetic methods; phylogeography.

Figures

Figure 1.
Example index calculation. This small example tree demonstrates a computation of our index. The focal node at the base has an index value below 0.5, suggesting that it is out-of-region by our heuristic. Our introduction point is therefore along the long branch below the root, and the ancestor of the downstream in-region sample cluster would have existed along that branch.
Figure 2.
Global distribution of SARS-CoV-2 transmission clusters. (A) The log count of clusters detected across each of the 102 countries surveyed. The number of clusters detected is largely a function of total local sequencing effort. (B) The five countries with the highest representation in the data. The USA and England together constitute more than half of all available sequences. (C) Cluster sizes are consistent across countries. Most clusters are small, implying most newly introduced SARS-CoV-2 lineages quickly die out.
Figure 3.
International and interstate introductions across the USA. (A) The log count of clusters identified across the continental USA. California, Texas, Florida, and New York are associated with the greatest number of unique clusters. (B) The proportion of international introductions in each state plotted against the total samples collected in that state. This relationship is largely linear, reflecting the correlation between sampling, population size, and levels of international travel. PR (Puerto Rico) exhibits relatively more international introductions for its sampling than other territories and states of the USA. (C) The distribution of cluster sizes across states. These are largely consistent with clusters identified at the international level.
Figure 4.
Log-fold interstate transmission for the states of California (A) and Illinois (B). (A) Interstate introductions of COVID-19 into California are relatively more likely to originate on the West Coast, particularly from Nevada. (B) Interstate introductions of COVID-19 into Illinois are relatively more likely to come from the immediate surroundings, particularly Iowa and Missouri.
Figure 5.
The Cluster-Tracker site. The Cluster-Tracker tool is updated daily at https://clustertracker.gi.ucsc.edu. Users can interactively explore the latest results of our heuristic applied to each state of the continental USA by sorting the interactive table, selecting states to focus on in the map, and using the Taxonium tree-viewing platform to examine clusters of interest in detail.
Figure 6.
Example clusters in the Taxonium phylogenetic tree viewer. (A) An example cluster in Texas (member samples circled) that is inferred to have originated from California (regional index = 0.94). There are many samples from California closely related to the cluster’s common ancestor, supporting California as the most likely origin. (B) A different, much larger, 9,533-leaf cluster in California. This represents a lineage of SARS-CoV-2 commonly circulating in California, descended from one of the original introductions of the Delta variant into California in mid-June 2021. Descendants of this cluster have transmitted to other regions many times, but members of this cluster have been found in California as recently as 7 December 2021.
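
The thresholding step described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes each node already carries a precomputed regional index; how that index is calculated is not reproduced here. The `Node` class, the `find_introduction_branch` function, and the example index values are hypothetical names and numbers chosen for illustration, not the authors' implementation. Only the 0.5 cutoff comes from the caption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Cutoff taken from the Figure 1 caption: an index below 0.5 is read as out-of-region.
THRESHOLD = 0.5


@dataclass
class Node:
    """Hypothetical tree node holding a precomputed regional index."""
    name: str
    index: float                      # assumed to be computed elsewhere
    parent: Optional["Node"] = None   # None marks the root


def find_introduction_branch(
    sample: Node, threshold: float = THRESHOLD
) -> Optional[Tuple[str, str]]:
    """Walk from an in-region sample toward the root.

    Returns (out_of_region_ancestor, cluster_ancestor) for the first branch
    whose upper node has an index below the threshold, or None if no
    ancestor falls below it. Treating an index of exactly 0.5 as in-region
    is an assumption of this sketch.
    """
    child = sample
    while child.parent is not None:
        parent = child.parent
        if parent.index < threshold:
            # The introduction is placed on the branch parent -> child.
            return (parent.name, child.name)
        child = parent
    return None


# Illustrative tree: an out-of-region focal node above an in-region cluster.
root = Node("root", 0.2)
a = Node("A", 0.9, parent=root)
s1 = Node("s1", 1.0, parent=a)
s2 = Node("s2", 1.0, parent=a)

print(find_introduction_branch(s1))  # ('root', 'A')
```

In this toy tree both in-region samples resolve to the same branch, so they would be grouped into one cluster whose ancestor lies on the branch between `root` and `A`.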
