Nat Methods. 2023 Apr;20(4):512-522.
doi: 10.1038/s41592-023-01769-3. Epub 2023 Feb 23.

Outbreak.info genomic reports: scalable and dynamic surveillance of SARS-CoV-2 variants and mutations


Karthik Gangavarapu et al. Nat Methods. 2023 Apr.

Abstract

In response to the emergence of SARS-CoV-2 variants of concern, the global scientific community, through unprecedented effort, has sequenced and shared over 11 million genomes through GISAID, as of May 2022. This extraordinarily high sampling rate provides a unique opportunity to track the evolution of the virus in near real-time. Here, we present outbreak.info, a platform that currently tracks over 40 million combinations of Pango lineages and individual mutations, across over 7,000 locations, to provide insights for researchers, public health officials and the general public. We describe the interpretable visualizations available in our web application, the pipelines that enable the scalable ingestion of heterogeneous sources of SARS-CoV-2 variant data and the server infrastructure that enables widespread data dissemination via a high-performance API that can be accessed using an R package. We show how outbreak.info can be used for genomic surveillance and as a hypothesis-generation tool to understand the ongoing pandemic at varying geographic and temporal scales.


Competing interests

M.A.S. receives grants from the US National Institutes of Health within the scope of this work and grants and contracts from the US Food and Drug Administration, the US Department of Veterans Affairs and Janssen Research & Development outside the scope of this work. M.A.S. and K.G.A. have received consulting fees and/or compensated expert testimony on SARS-CoV-2 and the COVID-19 pandemic. The other authors declare no competing interests.

Figures

Extended Data Fig. 1 | Empirical basis for selecting 75% as a threshold to identify ‘characteristic mutations’ of a lineage.
(a) The frequency of mutations above 5% prevalence in P.1 (Gamma), BA.1 (Omicron), B.1.617.2 (Delta), B.1.351 (Beta), and B.1.1.7 (Alpha) variants. (b) Mutations present in ≥75% of all sequences in P.1 (Gamma), BA.1 (Omicron), B.1.617.2 (Delta), B.1.351 (Beta), and B.1.1.7 (Alpha) variants.
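The 75% rule above amounts to a per-lineage frequency filter. The Python sketch below illustrates it under stated assumptions and is not the Bjorn implementation: each sequence is represented as a set of mutation labels such as `S:D614G`, the example lineage is hypothetical, and the names `characteristic_mutations`, `needs_low_count_warning` and `LOW_COUNT_CUTOFF` are hypothetical (the 1,000-sequence cut-off is taken from the warning described in Extended Data Fig. 3d).

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Threshold from Extended Data Fig. 1; all names below are hypothetical.
CHARACTERISTIC_THRESHOLD = 0.75
# Extended Data Fig. 3d warns when fewer than 1,000 sequences are assigned to a lineage.
LOW_COUNT_CUTOFF = 1000


def characteristic_mutations(
    sequences: Iterable[Set[str]], threshold: float = CHARACTERISTIC_THRESHOLD
) -> Dict[str, float]:
    """Return mutations present in at least `threshold` of a lineage's sequences.

    Each element of `sequences` is the set of mutations called in one genome.
    """
    seqs = list(sequences)
    n = len(seqs)
    if n == 0:
        return {}
    # Count each mutation at most once per sequence.
    counts = Counter(mut for muts in seqs for mut in set(muts))
    return {mut: c / n for mut, c in counts.items() if c / n >= threshold}


def needs_low_count_warning(n_sequences: int, cutoff: int = LOW_COUNT_CUTOFF) -> bool:
    """True when the characteristic-mutation estimate rests on few sequences."""
    return n_sequences < cutoff


# Hypothetical lineage with four sequenced genomes.
lineage = [
    {"S:D614G", "S:N501Y", "S:P681H"},
    {"S:D614G", "S:P681H"},
    {"S:D614G", "S:N501Y", "S:P681H"},
    {"S:D614G"},
]
print(sorted(characteristic_mutations(lineage)))  # ['S:D614G', 'S:P681H']
print(needs_low_count_warning(len(lineage)))  # True
```

Here S:N501Y, seen in only half of the sequences, falls below the threshold, while S:P681H at exactly 75% is retained.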
Extended Data Fig. 2 | Software infrastructure of outbreak.info.
The infrastructure can be broadly divided into (1) data ingestion pipelines, (2) the server side, which hosts the database and API server, and (3) client-side applications that consume the API from the server.
Extended Data Fig. 3 | Examples of warnings that prompt users to consider possible biases when interpreting visualizations on the web interface.
(a) Link (‘Read about biases’) to the caveats page on the web interface in the summary box section of the lineage/mutation tracker. (b) Link (‘Estimates are biased by sampling (read more)’) to the caveats page above the streamgraph on the web interface of the location tracker. (c) Link (‘read more’) to the methods page on the web interface describing how characteristic mutations are identified and the associated limitations. (d) Warning about the limitations of identifying characteristic mutations when fewer than 1,000 sequences are assigned to a lineage.
Extended Data Fig. 4 | Flowchart describing the steps in Bjorn.
The genomic data and associated metadata from GISAID undergo preprocessing and filtering to exclude erroneous or incomplete records (depicted in blue). The preprocessed information is then compressed and compared with previously cached versions to identify new or updated records (depicted in black). These new genomes are aligned and their mutations are counted, followed by lineage identification. The locations and dates in the new metadata are also normalized to enable standardized queries. These processing steps are executed in parallel by splitting the data into chunks of 10 Mb (depicted in purple). The processed data from the new records are combined with the processed data from the unaltered records (depicted in brown) and then stored in an Elasticsearch database.
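Two steps of this flowchart, detecting new or updated records against a cache and splitting work into fixed-size chunks, can be sketched in Python as below. This is a hedged illustration, not Bjorn's code. It assumes each record is a JSON-serializable dict keyed by a GISAID-style accession (the accessions shown are hypothetical), uses a SHA-256 digest of the record's canonical JSON as its change fingerprint, and reads "10 Mb" as 10 × 1024 × 1024 bytes. All function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, Iterable, Iterator, List, Tuple

# Chunk size from Extended Data Fig. 4; interpreting "10 Mb" as 10 MiB is an assumption.
CHUNK_BYTES = 10 * 1024 * 1024


def record_digest(record: dict) -> str:
    """Fingerprint a record via SHA-256 of its canonical (sorted-key) JSON form."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def split_changed(
    records: Dict[str, dict], cache: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Split accessions into (new or updated, unaltered) by comparing digests with the cache."""
    changed, unaltered = [], []
    for accession, record in records.items():
        if cache.get(accession) == record_digest(record):
            unaltered.append(accession)
        else:
            changed.append(accession)
    return changed, unaltered


def chunk_by_size(items: Iterable[str], max_bytes: int = CHUNK_BYTES) -> Iterator[List[str]]:
    """Group text records into chunks whose UTF-8 size stays within max_bytes.

    A single record larger than max_bytes is emitted as its own chunk.
    """
    chunk: List[str] = []
    size = 0
    for item in items:
        n = len(item.encode("utf-8"))
        if chunk and size + n > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(item)
        size += n
    if chunk:
        yield chunk


# Hypothetical accessions: EPI_ISL_1 is cached and unchanged, EPI_ISL_2 is new.
cache = {"EPI_ISL_1": record_digest({"seq": "ACGT"})}
records = {"EPI_ISL_1": {"seq": "ACGT"}, "EPI_ISL_2": {"seq": "TT"}}
print(split_changed(records, cache))  # (['EPI_ISL_2'], ['EPI_ISL_1'])
```

Only the accessions in the first list would go on to alignment, mutation counting and lineage assignment; results for the unaltered ones would be reused from the cache, which is what keeps repeated ingestion runs cheap.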
Fig. 1 | outbreak.info enables the exploration of genomic data across three dimensions.
a, The growth rate of a lineage is a function of epidemiology and of the lineage's intrinsic biological properties. Epidemiology varies over time and by geography, whereas intrinsic biological properties are determined by the mutations present in a given lineage. b, Genomic data are ingested from GISAID, processed using the custom-built data pipeline (Bjorn) and stored on a server that can be accessed via an API. The API is consumed by two clients: a JavaScript-based web client and an R package that provides programmatic access after authenticating with GISAID credentials. c, The web interface contains three tools for exploring genomic data across three dimensions: lineage/mutation, time and geography.
Fig. 2 | Lineage and/or Mutation Tracker.
a, Prevalence of VOCs in the United Kingdom from September 2020 to May 2022. The error bands show the 95% binomial proportion confidence interval calculated using the Jeffreys interval. b, Search and filter options for the Lineage/Variant of Concern tracker. c, Prevalence of S:Y145H + S:A222V mutations across different lineages globally. d, Prevalence of BA.2 in the United Kingdom. The error bands show the 95% binomial proportion confidence interval calculated using the Jeffreys interval. e, Mutation map showing the characteristic mutations of AY.4. f, Summary statistics of the BA.2 lineage. g, Geographic distribution of the cumulative prevalence of the BA.2 lineage globally over the last 60 d. h, Cumulative prevalence of BA.2 in each country over the last 60 d. i, Research articles and datasets related to BA.2.
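Several captions report 95% binomial proportion confidence intervals from the Jeffreys interval: with x sequences assigned to a lineage among n sequenced in a time bin, the bounds are the 2.5% and 97.5% quantiles of a Beta(x + 1/2, n − x + 1/2) distribution. The stdlib-only Python sketch below illustrates this and is not outbreak.info's code. The incomplete beta function is evaluated with the standard continued-fraction (modified Lentz) method, quantiles are found by bisection, the helper names and example counts are hypothetical, and pinning the bounds to 0 or 1 when x is 0 or n is a common convention assumed here. In practice `scipy.stats.beta.ppf` would replace the hand-written quantile.

```python
import math
from typing import Tuple


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction at step m.
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + a * math.log(x) + b * math.log1p(-x)
    )
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_ppf(q: float, a: float, b: float, tol: float = 1e-12) -> float:
    """Quantile of Beta(a, b), found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reg_inc_beta(a, b, mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def jeffreys_interval(x: int, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Equal-tailed Jeffreys interval for the binomial proportion x / n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    alpha = 1.0 - level
    a, b = x + 0.5, n - x + 0.5
    # Assumed convention: pin the bound to 0 (x == 0) or 1 (x == n).
    lower = 0.0 if x == 0 else beta_ppf(alpha / 2, a, b)
    upper = 1.0 if x == n else beta_ppf(1 - alpha / 2, a, b)
    return lower, upper


# Hypothetical counts: 30 of 100 sequences collected in one bin belong to a lineage.
print(jeffreys_interval(30, 100))
```

Because the interval narrows as n grows, days with few sequenced samples show visibly wider error bands, which is the behavior the captions' error bands convey.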
Fig. 3 | Location report.
a, Relative prevalence of all lineages over time in South Africa. The bar chart below shows the total number of sequenced samples collected per day. b, Relative cumulative prevalence of all lineages over the last 60 d in South Africa. c, Mutation prevalence across the most prevalent lineages in South Africa over the last 60 d. d, Comparison of the prevalence of VOCs grouped by WHO classification (Alpha, Beta, Delta and Omicron) over time in South Africa. The error bands show the 95% binomial proportion confidence interval calculated using the Jeffreys interval. e, Daily reported cases in South Africa, shown as a line chart.
Fig. 4 | Prevalence of the VOC lineages Alpha, Beta, Gamma, Delta and Omicron over time.
a–d, Prevalence worldwide (a), in South Africa (b), in Brazil (c) and in the United States (d). Error bands in a–d show 95% binomial proportion confidence intervals calculated using the Jeffreys interval. e–h, Lineages with a prevalence over 3% over the last 60 d in Denmark (e), the United Kingdom (f), the United States (g) and South Africa (h).
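Panels e–h filter lineages by cumulative prevalence over a trailing window. A minimal Python sketch of that filter follows, assuming each sample is a (collection date, Pango lineage) pair, that "the last 60 d" means the window (as_of − 60 d, as_of], and that "over 3%" means strictly greater than 0.03. The function name and the sample counts are hypothetical. Prevalence here is a fraction of sequenced samples, so it inherits the sampling biases flagged in Extended Data Fig. 3.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def prevalent_lineages(
    samples: Iterable[Tuple[date, str]],
    as_of: date,
    window_days: int = 60,
    min_prevalence: float = 0.03,
) -> Dict[str, float]:
    """Cumulative prevalence of each lineage within the trailing window,
    keeping only lineages above min_prevalence, most prevalent first."""
    start = as_of - timedelta(days=window_days)
    # Count only samples collected inside (start, as_of].
    counts = Counter(lineage for day, lineage in samples if start < day <= as_of)
    total = sum(counts.values())
    if total == 0:
        return {}
    kept = [(lin, c / total) for lin, c in counts.items() if c / total > min_prevalence]
    return dict(sorted(kept, key=lambda kv: kv[1], reverse=True))


# Hypothetical sample counts for one location.
as_of = date(2022, 5, 1)
samples = (
    [(date(2022, 4, 20), "BA.2")] * 60
    + [(date(2022, 3, 15), "BA.1")] * 38
    + [(date(2022, 4, 1), "B.1.1.7")] * 2
    + [(date(2021, 12, 1), "B.1.617.2")] * 50  # outside the window, ignored
)
print(prevalent_lineages(samples, as_of))  # {'BA.2': 0.6, 'BA.1': 0.38}
```

B.1.1.7 makes up 2% of in-window samples and is dropped, while the older B.1.617.2 samples never enter the denominator.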


Comment in

  • Tracking SARS-CoV-2 variants and resources.
    Oude Munnink BB, Koopmans M. Nat Methods. 2023 Apr;20(4):489-490. doi: 10.1038/s41592-023-01833-y. PMID: 36922622.


