[Preprint]. 2021 Mar 25:2021.03.25.437046.
doi: 10.1101/2021.03.25.437046.

Freely accessible ready to use global infrastructure for SARS-CoV-2 monitoring



Wolfgang Maier et al. bioRxiv.

Update in

  • Ready-to-use public infrastructure for global SARS-CoV-2 monitoring.
    Maier W, Bray S, van den Beek M, Bouvier D, Coraor N, Miladi M, Singh B, De Argila JR, Baker D, Roach N, Gladman S, Coppens F, Martin DP, Lonie A, Grüning B, Kosakovsky Pond SL, Nekrutenko A. Nat Biotechnol. 2021 Oct;39(10):1178-1179. doi: 10.1038/s41587-021-01069-1. PMID: 34588690. Free PMC article.

Abstract

The COVID-19 pandemic is the first global health crisis to occur in the age of big genomic data. Although data generation capacity is well established and sufficiently standardized, analytical capacity is not. Establishing analytical capacity requires pooling global computational resources and delivering the best open-source tools and analysis workflows within a ready-to-use, universally accessible resource. Such a resource should not be controlled by a single research group, institution, or country. Instead, it should be maintained by a community of users and developers who ensure that the system remains operational and populated with current tools. A community is also essential for facilitating the discourse needed to establish best analytical practices. Bringing together public computational research infrastructure from the USA, Europe, and Australia, we developed a distributed data analysis platform that accomplishes these goals. It is immediately accessible to anyone in the world and is designed for the analysis of rapidly growing collections of deep sequencing datasets. We demonstrate its utility by detecting allelic variants in existing high-quality SARS-CoV-2 sequencing datasets and by continuously reanalyzing COG-UK data. All workflows, data, and documentation are available at https://covid19.galaxyproject.org.


Figures

Figure 1.
Analysis flow in our system. VCF = Variant Call Format, TSV = tab-separated values, JSON = JavaScript Object Notation.
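The VCF → TSV → JSON conversion in Figure 1 can be illustrated with a minimal sketch. It assumes one ALT allele per VCF line and an allele frequency stored as an `AF=` key in the INFO column; the function names and column layout are hypothetical and are not the platform's actual workflow code.

```python
import csv
import io
import json

# Hypothetical output columns for the flattened variant table.
COLUMNS = ["sample", "chrom", "pos", "ref", "alt", "af"]

def parse_vcf(text, sample):
    """Turn VCF data lines into flat records.

    Assumes one ALT allele per line and an allele frequency stored as AF=...
    in the INFO column; both are assumptions, not guarantees of every caller.
    """
    records = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, ## meta lines and the #CHROM header
        chrom, pos, _id, ref, alt, _qual, _filt, info = line.split("\t")[:8]
        # INFO is a ';'-separated list of KEY=VALUE pairs; flags lack '='
        tags = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        records.append({
            "sample": sample,
            "chrom": chrom,
            "pos": int(pos),
            "ref": ref,
            "alt": alt,
            "af": float(tags.get("AF", "nan")),  # NaN if AF is missing
        })
    return records

def to_tsv(records):
    """Serialize records as a tab-separated table with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def to_json(records):
    """Serialize records as a JSON array of objects."""
    return json.dumps(records)
```

Keeping one flat record per (sample, variant) makes the TSV and JSON outputs two views of the same table, which is convenient for downstream plotting; multi-allelic records would need to be split first.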
Figure 2.
Number of SRA accessions for each sequencing technology and library preparation strategy.
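The tally behind Figure 2 amounts to counting distinct accessions per (technology, library strategy) pair. The sketch below assumes run metadata has already been loaded as dicts with hypothetical keys `accession`, `platform` and `strategy`; the toy rows only mimic SRA-style vocabulary and are not data from the study.

```python
from collections import Counter

def tally_accessions(runs):
    """Count distinct accessions per (platform, library strategy) pair.

    `runs` is an iterable of dicts with hypothetical keys "accession",
    "platform" and "strategy"; duplicate accessions are counted once.
    """
    seen = set()
    counts = Counter()
    for run in runs:
        if run["accession"] in seen:
            continue  # the same accession may appear in several metadata rows
        seen.add(run["accession"])
        counts[(run["platform"], run["strategy"])] += 1
    return counts

# Toy metadata rows (placeholder accessions, not real SRA records).
runs = [
    {"accession": "ACC1", "platform": "ILLUMINA", "strategy": "AMPLICON"},
    {"accession": "ACC2", "platform": "ILLUMINA", "strategy": "AMPLICON"},
    {"accession": "ACC2", "platform": "ILLUMINA", "strategy": "AMPLICON"},
    {"accession": "ACC3", "platform": "OXFORD_NANOPORE", "strategy": "AMPLICON"},
    {"accession": "ACC4", "platform": "ILLUMINA", "strategy": "RNA-Seq"},
]
for (platform, strategy), n in sorted(tally_accessions(runs).items()):
    print(f"{platform}\t{strategy}\t{n}")
```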
Figure 3.
Dot plot of all allelic variants (AVs) across samples in the “Boston” dataset. X-axis: genome position; Y-axis: samples; colors correspond to functional classes of AVs. Samples are arranged by hierarchical clustering using cosine distances on the mean allele frequencies (AFs) of all AVs. A. Dot plot of all allelic variants in the “Boston” dataset; rows: samples; columns: genomic coordinates; samples are arranged by hierarchical clustering. Limited to variants that occur in at least 4 samples. B. Dot plot of observed variants in the “Boston” dataset, restricted to variants that appear only at AF ≤ 10% and occur in at least 4 samples each. Variants are partitioned into 10 clusters by K-medoids clustering with the Hamming distance on AF vectors; the cluster with 8 variants is highlighted in orange. An interactive version is available at https://observablehq.com/@spond/intrahost-variant-exploration-landing
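The clustering steps named in the Figure 3 caption can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the caption does not give a linkage method, so average linkage is assumed for panel A; for panel B, the Hamming distance is assumed to compare presence/absence patterns (AF > 0) rather than raw AF values; and the K-medoids variant shown is a simple alternating scheme seeded with the first k points. All function names are hypothetical.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """1 - cosine similarity; treated as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 if nu == 0 or nv == 0 else 1.0 - dot / (nu * nv)

def average_linkage_order(vectors, dist):
    """Naive agglomerative clustering (assumed average linkage); returns leaf order."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(dist(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0] if clusters else []

def low_frequency_variants(af, max_af=0.10, min_samples=4):
    """Keep variants observed in >= min_samples samples and never above max_af.

    `af` maps a variant label to its per-sample AFs (0.0 = not observed).
    """
    return {v: row for v, row in af.items()
            if sum(x > 0 for x in row) >= min_samples and max(row) <= max_af}

def hamming(u, v):
    """Hamming distance between presence/absence patterns (AF > 0)."""
    return sum((a > 0) != (b > 0) for a, b in zip(u, v))

def k_medoids(points, k, dist, max_iter=100):
    """Alternating K-medoids; the first k points seed the medoids."""
    medoids = list(range(k))
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            clusters[min(medoids, key=lambda m: dist(p, points[m]))].append(i)
        # Update step: pick the member with the smallest total in-cluster distance;
        # an empty cluster keeps its old medoid.
        new = [min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
               if members else m
               for m, members in clusters.items()]
        if sorted(new) == sorted(medoids):
            break  # medoids stopped changing
        medoids = new
    return clusters
```

In this sketch, panel A would call `average_linkage_order(sample_vectors, cosine_distance)` to order the rows, and panel B would call `k_medoids(list(low_frequency_variants(af).values()), 10, hamming)`; a library implementation (for example SciPy's hierarchical clustering) would be the practical choice for real datasets.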
Figure 4.
Intersection between allelic variants (AVs) reported here and AVs of concern (VOC). The large marker in the “COG-Post” dataset corresponds to the L18F change in gene S. Marker size ∝ fraction of samples containing the variant. [min;max]: minimum and maximum counts of samples containing the variants shown in this figure; e.g., in “Boston” the largest marker corresponds to an AV shared by 7 samples, and the smallest to one shared by 3 samples. Interactive version for “Boston”: https://covid19.galaxyproject.org/genomics/interactive_images/voc_Boston.html
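The quantities plotted in Figure 4 (per-variant sample fractions that set marker size, and the [min;max] sample counts) can be computed with a short sketch. It assumes each sample's called variants are available as a set of labels such as `"S:L18F"`; the function name and toy data are hypothetical.

```python
def voc_overlap(sample_variants, voc_variants):
    """Fraction of samples carrying each VOC-associated variant, plus [min; max] counts.

    `sample_variants` maps sample -> set of variant labels; `voc_variants`
    is a set of labels. Variants absent from every sample are dropped.
    """
    n = len(sample_variants)
    counts = {}
    for v in voc_variants:
        c = sum(v in found for found in sample_variants.values())
        if c:
            counts[v] = c
    # Marker size would be drawn proportional to these fractions.
    fractions = {v: c / n for v, c in counts.items()}
    span = (min(counts.values()), max(counts.values())) if counts else None
    return fractions, span

# Toy example: four samples, three VOC-associated labels.
samples = {
    "s1": {"S:L18F", "S:N501Y"},
    "s2": {"S:L18F"},
    "s3": {"S:L18F", "S:N501Y"},
    "s4": {"S:A222V"},
}
fractions, span = voc_overlap(samples, {"S:L18F", "S:N501Y", "S:E484K"})
print(fractions, span)
```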

