Microb Genom. 2024 Oct;10(10):001293.
doi: 10.1099/mgen.0.001293.

The Canadian VirusSeq Data Portal and Duotang: open resources for SARS-CoV-2 viral sequences and genomic epidemiology

Erin E Gill, Baofeng Jia, Carmen Lia Murall, Raphaël Poujol, Muhammad Zohaib Anwar, Nithu Sara John, Justin Richardsson, Ashley Hobb, Abayomi S Olabode, Alexandru Lepsa, Ana T Duggan, Andrea D Tyler, Arnaud N'Guessan, Atul Kachru, Brandon Chan, Catherine Yoshida, Christina K Yung, David Bujold, Dusan Andric, Edmund Su, Emma J Griffiths, Gary Van Domselaar, Gordon W Jolly, Heather K E Ward, Henrich Feher, Jared Baker, Jared T Simpson, Jaser Uddin, Jiannis Ragoussis, Jon Eubank, Jörg H Fritz, José Héctor Gálvez, Karen Fang, Kim Cullion, Leonardo Rivera, Linda Xiang, Matthew A Croxen, Mitchell Shiell, Natalie Prystajecky, Pierre-Olivier Quirion, Rosita Bajari, Samantha Rich, Samira Mubareka, Sandrine Moreira, Scott Cain, Steven G Sutcliffe, Susanne A Kraemer, Yelizar Alturmessov, Yann Joly, CPHLN Consortium, CanCOGeN Consortium, VirusSeq Data Portal Academic and Health Network, Marc Fiume, Terrance P Snutch, Cindy Bell, Catalina Lopez-Correa, Julie G Hussin, Jeffrey B Joy, Caroline Colijn, Paul M K Gordon, William W L Hsiao, Art F Y Poon, Natalie C Knox, Mélanie Courtot, Lincoln Stein, Sarah P Otto, Guillaume Bourque, B Jesse Shapiro, Fiona S L Brinkman

Erin E Gill et al. Microb Genom. 2024 Oct.

Abstract

The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform the public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN - VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts. The goal of VirusSeq was to allow open access to Canadian SARS-CoV-2 genomic sequences and enhanced, standardized contextual data that were unavailable in other repositories and that meet FAIR standards (Findable, Accessible, Interoperable and Reusable). In addition, the portal data submission pipeline includes data quality checking procedures and appropriate acknowledgement of data generators, which encourages collaboration. From inception to execution, the portal was developed with a conscientious focus on strong data governance principles and practices. Extensive efforts ensured a commitment to Canadian privacy laws, data security standards, and organizational processes. This portal has been coupled with other resources, such as Viral AI, and was further leveraged by the Coronavirus Variants Rapid Response Network (CoVaRR-Net) to produce a suite of continually updated analytical tools and notebooks. Here we highlight this portal (https://virusseq-dataportal.ca/), including its contextual data not available elsewhere, and Duotang (https://covarr-net.github.io/duotang/duotang.html), a web platform that presents key genomic epidemiology and modelling analyses on circulating and emerging SARS-CoV-2 variants in Canada.
Duotang presents dynamic changes in variant composition of SARS-CoV-2 in Canada and by province, estimates variant growth, and displays complementary interactive visualizations, with a text overview of the current situation. The VirusSeq Data Portal and Duotang resources, alongside additional analyses and resources computed from the portal (COVID-MVP, CoVizu), are all open source and freely available. Together, they provide an updated picture of SARS-CoV-2 evolution to spur scientific discussions, inform public discourse, and support communication with and within public health authorities. They also serve as a framework for other jurisdictions interested in open, collaborative sequence data sharing and analyses.
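
As an illustration of what "estimating variant growth" can mean in practice, the sketch below fits a straight line to the log-ratio of variant to reference counts over time. Under a simple logistic model the slope of that line is the variant's growth advantage per day relative to the reference lineage. This is a minimal least-squares sketch, not necessarily the method Duotang uses; the function names and the example counts are hypothetical.

```python
import math


def estimate_growth_advantage(days, variant_counts, reference_counts):
    """Least-squares fit of log(variant / reference) against time.

    Returns (slope, intercept). The slope is the per-day growth advantage
    of the variant relative to the reference under a logistic model.
    Illustrative only: no weighting by sample size, no uncertainty estimate.
    """
    points = [
        (t, math.log(v / r))
        for t, v, r in zip(days, variant_counts, reference_counts)
        if v > 0 and r > 0  # the log-ratio is undefined if either count is zero
    ]
    if len(points) < 2:
        raise ValueError("need at least two time points with nonzero counts")

    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


def variant_frequency(t, slope, intercept):
    """Variant frequency at time t implied by the fitted line (logistic curve)."""
    odds = math.exp(intercept + slope * t)
    return odds / (1.0 + odds)


# Hypothetical counts: the variant doubles each day against a flat reference.
slope, intercept = estimate_growth_advantage(
    days=[0, 1, 2, 3],
    variant_counts=[10, 20, 40, 80],
    reference_counts=[100, 100, 100, 100],
)
```

With real surveillance data, counts are noisy and unevenly sampled, so a likelihood-based fit with confidence intervals would usually be preferred over this unweighted regression.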

Keywords: data sharing; evolutionary biology; mutational analysis; open access; viral genomics.

Conflict of interest statement

J.T.S. receives research funding from Oxford Nanopore Technologies (ONT) and has received travel support to attend and speak at meetings organized by ONT, and is on the Scientific Advisory Board of Day Zero Diagnostics.

Figures

Fig. 1. Data flow overview. SARS-CoV-2 viral samples and contextual data are collected by Public Health, hospitals and labs. Some samples are selected for sequencing based on national and regional priorities. After sequencing, regional Public Health authorities select samples that will be shared publicly. Sequence and contextual data from these samples are uploaded to the data portal using the DataHarmonizer and FASTA uploader. From the portal, the public can view or download contextual and sequence data from across the country. Data can also be accessed via the API for value-added resources, such as Duotang, Viral AI, COVID-MVP, and CoVizu. Green check marks on the figure indicate points in the data flow where human quality assurance/quality control (QA/QC) and/or ethical oversight take place.
Fig. 2. Overview of the data flow from VirusSeq Data Portal to Duotang. Genomics and epidemiological data from VirusSeq are first processed by DNAStack’s Viral AI workflow. The result of this is a dataset containing SARS-CoV-2 lineage information. Duotang then retrieves this lineage information using Viral AI’s Global Alliance for Genomics and Health (GA4GH) compliant APIs for further analyses.
Fig. 3. Duotang webpage overview. Duotang contains many different interactive plots for the user to explore SARS-CoV-2 genomic epidemiology in Canada. All sections of the page are easily accessible via the menu bar, which is located on the left-hand side of the page. The first section of the page gives a text description of current variants of interest in the country. Several of the plots are highlighted in the figure above, which include different visualizations of selection on variants, phylogenetic trees and root-to-tip analyses of these trees to detect unusual genetic changes. In addition to what is shown here, the user can also view plots of the growth advantage of single lineages relative to a reference lineage (on both national and regional levels), visualize the mutational composition of actively circulating lineages via an embedded frame of COVID-MVP [19], review the changing proportions of different variants over time (back to April 2020, on both national and regional levels), examine molecular clock estimates for different VOIs, and utilize a searchable table to view the ancestors and description of any Pango lineage. For more details, please visit https://covarr-net.github.io/duotang/duotang.html.
Fig. 4. Overview of the VirusSeq Data Portal Explore page. The VirusSeq Data Portal allows users to browse samples stored via a web interface or API. Within the interface, the user is able to see the samples that are available and their metadata, and perform filters and queries to identify samples of interest that can then be downloaded for analysis.
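
The filter-then-download workflow described for the Explore page can also be reproduced offline once contextual data have been exported. The sketch below assumes the metadata are available as tab-separated text; the column names (`sample_id`, `province`, `collection_date`, `lineage`) and the sample rows are hypothetical placeholders, not the portal's actual schema.

```python
import csv
import io

# Hypothetical export: real portal downloads will have different columns.
SAMPLE_TSV = "\n".join([
    "sample_id\tprovince\tcollection_date\tlineage",
    "S1\tOntario\t2023-01-05\tXBB.1.5",
    "S2\tQuebec\t2023-01-06\tBQ.1",
    "S3\tOntario\t2023-01-09\tBQ.1",
])


def filter_samples(tsv_text, **criteria):
    """Return rows (as dicts) whose columns exactly match every criterion."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row
        for row in reader
        if all(row.get(column) == value for column, value in criteria.items())
    ]


# Select samples by one field, then narrow by a second field.
ontario = filter_samples(SAMPLE_TSV, province="Ontario")
ontario_bq1 = filter_samples(SAMPLE_TSV, province="Ontario", lineage="BQ.1")
```

Exact string matching keeps the sketch short; date ranges or lineage prefixes would need dedicated comparison logic.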

Update of

  • The Canadian VirusSeq Data Portal & Duotang: open resources for SARS-CoV-2 viral sequences and genomic epidemiology.
    Gill EE, Jia B, Murall CL, Poujol R, Anwar MZ, John NS, Richardsson J, Hobb A, Olabode AS, Lepsa A, Duggan AT, Tyler AD, N'Guessan A, Kachru A, Chan B, Yoshida C, Yung CK, Bujold D, Andric D, Su E, Griffiths EJ, Van Domselaar G, Jolly GW, Ward HKE, Feher H, Baker J, Simpson JT, Uddin J, Ragoussis J, Eubank J, Fritz JH, Gálvez JH, Fang K, Cullion K, Rivera L, Xiang L, Croxen MA, Shiell M, Prystajecky N, Quirion PO, Bajari R, Rich S, Mubareka S, Moreira S, Cain S, Sutcliffe SG, Kraemer SA, Joly Y, Alturmessov Y, Consortium C, Consortium C; VirusSeq Data Portal Academic and Health network; Fiume M, Snutch TP, Bell C, Lopez-Correa C, Hussin JG, Joy JB, Colijn C, Gordon PMK, Hsiao WWL, Poon AFY, Knox NC, Courtot M, Stein L, Otto SP, Bourque G, Shapiro BJ, Brinkman FSL. ArXiv [Preprint]. 2024 May 8:arXiv:2405.04734v1. Update in: Microb Genom. 2024 Oct;10(10). doi: 10.1099/mgen.0.001293. PMID: 38764594.

References

    1. Cameron R, Savić-Kallesøe S, Griffiths EJ, Dooley D, Sridhar A, et al. SARS-CoV-2 genomic contextual data harmonization: recommendations from a mixed methods analysis of COVID-19 case report forms across Canada. 2022. https://www.researchsquare.com/article/rs-1871614/v1 [accessed 9 April 2024].
    2. Shu Y, McCauley J. GISAID: global initiative on sharing all influenza data – from vision to reality. Eurosurveillance. 2017;22:30494. doi: 10.2807/1560-7917.ES.2017.22.13.30494.
    3. Khare S, Gurry C, Freitas L, Schultz MB, Bach G, et al. GISAID’s role in pandemic response. China CDC Wkly. 2021;3:1049–1051. doi: 10.46234/ccdcw2021.255.
    4. Elbe S, Buckland‐Merrett G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global Challenges. 2017;1:33–46. doi: 10.1002/gch2.1018.
    5. Kalia K, Saberwal G, Sharma G. The lag in SARS-CoV-2 genome submissions to GISAID. Nat Biotechnol. 2021;39:1058–1060. doi: 10.1038/s41587-021-01040-0.
