Viruses. 2023 Jun 17;15(6):1391.
doi: 10.3390/v15061391.

CoVigator-A Knowledge Base for Navigating SARS-CoV-2 Genomic Variants

Thomas Bukur et al. Viruses. 2023.

Abstract

Background: The outbreak of the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) resulted in the global COVID-19 pandemic. The urgent need for an effective SARS-CoV-2 vaccine led to the development of the first series of vaccines at unprecedented speed. However, the discovery of SARS-CoV-2 spike-glycoprotein mutants, and with them the potential for escape from vaccine-induced protection and for increased infectivity, demonstrates the continuing importance of monitoring SARS-CoV-2 mutations to enable early detection and tracking of genomic variants of concern.

Results: We developed the CoVigator tool with three components: (1) a knowledge base that collects new SARS-CoV-2 genomic data, processes it and stores its results; (2) a comprehensive variant calling pipeline; (3) an interactive dashboard highlighting the most relevant findings. The knowledge base routinely downloads and processes virus genome assemblies or raw sequencing data from the COVID-19 Data Portal (C19DP) and the European Nucleotide Archive (ENA), respectively. The results of variant calling are visualized through the dashboard in the form of tables and customizable graphs, making it a versatile tool for tracking SARS-CoV-2 variants. We put a special emphasis on the identification of intrahost mutations and make available to the community what is, to the best of our knowledge, the largest dataset on SARS-CoV-2 intrahost mutations. In the spirit of open data, all CoVigator results are available for download. The CoVigator dashboard is accessible via covigator.tron-mainz.de.

Conclusions: With worldwide demand for genome surveillance increasing in order to track the spread of SARS-CoV-2, CoVigator will be a valuable resource that provides an up-to-date list of mutations and can be incorporated into global efforts.

Keywords: SARS-CoV-2; dashboard; genomic variants; intrahost; knowledge base; pipeline; software; virus genome assemblies.
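
The three components named in the Results (a knowledge base, a variant calling pipeline, and a dashboard) can be pictured as a small data flow. The following is a minimal sketch under stated assumptions, not CoVigator's actual code: the standard-library `sqlite3` and `concurrent.futures` modules stand in for the SQL database and the distributed processing, all table, function, and field names are hypothetical, and the "variant calling" step is a naive position-by-position comparison of equal-length toy sequences used purely for illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Toy reference sequence (hypothetical, NOT the SARS-CoV-2 genome).
REFERENCE = "ATGTTTGTTTTTCTT"


def create_store():
    """Create an in-memory database standing in for the knowledge base."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample (id TEXT PRIMARY KEY, sequence TEXT, status TEXT)"
    )
    conn.execute(
        "CREATE TABLE variant (sample_id TEXT, position INTEGER, ref TEXT, alt TEXT)"
    )
    return conn


def access(conn, records):
    """Accessor step: store new (id, sequence) records, skipping known ids."""
    conn.executemany(
        "INSERT OR IGNORE INTO sample VALUES (?, ?, 'PENDING')", records
    )
    conn.commit()


def call_variants(sample):
    """Pipeline step (toy): report 1-based positions that differ from REFERENCE."""
    sample_id, sequence = sample
    return [
        (sample_id, index + 1, ref, alt)
        for index, (ref, alt) in enumerate(zip(REFERENCE, sequence))
        if ref != alt
    ]


def process(conn):
    """Processor step: run the pipeline on every pending sample in parallel."""
    pending = conn.execute(
        "SELECT id, sequence FROM sample WHERE status = 'PENDING'"
    ).fetchall()
    # Workers only compute; all database writes stay on the calling thread.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(call_variants, pending))
    for (sample_id, _), variants in zip(pending, results):
        conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?)", variants)
        conn.execute(
            "UPDATE sample SET status = 'FINISHED' WHERE id = ?", (sample_id,)
        )
    conn.commit()


def top_mutations(conn, limit=20):
    """Dashboard-style query: most recurrent mutations across samples."""
    return conn.execute(
        "SELECT position, ref, alt, COUNT(*) AS n FROM variant "
        "GROUP BY position, ref, alt ORDER BY n DESC, position LIMIT ?",
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    store = create_store()
    access(store, [("s1", "ATGTTTGTTTTTCTT"), ("s2", "ATGTTCGTTTTTCTT")])
    process(store)
    print(top_mutations(store))
```

Keeping the per-sample work free of database access, as in this sketch, is one way to make the processing step easy to distribute; a real variant calling pipeline would of course replace `call_variants` with alignment and calling tools.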

Conflict of interest statement

Author U.S. is co-founder, shareholder and CEO at BioNTech SE. The remaining authors declare no conflict of interest.

Figures

Figure 1
CoVigator system components. The accessor reads external data and stores it in an SQL database. The processor reads the stored data and distributes the processing of every sample across an HPC cluster via Dask. The pipeline processes FASTA and FASTQ data and stores the results back in the database (see Figure S1 for a more detailed view of the FASTA and FASTQ processing pipeline). The dashboard reads the results and displays them in a set of interactive plots. The results are also available in raw format.
Figure 2
Plots in the samples by country tab for the raw read dataset. (A) Accumulation of samples through time by country; (B) dN/dS ratio through time for each SARS-CoV-2 protein; (C) dN/dS ratio through time in the domains of the spike protein. See Figure S2 for a screenshot including the filters.
Figure 3
Interactive plots in the lineages tab for the raw read dataset. (A) Accumulation of samples in each lineage through time; (B) dominant lineages through time. See Figure S3 for a screenshot including the filters.
Figure 4
Interactive plots on the mutation statistics tab showing results for raw reads and genomic assembly datasets. (A) ENA distribution of the number of mutations per sample; (B) C19DP distribution of the number of mutations per sample; (C) ENA frequency of base substitutions; (D) C19DP frequency of base substitutions; (E) ENA indel length distribution; (F) C19DP indel length distribution; (G) ENA frequency of mutation effect on the protein; (H) C19DP frequency of mutation effect on the protein. See Figure S4 for a screenshot including the filters.
Figure 5
Gene view for the spike protein on the raw read dataset. (A) Table of the top 20 recurrent mutations with the frequency segregated by month between November 2021 and July 2022; (B) gene view showing mutations (synonymous and unique mutations excluded) in the spike protein and their frequencies in the virus population, the ConsHMM conservation tracks in grey and the Pfam protein domains in tones of purple. See Figure S4 for a screenshot including the filters.
Figure 6
Distribution of variant allele frequency (VAF) across all mutation calls (4,665,192 with VAF ≥ 0.8; 222,297 with VAF ≥ 0.5 and < 0.8; 26,231,409 with VAF < 0.5) in 135,347 samples. High-confidence clonal mutations overlapping the same amino acid are merged into MNVs or complex variants. See Figure S7 for a screenshot of the intrahost mutations tab including the filters.
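
The three VAF ranges quoted in the Figure 6 caption amount to a simple binning of mutation calls. The sketch below is illustrative only: the thresholds 0.5 and 0.8 and the counts come from the caption, while the function names and the bin labels ("high", "intermediate", "low") are hypothetical placeholders rather than the paper's terminology.

```python
from collections import Counter


def vaf_bin(vaf, low=0.5, high=0.8):
    """Assign one mutation call to a VAF bin using the caption's thresholds."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError(f"VAF must lie in [0, 1], got {vaf}")
    if vaf >= high:
        return "high"          # VAF >= 0.8
    if vaf >= low:
        return "intermediate"  # 0.5 <= VAF < 0.8
    return "low"               # VAF < 0.5


def bin_counts(vafs):
    """Count how many calls fall into each bin."""
    return Counter(vaf_bin(v) for v in vafs)


def bin_fractions(counts):
    """Convert per-bin counts into fractions of all calls."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Counts as reported in the Figure 6 caption.
CAPTION_COUNTS = {
    "high": 4_665_192,
    "intermediate": 222_297,
    "low": 26_231_409,
}

if __name__ == "__main__":
    for label, fraction in bin_fractions(CAPTION_COUNTS).items():
        print(f"{label}: {fraction:.1%}")
```

Applied to the caption's counts, the bins sum to 31,118,898 calls, of which roughly 84% have VAF < 0.5.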

