J Med Internet Res. 2020 Oct 2;22(10):e22299.
doi: 10.2196/22299.

CoV-Seq, a New Tool for SARS-CoV-2 Genome Analysis and Visualization: Development and Usability Study

Boxiang Liu et al. J Med Internet Res. 2020.

Abstract

Background: COVID-19 became a global pandemic not long after its identification in late 2019. The genomes of SARS-CoV-2 are being rapidly sequenced and shared on public repositories. To keep up with these updates, scientists need to frequently refresh and reclean data sets, which is an ad hoc and labor-intensive process. Further, scientists with limited bioinformatics or programming knowledge may find it difficult to analyze SARS-CoV-2 genomes.

Objective: To address these challenges, we developed CoV-Seq, an integrated web server that enables simple and rapid analysis of SARS-CoV-2 genomes.

Methods: CoV-Seq is implemented in Python and JavaScript. The web server and source code URLs are provided in this article.

Results: Given a new sequence, CoV-Seq automatically predicts gene boundaries and identifies genetic variants, which are displayed in an interactive genome visualizer and are downloadable for further analysis. A command-line interface is available for high-throughput processing. In addition, we aggregated all publicly available SARS-CoV-2 sequences from the Global Initiative on Sharing Avian Influenza Data (GISAID), National Center for Biotechnology Information (NCBI), European Nucleotide Archive (ENA), and China National GeneBank (CNGB), and extracted genetic variants from these sequences for download and downstream analysis. The CoV-Seq database is updated weekly.

Conclusions: We have developed CoV-Seq, an integrated web service for fast and easy analysis of custom SARS-CoV-2 sequences. The web server provides an interactive module for the analysis of custom sequences and a weekly updated database of genetic variants of all publicly accessible SARS-CoV-2 sequences. We believe CoV-Seq will help improve our understanding of the genetic underpinnings of COVID-19.

Keywords: COVID-19; SARS-CoV-2; bioinformatics; data sets; genetics; genome; programming; sequence; virus; web server.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. The cumulative number of sequences hosted by public databases has increased over time. Sequences from Asia increased steadily since January, whereas sequences from other continents saw a dramatic increase in March. Each data point represents a week. Note that the x-axis shows collection dates, which can precede submission dates by several weeks (eg, a sequence collected in July may be submitted in August). Therefore, lines that do not reach August indicate that sequences recently collected have not been submitted yet.
Figure 2. The number of submissions by country. The ten countries with the highest numbers of submissions are marked.
Figure 3. The CoV-Seq pipeline and web interface. (A) Genomic sequences are collected from GISAID, NCBI, ENA, and CNGB. We remove incomplete genomes (length <25,000 nucleotides) and duplicate genomes before alignment with MAFFT against the reference genome NC_045512.2. We use a custom Python script to generate raw variant calls and remove samples with too many mutations, which are indicative of sequencing error. After merging VCFs, we remove multiallelic sites and variants within the poly-A tail to obtain a filtered set of variants. (B) The interactive genome visualizer shows ORFs (turquoise) and mutations (red). Users can zoom with the top bar and pan with the bottom bar. Hovering over ORF bodies and mutations triggers pop-up windows with relevant information. (C) The mutation table shows positions, alleles, and intersecting ORFs. (D) The ORF table shows predicted ORF boundaries and supporting information. CNGB: China National GeneBank; ENA: European Nucleotide Archive; GISAID: Global Initiative on Sharing Avian Influenza Data; MAFFT: Multiple Alignment using Fast Fourier Transform; NCBI: National Center for Biotechnology Information; ORF: open reading frame; VCF: variant call format.
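
The filtering and variant-calling steps in panel A of Figure 3 can be illustrated with a minimal Python sketch. This is not the CoV-Seq source code: the function names `filter_genomes` and `call_variants` are hypothetical, "duplicate" is assumed here to mean an identical sequence string, and the variant caller only reports substitutions from a pair of already aligned rows (for example, rows taken from a MAFFT alignment against NC_045512.2). Only the 25,000-nucleotide threshold comes from the caption.

```python
from typing import Dict, List, Tuple

# Length threshold stated in the Figure 3 caption.
MIN_GENOME_LENGTH = 25_000


def filter_genomes(records: Dict[str, str]) -> Dict[str, str]:
    """Drop incomplete genomes and duplicates (hypothetical helper).

    Assumption: two genomes are duplicates if their sequences are identical;
    the first record seen is the one that is kept.
    """
    kept: Dict[str, str] = {}
    seen = set()
    for name, seq in records.items():
        seq = seq.upper()
        if len(seq) < MIN_GENOME_LENGTH:
            continue  # incomplete genome
        if seq in seen:
            continue  # duplicate sequence
        seen.add(seq)
        kept[name] = seq
    return kept


def call_variants(ref_aln: str, query_aln: str) -> List[Tuple[int, str, str]]:
    """Report substitutions between two aligned rows (hypothetical helper).

    Returns (position, ref_allele, alt_allele) tuples, where position is
    1-based in ungapped reference coordinates. Gap columns (indels) and
    ambiguous 'N' bases in the query are skipped in this sketch.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    variants: List[Tuple[int, str, str]] = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != "-":
            ref_pos += 1  # advance only on reference bases
        if r == "-" or q == "-" or q == "N":
            continue
        if r != q:
            variants.append((ref_pos, r, q))
    return variants
```

A per-sample mutation count (`len(call_variants(...))`) could then be compared with a cutoff to discard samples with too many mutations; the caption does not state the cutoff used, so none is assumed here.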
