Infect Genet Evol. 2022 Apr:99:105261.
doi: 10.1016/j.meegid.2022.105261. Epub 2022 Feb 26.

Covidex: An ultrafast and accurate tool for SARS-CoV-2 subtyping

Marco Cacciabue et al. Infect Genet Evol. 2022 Apr.

Abstract

The epidemiological surveillance of SARS-CoV-2 by means of whole-genome sequencing has revealed the emergence and co-existence of multiple viral lineages or subtypes throughout the world. Moreover, several subtypes of this virus have been shown to display particular phenotypes, such as increased transmissibility or reduced susceptibility to neutralizing antibodies, leading to their designation as Variants of Interest (VOI) or Variants of Concern (VOC). Subtyping of SARS-CoV-2 is therefore a crucial step in the surveillance of this pathogen. Here, we present Covidex, an open-source, alignment-free machine learning subtyping tool. It is a Shiny web app that allows ultrafast and accurate classification of SARS-CoV-2 genome sequences under the three most widely used nomenclature systems (GISAID, Nextstrain, Pango lineages). It also categorizes input sequences as VOI or VOC, according to current definitions. The program is cross-platform compatible and is available via SourceForge (https://sourceforge.net/projects/covidex) or as a web application (http://covidex.unlu.edu.ar).

Keywords: Machine learning; SARS-CoV-2; Subtyping; VOC; VOI; Web-application.

Conflict of interest statement

The authors have no competing interests to declare.

Figures

Fig. 1
Workflow overview of Covidex. First, viral sequences are loaded in FASTA format. Next, normalized k-mer counts are obtained from these sequences. Three random forest models are then used to classify the query sequences, and probability scores are calculated from the number of trees that vote for each class. Finally, the classification results are presented and a report can be generated for download.
Fig. 2
Overview of the Covidex app. The user is expected to load a sequence file and press RUN. A results table will be shown. Additionally, the user can download an automatic report.
Supplementary Fig. 1
Accuracy score and running time of the random forest algorithm at different values of k, for a set of 8000 whole SARS-CoV-2 genomes. The black arrow marks the chosen k (highest accuracy with a low overall running time).
Supplementary Fig. 2
Detection of VOC and VOI variants. The performance of Covidex in detecting variants of relevance was analyzed. For each VOC and VOI category, a sample dataset was created by downloading sequences from the GISAID database (with the "high coverage" and "complete" filters on). Sequences: number of sequences in the dataset; Accuracy: percentage of correctly labeled variants.
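
The first and last computational steps of the workflow in Fig. 1 (normalized k-mer counts and vote-based probability scores) can be illustrated with a minimal Python sketch. This is not the Covidex implementation, which is a Shiny (R) application; the function names, the value of k, the restriction to unambiguous A/C/G/T k-mers, and the example lineage labels are all assumptions made for illustration, and the random forest models themselves are not reproduced here.

```python
from collections import Counter
from itertools import product


def normalized_kmer_counts(sequence, k=3):
    """Return the relative frequency of every A/C/G/T k-mer in `sequence`.

    Assumption: k-mers containing ambiguous bases (e.g. N) are ignored,
    and counts are normalized by the number of remaining k-mers.
    """
    seq = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed feature order: all 4**k possible k-mers, lexicographically.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}


def vote_probabilities(tree_votes):
    """Turn per-tree class votes into probability scores.

    Each score is the fraction of trees that voted for that class.
    """
    counts = Counter(tree_votes)
    n_trees = len(tree_votes)
    return {label: c / n_trees for label, c in counts.items()}


if __name__ == "__main__":
    features = normalized_kmer_counts("ACGTAC", k=2)
    print(features["AC"])

    # Hypothetical votes from a four-tree forest, for illustration only.
    print(vote_probabilities(["B.1.1.7", "B.1.1.7", "B.1.1.7", "P.1"]))
```

Because the feature vector has a fixed length of 4^k regardless of genome length, no alignment is needed before classification, which is what makes this kind of approach alignment-free.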
