BMC Bioinformatics. 2024 Jul 5;25(1):231.
doi: 10.1186/s12859-024-05754-1.

Deepvirusclassifier: a deep learning tool for classifying SARS-CoV-2 based on viral subtypes within the coronaviridae family

Karolayne S Azevedo et al. BMC Bioinformatics.

Abstract

Purpose: In this study, we present DeepVirusClassifier, a tool capable of accurately classifying Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) viral sequences among other subtypes of the Coronaviridae family. The classification is performed by a deep neural network model based on convolutional neural networks (CNNs). Because viruses within the same family share similar genetic and structural characteristics, distinguishing them is challenging and calls for more robust models. Given the rapid evolution of viral genomes and the growing need for timely classification, we aimed to provide a robust and efficient tool that improves the accuracy of viral identification and classification, contributes to research in viral genomics, and assists in the surveillance of emerging viral strains.

Methods: The proposed tool is based on a one-dimensional deep CNN and was trained and tested on sequences from the Coronaviridae family, including SARS-CoV-2. Model performance was assessed using several metrics, including the F1-score and the area under the receiver operating characteristic curve (AUROC). Additionally, artificial mutation tests were conducted to evaluate how well the model generalizes to sequence variations. For comparison, we also applied the BLAST algorithm and conducted comprehensive analyses of processing time.
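The one-dimensional CNN mentioned above operates directly on nucleotide sequences. The abstract does not give the input encoding or layer settings, so the sketch below assumes a common choice: one-hot encoding over A, C, G, T (ambiguous bases such as N become zero vectors), followed by a single convolution layer with ReLU and global max pooling, written in plain NumPy. The names `one_hot`, `conv1d`, and `conv_block` are hypothetical and are not part of DeepVirusClassifier.

```python
import numpy as np

# Assumed encoding (not necessarily the paper's): one-hot over A, C, G, T.
ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def one_hot(seq: str) -> np.ndarray:
    """Return a (len(seq), 4) array; ambiguous bases become all-zero rows."""
    out = np.zeros((len(seq), len(ALPHABET)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        idx = INDEX.get(base)
        if idx is not None:
            out[pos, idx] = 1.0
    return out


def conv1d(x: np.ndarray, kernels: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """'Valid' 1D convolution (cross-correlation, as in most DL libraries).

    x:       (length, channels_in)
    kernels: (n_filters, width, channels_in)
    bias:    (n_filters,)
    returns: (length - width + 1, n_filters)
    """
    n_filters, width, _ = kernels.shape
    steps = x.shape[0] - width + 1
    if steps < 1:
        raise ValueError("sequence is shorter than the kernel width")
    out = np.empty((steps, n_filters), dtype=np.float32)
    for t in range(steps):
        window = x[t:t + width]  # (width, channels_in)
        # Sum over width and channel axes -> one response per filter.
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return out


def conv_block(seq: str, kernels: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """One convolution, ReLU, then global max pooling over positions."""
    feats = np.maximum(conv1d(one_hot(seq), kernels, bias), 0.0)  # ReLU
    return feats.max(axis=0)  # strongest response of each filter
```

For example, a single filter set to the one-hot pattern of `ACG` responds most strongly where that motif occurs, so `conv_block("TTACGTT", one_hot("ACG")[None], np.zeros(1))` yields an array containing 3.0. A trained deep CNN learns many such filters and stacks several layers before a final classification layer.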

Results: DeepVirusClassifier demonstrated exceptional performance across several evaluation metrics in both the training and testing phases, indicating a robust learning capacity. Notably, when tested on more than 10,000 viral sequences, the model achieved a sensitivity above 99% for sequences with fewer than 2000 mutations. The tool achieved higher accuracy and substantially shorter processing times than the Basic Local Alignment Search Tool (BLAST). Furthermore, the results appear more reliable than those of the related work discussed in the paper, indicating that the tool has great potential to revolutionize viral genomic research.
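The sensitivity, F1-score, and AUROC reported above can be computed from true labels, predicted labels, and model scores. The sketch below assumes the label convention of Fig. 1 (0 = Non-SARS-CoV-2, 1 = SARS-CoV-2, with 1 as the positive class) and computes AUROC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, counting ties as one half. It is an illustration of the standard definitions, not the authors' evaluation code.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (SARS-CoV-2) as positive."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return tp, fp, fn, tn


def sensitivity(y_true, y_pred):
    """Recall of the positive class: TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


def auroc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive vs every negative
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))
```

The pairwise AUROC uses memory proportional to the number of positives times the number of negatives, which is fine for small examples; for test sets of thousands of sequences, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` is more practical.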

Conclusion: DeepVirusClassifier is a powerful tool for accurately classifying viral sequences, with a specific focus on SARS-CoV-2 and other subtypes within the Coronaviridae family. Rigorous evaluation and comparison with existing methods demonstrate the superiority of our model. Tests with artificially mutated sequences show that the tool can identify sequence variations, and the tool contributes significantly to viral classification and genomic research. As viral surveillance becomes increasingly critical, our model holds promise for the rapid and accurate identification of emerging viral strains.

Keywords: Coronaviridae; Deep learning; SARS-CoV-2; Viral classification.

Conflict of interest statement

No competing interest is declared.

Figures

Fig. 1
Confusion matrix of the proposed approach for the classification problem of distinguishing between SARS-CoV-2 and Non-SARS-CoV-2 samples. Non-SARS-CoV-2 samples are represented by label 0, and SARS-CoV-2 samples are represented by label 1. The model correctly classifies all samples into their respective classes.
Fig. 2
ROC curve (with AUROC) for the classification of SARS-CoV-2 versus Non-SARS-CoV-2 samples
Fig. 3
Learning curves of training and validation accuracy on the training set under fivefold cross-validation
Fig. 4
Learning curves of training and validation loss on the training set under fivefold cross-validation
Fig. 5
Overview of the proposed technique
Fig. 6
Countries of origin of the Coronaviridae genomic samples in the database
Fig. 7
Dataset of all viral subtypes after the data balancing process
Fig. 8
Dataset after balancing the samples according to their groups
Fig. 9
Architecture of the CNN used in the viral classifier proposed in this work
Fig. 10
Comparison of the correctness rate of BLAST and the CNN proposed in this work on a test set of 34 sequences as the artificial position mutation rate γ increases
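The artificial mutation test summarized in Fig. 10 can be sketched as follows. The abstract does not define the mutation rate γ precisely, so the code assumes that round(γ·L) distinct positions of a length-L sequence are each replaced by a different nucleotide, and that `classify` is a hypothetical callable mapping a sequence to a label (for instance a trained CNN or a BLAST-based wrapper). The function names and the sampling scheme are illustrative assumptions, not the authors' procedure.

```python
import random
from typing import Callable, List, Optional, Sequence

BASES = "ACGT"


def mutate(seq: str, gamma: float, seed: Optional[int] = None) -> str:
    """Return a copy of seq with round(gamma * len(seq)) point substitutions.

    Assumed scheme: positions are sampled without replacement, and each chosen
    base is replaced by a different nucleotide drawn uniformly at random.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible experiments
    chars = list(seq)
    n_mut = round(gamma * len(chars))
    for pos in rng.sample(range(len(chars)), n_mut):
        alternatives = [b for b in BASES if b != chars[pos].upper()]
        chars[pos] = rng.choice(alternatives)
    return "".join(chars)


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correctness_curve(seqs: Sequence[str], labels: Sequence[int],
                      classify: Callable[[str], int],
                      gammas: Sequence[float], seed: int = 0) -> List[float]:
    """Fraction of mutated sequences that `classify` labels correctly, per gamma."""
    curve = []
    for g in gammas:
        hits = sum(classify(mutate(s, g, seed=seed + i)) == y
                   for i, (s, y) in enumerate(zip(seqs, labels)))
        curve.append(hits / len(seqs))
    return curve
```

Running `correctness_curve` once with a CNN-backed `classify` and once with a BLAST-backed one over the same γ values would give two curves of the kind compared in Fig. 10.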
