BMC Bioinformatics. 2024 Jul 5;25(1):231.

doi: 10.1186/s12859-024-05754-1.

Deepvirusclassifier: a deep learning tool for classifying SARS-CoV-2 based on viral subtypes within the coronaviridae family

Karolayne S Azevedo¹, Luísa C de Souza¹, Maria G F Coutinho¹, Raquel de M Barbosa^{2,3}, Marcelo A C Fernandes^{4,5,6}

Affiliations

¹ InovAI Lab, nPITI/IMD, Federal University of Rio Grande do Norte, Natal, RN, 59078-970, Brazil.
² InovAI Lab, nPITI/IMD, Federal University of Rio Grande do Norte, Natal, RN, 59078-970, Brazil. rdemelo@us.es.
³ Department of Pharmacy and Pharmaceutical Technology, University of Seville, 41012, Seville, Spain. rdemelo@us.es.
⁴ InovAI Lab, nPITI/IMD, Federal University of Rio Grande do Norte, Natal, RN, 59078-970, Brazil. mfernandes@dca.ufrn.br.
⁵ Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte, Natal, RN, 59078-970, Brazil. mfernandes@dca.ufrn.br.
⁶ Department of Computer Engineering and Automation (DCA), Federal University of Rio Grande do Norte, Natal, RN, 59078-970, Brazil. mfernandes@dca.ufrn.br.

PMID: 38969970
PMCID: PMC11225326
DOI: 10.1186/s12859-024-05754-1

Karolayne S Azevedo et al. BMC Bioinformatics. 2024.

Abstract

Purpose: In this study, we present DeepVirusClassifier, a tool capable of accurately classifying Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) viral sequences among other subtypes of the Coronaviridae family. This classification is achieved through a deep neural network model that relies on convolutional neural networks (CNNs). Because viruses within the same family share similar genetic and structural characteristics, distinguishing them is more challenging and requires more robust models. With the rapid evolution of viral genomes and the increasing need for timely classification, we aimed to provide a robust and efficient tool that could increase the accuracy of viral identification and classification, contribute to advancing research in viral genomics, and assist in surveilling emerging viral strains.

Methods: The proposed tool is built on a one-dimensional deep CNN and can be trained and tested on sequences from the Coronaviridae family, including SARS-CoV-2. Our model's performance was assessed using various metrics, including F1-score and AUROC. Additionally, artificial mutation tests were conducted to evaluate the model's generalization ability across sequence variations. We also ran the BLAST algorithm and conducted processing time analyses for comparison.
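The artificial mutation test described above can be illustrated with a minimal sketch. The abstract does not specify how mutations are introduced or how sequences are encoded, so everything here is an assumption: the helper names `mutate_positions` and `one_hot` are hypothetical, mutations are modeled as substitutions at randomly chosen distinct positions, and the one-hot encoding is simply a common way to present nucleotide sequences to a one-dimensional CNN. The example sequence is a toy string, not data from the study.

```python
import random

BASES = "ACGT"

def mutate_positions(seq, n_mutations, rng):
    """Return a copy of seq with n_mutations distinct positions substituted.

    Assumption: each chosen position receives a base different from the
    original one, so exactly n_mutations positions change.
    """
    if n_mutations > len(seq):
        raise ValueError("more mutations requested than positions available")
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n_mutations):
        alternatives = [b for b in BASES if b != chars[pos]]
        chars[pos] = rng.choice(alternatives)
    return "".join(chars)

def one_hot(seq):
    """Encode a sequence as rows of 4 values in A, C, G, T order.

    Unknown symbols (for example N) map to an all-zero row.
    """
    index = {b: i for i, b in enumerate(BASES)}
    rows = []
    for ch in seq.upper():
        row = [0, 0, 0, 0]
        if ch in index:
            row[index[ch]] = 1
        rows.append(row)
    return rows

# Toy usage: mutate 5 of 20 positions, then encode the result.
rng = random.Random(0)  # fixed seed so the run is reproducible
original = "ACGTACGTACGTACGTACGT"
mutated = mutate_positions(original, 5, rng)
n_changed = sum(a != b for a, b in zip(original, mutated))
encoded = one_hot(mutated)
```

In a test of this kind, the mutated sequences would be passed to the trained classifier and the fraction still classified correctly would be tracked as the number of mutations grows.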

Results: DeepVirusClassifier demonstrated exceptional performance across several evaluation metrics in the training and testing phases, indicating its robust learning capacity. Notably, during testing on more than 10,000 viral sequences, the model exhibited a sensitivity above 99% for sequences with fewer than 2000 mutations. The tool achieves superior accuracy and significantly reduced processing times compared to the Basic Local Alignment Search Tool (BLAST) algorithm. Furthermore, the results appear more reliable than those of the related works discussed in the text, indicating that the tool has great potential to revolutionize viral genomic research.
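For readers unfamiliar with the metrics reported above, the following sketch shows how sensitivity and F1-score are computed from binary predictions. It uses the standard definitions, not the authors' evaluation code. The labels follow the convention of Fig. 1 (0 for Non-SARS-CoV-2, 1 for SARS-CoV-2); the function names and the toy label vectors are hypothetical and are not results from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """Sensitivity (recall): fraction of actual positives that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(tp, fp, fn):
    """F1-score: harmonic mean of precision and recall, 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)   # 4 / (4 + 1) = 0.8
f1 = f1_score(tp, fp, fn)    # 8 / (8 + 1 + 1) = 0.8
```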

Conclusion: DeepVirusClassifier is a powerful tool for accurately classifying viral sequences, specifically focusing on SARS-CoV-2 and other subtypes within the Coronaviridae family. The superiority of our model becomes evident through rigorous evaluation and comparison with existing methods. Introducing artificial mutations into the sequences demonstrates the tool's ability to identify variations and significantly contributes to viral classification and genomic research. As viral surveillance becomes increasingly critical, our model holds promise in aiding rapid and accurate identification of emerging viral strains.

Keywords: Coronaviridae; Deep learning; SARS-CoV-2; Viral classification.

Conflict of interest statement

No competing interest is declared.

Figures

**Fig. 1**
Confusion matrix of the proposed approach for the classification problem of distinguishing between SARS-CoV-2 and Non-SARS-CoV-2 samples. Non-SARS-CoV-2 samples are represented by label 0, and SARS-CoV-2 samples are represented by label 1. The model is capable of correctly classifying all samples according to their respective classes

**Fig. 2**
AUROC curve for the classification of SARS-CoV-2 and Non-SARS-CoV-2 samples

**Fig. 3**
The learning curve of training and validation accuracy of the training set using fivefold cross-validation

**Fig. 4**
The learning curve of training and validation loss of the training set using fivefold cross-validation

**Fig. 5**
Overview of the proposed technique

**Fig. 6**
Countries that contain genomic samples of the Coronaviridae family in the database

**Fig. 7**
Dataset of all viral subtypes after the data balancing process

**Fig. 8**
Dataset after balancing the samples according to their groups

**Fig. 9**
CNN used for the viral classifier proposal presented in this work

**Fig. 10**
Comparison of the correctness rate between BLAST and CNN (proposed in this work) for a test set of 34 sequences according to the increase of the artificial position mutation rate, $γ$

Cited by

PRCFX-DT: a new graph-based approach for feature selection and classification of genomic sequences.
Khodaei A, Eskandari S, Sharifi H, Mozaffari-Tazehkand B. BMC Bioinformatics. 2025 Jun 17;26(1):159. doi: 10.1186/s12859-025-06183-4. PMID: 40528202. Free PMC article.
Artificial intelligence and machine learning in the development of vaccines and immunotherapeutics-yesterday, today, and tomorrow.
Elfatimi E, Lekbach Y, Prakash S, BenMohamed L. Front Artif Intell. 2025 Jul 18;8:1620572. doi: 10.3389/frai.2025.1620572. eCollection 2025. PMID: 40756816. Free PMC article. Review.
