This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2025 Feb 13:2025.02.10.637410.

doi: 10.1101/2025.02.10.637410.

Craft: A Machine Learning Approach to Dengue Subtyping

Daniel J van Zyl^{1,2}, Marcel Dunaiski^{2}, Houriiyah Tegally^{1}, Cheryl Baxter^{1,3}; INFORM Africa research study group; Tulio de Oliveira^{1,3,4,5}, Joicymara S Xavier^{1,6,7}

Affiliations

¹ Centre for Epidemic Response and Innovation (CERI), School of Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, South Africa.
² Computer Science Division, Department of Mathematical Sciences, Faculty of Science, Stellenbosch University, Stellenbosch, South Africa.
³ Centre for the AIDS Programme of Research in South Africa (CAPRISA), Durban, South Africa.
⁴ KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), Nelson R Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa.
⁵ Department of Global Health, University of Washington, Seattle, USA.
⁶ Institute of Agricultural Sciences, Universidade Federal dos Vales do Jequitinhonha e Mucuri (UFVJM), Unaí, Brazil.
⁷ Institute of Biological Sciences, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil.

PMID: 39990353
PMCID: PMC11844389
DOI: 10.1101/2025.02.10.637410

Daniel J van Zyl et al. bioRxiv. 2025.

Update in

Craft: a machine learning approach to dengue subtyping.
van Zyl DJ, Dunaiski M, Tegally H, Baxter C, de Oliveira T, Xavier JS; INFORM Africa Research Study Group. Bioinform Adv. 2025 Oct 6;5(1):vbaf224. doi: 10.1093/bioadv/vbaf224. eCollection 2025. PMID: 41103542. Free PMC article.

Abstract

Motivation: The dengue virus poses a major global health threat, with an estimated 390 million infections annually. A recently proposed hierarchical dengue nomenclature system enhances spatial resolution by defining major and minor lineages within genotypes, aiding efforts to track viral evolution. Current subtyping tools, such as Genome Detective, GLUE, and NextClade, rely on computationally intensive sequence alignment and phylogenetic inference; machine learning offers a promising alternative for accurate and rapid classification.

Results: We present Craft (Chaos Random Forest), a machine learning framework for dengue subtyping. We show that Craft classifies sequences faster than existing tools while matching or surpassing their accuracy. Craft achieves 99.5% accuracy on a hold-out test set and processes over 140 000 sequences per minute. Notably, Craft maintains high accuracy even on sequence segments as short as 700 nucleotides.
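The abstract does not describe Craft's features, but its name (Chaos Random Forest) suggests a chaos game representation of the sequence feeding a random forest. As a minimal sketch under that assumption, the code below computes a frequency chaos game representation (FCGR), an alignment-free encoding in which each k-mer maps to one cell of a 2^k × 2^k grid. The corner convention, the default `k`, and the helper names `fcgr` and `flatten` are hypothetical choices for illustration, not Craft's actual implementation.

```python
# Hypothetical sketch: frequency chaos game representation (FCGR) features.
# Assumption: Craft's "Chaos" step resembles this; the abstract does not say so.

# Chaos-game corners (x, y); this A/C/G/T layout is one common convention (assumed).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence: str, k: int = 4) -> list[list[float]]:
    """Return a normalised 2^k x 2^k FCGR grid for `sequence`.

    Each k-mer made only of A/C/G/T lands in the cell the chaos-game point
    reaches after k halving steps: the newest base sets the highest bit.
    Integer bit arithmetic avoids floating-point drift on long sequences.
    """
    size = 2 ** k
    counts = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    total = 0
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing N or other ambiguity codes
        x = y = 0
        for j, base in enumerate(kmer):
            cx, cy = CORNERS[base]
            x |= cx << j  # later bases contribute higher-weight bits
            y |= cy << j
        counts[y][x] += 1
        total += 1
    # Normalise by the number of valid k-mers so sequences of different
    # lengths (e.g. whole genomes vs. short segments) are comparable.
    if total == 0:
        return [[0.0] * size for _ in range(size)]
    return [[c / total for c in row] for row in counts]


def flatten(grid: list[list[float]]) -> list[float]:
    """Flatten the grid row by row into a feature vector of length 4^k."""
    return [value for row in grid for value in row]


if __name__ == "__main__":
    print(flatten(fcgr("ACGTTGCAAN", k=2)))
```

In a full pipeline, vectors like `flatten(fcgr(genome))` for labelled genomes could be passed to a random forest classifier, for example scikit-learn's `RandomForestClassifier`, with lineage names as labels. Because no alignment is needed, feature extraction costs one linear pass per sequence, which is consistent with the speed advantage the abstract reports for an alignment-free approach.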

Figures

**Fig. 1:**
Composite radar chart of class-wise model performance. Classes are colored according to serotype and genotype. The blue, red, and green dotted lines represent Craft, NextClade, and Genome Detective, respectively; lines closer to the perimeter indicate better performance. The length of each bar corresponds to the evolutionary depth of the lineage.

**Fig. 2:**
Line plots showing the accuracy of each model on short genomic segments taken from different positions within the dengue genome, with one plot per serotype. Horizontal bars indicate the position of each gene region for each serotype.
