[Preprint]. 2025 Feb 13:2025.02.10.637410.
doi: 10.1101/2025.02.10.637410.

Craft: A Machine Learning Approach to Dengue Subtyping

Daniel J van Zyl et al. bioRxiv.

Update in

  • Craft: a machine learning approach to dengue subtyping.
    van Zyl DJ, Dunaiski M, Tegally H, Baxter C, de Oliveira T, Xavier JS; INFORM Africa Research Study Group. Bioinform Adv. 2025 Oct 6;5(1):vbaf224. doi: 10.1093/bioadv/vbaf224. eCollection 2025. PMID: 41103542. Free PMC article.

Abstract

Motivation: The dengue virus poses a major global health threat, with nearly 390 million infections annually. A recently proposed hierarchical dengue nomenclature system enhances spatial resolution by defining major and minor lineages within genotypes, aiding efforts to track viral evolution. Current subtyping tools (Genome Detective, GLUE, and NextClade) rely on computationally intensive sequence alignment and phylogenetic inference; machine learning presents a promising alternative for accurate and rapid classification.

Results: We present Craft (Chaos Random Forest), a machine learning framework for dengue subtyping. We demonstrate that Craft classifies sequences faster than existing tools while matching or surpassing their accuracy. Craft achieves 99.5% accuracy on a hold-out test set and processes over 140 000 sequences per minute. Notably, Craft maintains high accuracy even when classifying sequence segments as short as 700 nucleotides.
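
The abstract does not describe how Craft converts sequences into features. Purely as an illustration, and assuming that "Chaos" refers to the chaos game representation (an assumption, not confirmed by the text above), the following minimal sketch shows one common alignment-free encoding: a normalized k-mer frequency vector, which carries the same information as a frequency chaos game representation grid of order k. The function name `kmer_frequency_vector` and the choice `k = 3` are hypothetical and are not taken from Craft.

```python
from collections import Counter
from itertools import product


def kmer_frequency_vector(sequence: str, k: int = 3) -> list[float]:
    """Return normalized k-mer frequencies in a fixed A/C/G/T ordering.

    Illustrative only: this is NOT Craft's confirmed feature extraction.
    Each k-mer maps to one cell of a 2^k x 2^k frequency chaos game
    representation grid, so this flat vector holds the same counts.
    """
    alphabet = "ACGT"
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if all(base in alphabet for base in kmer):
            counts[kmer] += 1
    # Fixed ordering so every sequence yields a vector of length 4^k.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


if __name__ == "__main__":
    features = kmer_frequency_vector("ACGTACGT", k=2)
    print(len(features))
```

Because no alignment is needed, vectors like this can be computed in a single pass over each sequence and passed to any standard classifier, for example a random forest. Whether Craft does exactly this is not stated in the abstract.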

Figures

Fig. 1:
Composite radar-chart of class-wise model performance. Classes are colored according to serotype and genotype. The blue, red and green dotted lines represent Craft, NextClade and Genome Detective respectively, with lines closer to the perimeter indicating better performance. The length of each bar corresponds to the evolutionary depth of the lineage.
Fig. 2:
A collection of line plots showing the accuracy of each model when tested on short genomic segments from various positions within the dengue genome. Each plot corresponds to a particular serotype. We include horizontal bars indicating the positions of each gene region for each serotype.
