This is a preprint.
Craft: A Machine Learning Approach to Dengue Subtyping
- PMID: 39990353
- PMCID: PMC11844389
- DOI: 10.1101/2025.02.10.637410
Update in
- Craft: a machine learning approach to dengue subtyping. Bioinform Adv. 2025 Oct 6;5(1):vbaf224. doi: 10.1093/bioadv/vbaf224. eCollection 2025. PMID: 41103542. Free PMC article.
Abstract
Motivation: The dengue virus poses a major global health threat, with nearly 390 million infections annually. A recently proposed hierarchical dengue nomenclature system enhances spatial resolution by defining major and minor lineages within genotypes, aiding efforts to track viral evolution. Current subtyping tools (Genome Detective, GLUE, and Nextclade) rely on computationally intensive sequence alignment and phylogenetic inference; machine learning presents a promising alternative for accurate and rapid classification.
Results: We present Craft (Chaos Random Forest), a machine learning framework for dengue subtyping. We demonstrate that Craft classifies sequences faster than existing tools while matching or surpassing their accuracy. Craft achieves 99.5% accuracy on a hold-out test set and processes over 140 000 sequences per minute. Craft also maintains high accuracy when classifying sequence segments as short as 700 nucleotides.
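The abstract does not spell out how sequences are turned into features. As a minimal sketch, assuming that "Chaos" in the name refers to the frequency chaos game representation (an alignment-free k-mer encoding), the Python below shows how a nucleotide sequence could be mapped to a fixed-length vector. The function names `fcgr` and `feature_vector`, the corner layout, and the choice of `k=3` are illustrative assumptions and are not taken from Craft itself.

```python
# Hypothetical corner layout for the chaos game square (illustrative only).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence, k=3):
    """Return a 2^k x 2^k grid of k-mer counts placed by chaos game position."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        # The last base of the k-mer selects the coarsest quadrant, so it is
        # processed first and ends up as the most significant bit.
        for base in reversed(kmer):
            cx, cy = CORNERS[base]
            x = 2 * x + cx
            y = 2 * y + cy
        grid[y][x] += 1
    return grid


def feature_vector(sequence, k=3):
    """Flatten the grid into relative k-mer frequencies (length 4^k)."""
    grid = fcgr(sequence, k)
    total = sum(map(sum, grid)) or 1  # avoid division by zero on short input
    return [count / total for row in grid for count in row]
```

Because the vector length depends only on `k`, full genomes and short segments (for example, a 700-nucleotide fragment) produce features of the same shape, which is what would let a single random forest classifier handle both without alignment.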