Ensemble learning for microbiome-based caries diagnosis: multi-group modeling and biological interpretation from salivary and plaque metagenomic data
- PMID: 40676575
- PMCID: PMC12272970
- DOI: 10.1186/s12903-025-06590-2
Abstract
Background: The oral microbiota is a major etiological factor in the development of dental caries. Next-generation sequencing has been widely used and has generated vast amounts of data that remain underexplored. Advances in artificial intelligence (AI) have made it possible to mine information from these large datasets. This study aimed to develop AI-driven diagnostic models and identify key microbial features for caries.
Methods: We collected raw metagenomic and full-length 16S rRNA gene sequencing data from previous studies on saliva and plaque to construct a caries AI training dataset comprising nearly 600 samples. Samples were grouped by age, sequencing method, and sampling method. We systematically compared seven machine learning architectures (Logistic Regression, Random Forest, Support Vector Machines, Gradient Boosting, Convolutional Neural Networks, Feedforward Neural Networks, and Transformer models) to develop subgroup-specific caries diagnostic models, and then integrated them through ensemble learning to enhance generalizability.
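The ensemble integration step can be illustrated with a minimal sketch. The abstract does not state which ensemble scheme was used, so soft voting (averaging predicted probabilities) is an assumption here, and the model names and probability values are hypothetical placeholders rather than results from the study.

```python
def soft_vote(prob_by_model, weights=None):
    """Combine per-model caries probabilities by (weighted) averaging.

    prob_by_model maps a model name to a list of probabilities, one per
    sample, all in the same sample order.
    """
    names = list(prob_by_model)
    if weights is None:
        # Assumption: every subgroup model gets an equal vote.
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n_samples = len(prob_by_model[names[0]])
    combined = []
    for i in range(n_samples):
        score = sum(weights[n] * prob_by_model[n][i] for n in names) / total
        combined.append(score)
    return combined


def predict_labels(probs, threshold=0.5):
    """Turn probabilities into 1 (caries) or 0 (healthy) labels."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical outputs of three subgroup-specific models on four samples.
subgroup_probs = {
    "saliva_model": [0.9, 0.2, 0.7, 0.1],
    "plaque_model": [0.8, 0.4, 0.3, 0.2],
    "mixed_model": [0.7, 0.3, 0.8, 0.3],
}

ensemble_probs = soft_vote(subgroup_probs)
ensemble_labels = predict_labels(ensemble_probs)
print(ensemble_probs)
print(ensemble_labels)
```

In this toy input the plaque model alone would misclassify the third sample at a 0.5 threshold, while the averaged score does not, which is the kind of stabilizing effect an ensemble is intended to provide.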
Results: The caries diagnostic model achieved a maximum AUC of 1 (accuracy of 100%) for children under 6 years old in both the saliva and plaque groups. Intra- and inter-group analyses demonstrated that the top features (species and metabolic pathways) contributing to the models were consistent. Key caries-associated species included Streptococcus salivarius, Streptococcus parasanguinis and Veillonella dispar. Veillonella parvula was more abundant in caries plaque samples but was elevated in healthy saliva samples. Metabolic pathways such as geranylgeranyl diphosphate and fructan biosynthesis were enriched in caries, whereas Bifidobacterium shunt and peptidoglycan biosynthesis were depleted.
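For readers less familiar with the reported metrics, the sketch below shows how AUC and accuracy relate. It uses the standard pairwise (Mann-Whitney) definition of ROC AUC, not the authors' code, and the labels and scores are made-up values chosen only for illustration.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (caries, healthy) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    return correct / len(labels)


# Hypothetical labels (1 = caries, 0 = healthy) and model scores.
labels = [1, 1, 0, 0]
well_separated = [0.9, 0.6, 0.4, 0.1]
shifted = [0.9, 0.45, 0.4, 0.1]

auc_a = roc_auc(labels, well_separated)
acc_a = accuracy(labels, well_separated)
auc_b = roc_auc(labels, shifted)
acc_b = accuracy(labels, shifted)
print(auc_a, acc_a, auc_b, acc_b)
```

Both score sets rank every caries sample above every healthy sample, so both have an AUC of 1. Only the first also reaches 100% accuracy at a 0.5 threshold, which shows that the two metrics answer different questions: AUC measures ranking, while accuracy depends on the chosen cut-off.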
Conclusion: This work provides reliable diagnostic models for early childhood caries and establishes a robust computational framework for AI-driven microbiome analysis. By focusing on the characteristics of the oral microbiome, it offers novel perspectives on mining and validating existing data through AI modelling.
Keywords: Artificial intelligence; Early childhood caries; Metagenomics; Modelling; Salivary diagnostics.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Ethics approval and consent to participate: This study involved a secondary analysis of publicly available, de-identified metagenomic and full-length 16S rRNA gene sequencing data. Therefore, ethical approval from our institutional review board was not required for this specific study. We confirm that all original studies from which the data were sourced obtained the necessary ethical approvals and informed consent from participants prior to data deposition in public repositories. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests. Clinical trial number: Not applicable.