BMC Oral Health. 2025 Jul 17;25(1):1188.
doi: 10.1186/s12903-025-06590-2.

Ensemble learning for microbiome-based caries diagnosis: multi-group modeling and biological interpretation from salivary and plaque metagenomic data

Fangqiao Wei et al. BMC Oral Health.

Abstract

Background: The oral microbiota is a major etiological factor in the development of dental caries. Next-generation sequencing techniques have been widely used and have generated vast amounts of data that remain underexplored. Advances in artificial intelligence (AI) have made it possible to mine information from these large datasets. This study aimed to develop AI-driven diagnostic models and identify key microbial features for caries.

Methods: We collected raw metagenomic and full-length 16S rRNA gene sequencing data from previous studies on saliva and plaque to construct a caries AI training dataset comprising nearly 600 samples. Samples were grouped by age, sequencing method, and sampling method. Through systematic comparison of seven machine learning architectures (Logistic Regression, Random Forest, Support Vector Machines, Gradient Boosting, Convolutional Neural Networks, Feedforward Neural Networks, and Transformer models), we developed subgroup-specific caries diagnostic models, which were then integrated by ensemble learning to enhance generalizability.
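The abstract does not state which ensemble scheme was used to integrate the models. As a minimal sketch, assuming soft voting (averaging each model's predicted caries probability), the integration step could look like the following. The model names, probabilities, and the 0.5 threshold are hypothetical and are not taken from the study.

```python
from statistics import mean


def soft_vote(prob_by_model):
    """Average per-sample caries probabilities across models (soft voting)."""
    columns = list(prob_by_model.values())
    n_samples = len(columns[0])
    if any(len(col) != n_samples for col in columns):
        raise ValueError("every model must score the same number of samples")
    return [mean(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical predicted probabilities of "caries" for four samples.
probs = {
    "logistic_regression": [0.9, 0.2, 0.6, 0.1],
    "random_forest": [0.8, 0.3, 0.4, 0.2],
    "gradient_boosting": [0.7, 0.1, 0.8, 0.3],
}

ensemble = soft_vote(probs)
# Assumed decision threshold of 0.5; a real model would tune this.
labels = ["caries" if p >= 0.5 else "healthy" for p in ensemble]
print(labels)
```

Averaging probabilities lets a confident model outweigh a hesitant one, which is why the third sample is called caries even though one of the three models scores it below 0.5.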

Results: The caries diagnostic model achieved a maximum AUC of 1 (accuracy of 100%) for children under 6 years old in both the saliva and plaque groups. Intra- and inter-group analyses demonstrated the consistency of the top features (species and metabolic pathways) contributing to the models. Key caries-associated species included Streptococcus salivarius, Streptococcus parasanguinis and Veillonella dispar. Veillonella parvula was more abundant in caries plaque samples but was elevated in healthy saliva samples. Metabolic pathways such as geranylgeranyl diphosphate biosynthesis and fructan biosynthesis were enriched in caries, whereas the Bifidobacterium shunt and peptidoglycan biosynthesis were depleted.
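To make the AUC figure concrete: AUC can be read as the fraction of (caries, healthy) sample pairs in which the caries sample receives the higher score, so an AUC of 1 means every caries sample outranks every healthy one. The sketch below computes it this way on hypothetical scores; it illustrates the metric only and is not the study's evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels: 1 = caries, 0 = healthy.
y = [1, 1, 1, 0, 0, 0]

# Every caries score exceeds every healthy score.
perfect = roc_auc(y, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])

# One caries sample (0.25) scores below one healthy sample (0.3).
overlap = roc_auc(y, [0.9, 0.8, 0.25, 0.3, 0.2, 0.1])

print(perfect, overlap)
```

In the second case one of the nine pairs is misordered, so the AUC drops to 8/9.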

Conclusion: The current work provided reliable diagnostic models for early childhood caries, and established a robust computational framework for AI-driven microbiome analysis. This study, by focusing on the characteristics of the oral microbiome, offers novel perspectives for data mining and validation of existing data through the application of AI modelling.

Keywords: Artificial intelligence; Early childhood caries; Metagenomics; Modelling; Salivary diagnostics.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: This study involved a secondary analysis of publicly available, de-identified metagenomic and full-length 16S rRNA gene sequencing data. Therefore, ethical approval from our institutional review board was not required for this specific study. We confirm that all original studies from which the data were sourced obtained the necessary ethical approvals and informed consent from participants prior to data deposition in public repositories. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests. Clinical trial number: Not applicable.

Figures

Fig. 1
AI-assisted oral caries microbiome modeling and key feature selection strategy
Fig. 2
Performance of the ensemble learning model. ROC curves (a) and accuracy (b) for modeling at the species level using seven AI methods. (c) The routing system that selects the machine learning method based on sample type
Fig. 3
Abundance and feature weight distribution of overlapping species within groups. Box plots show the abundance of each species in the caries (red box) and healthy (blue box) groups. Data are presented as median ± interquartile range. Open circles represent mean values. Scatter plots display the feature weights and rankings of species, where the y-axis represents the ranking of feature contributions, and the size of the points reflects the normalized feature importance
Fig. 4
Visualization of overlapping feature species identified among groups. (a) The Venn diagram illustrates the overlap of selected feature species across different groups, with unique species not displayed. (b) The heatmap of the overlapping bacterial species among sample groups. (c) Network diagrams showing correlations of P1 and S1 groups. The Spearman’s rho between two nodes (species) determines the color and thickness of each edge. Node size corresponds to the average abundance of the species
Fig. 5
Metabolic pathways selected by the ensemble learning model. (a) The chord diagrams show the contribution and overlap of selected feature metabolic pathways across different groups. (b) The violin plots display the abundance of overlapping pathways within groups. The orange and green distributions represent the caries and healthy groups, respectively. Data are presented as median ± interquartile range. The error bars indicate 1.5 times the IQR to highlight potential outliers. (c) Box plots show the abundance of each pathway in the caries (orange box) and healthy (green box) groups. Data are presented as median ± interquartile range. Open circles represent mean values. (d) Taxonomic partitioning of fructan biosynthesis potential (PWY-822) across caries phenotypes in Group S1
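The network diagrams in Fig. 4c weight each edge by Spearman's rho between two species. As a rough illustration of that statistic, the sketch below ranks two abundance vectors (averaging ranks across ties) and takes the Pearson correlation of the ranks. The species abundances are hypothetical, and the function names are illustrative rather than the authors' code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical relative abundances of two species across five samples.
species_a = [0.01, 0.05, 0.02, 0.10, 0.07]
species_b = [0.20, 0.90, 0.30, 5.00, 1.10]  # rises and falls with species_a

rho = spearman_rho(species_a, species_b)
print(round(rho, 6))
```

Because the statistic depends only on ranks, the skewed value 5.00 has no more influence than any other top-ranked value, which suits the heavy-tailed abundance data typical of microbiome profiles.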
