J Dent Res. 2021 Jun;100(6):599-607.
doi: 10.1177/0022034520979926. Epub 2020 Dec 24.

Oral Microbiota Composition Predicts Early Childhood Caries Onset



A Grier et al. J Dent Res. 2021 Jun.

Abstract

As the most common chronic disease in preschool children in the United States, early childhood caries (ECC) has a profound impact on a child's quality of life, represents a tremendous human and economic burden to society, and disproportionately affects those living in poverty. Caries risk assessment (CRA) is a critical component of ECC management, yet the available risk assessment techniques lack accuracy, consistency, reproducibility, and longitudinal validation. Molecular and microbial biomarkers are a potential source of accurate and reliable prediction of dental caries risk and onset. Next-generation nucleotide-sequencing technology has made it feasible to profile the composition of the oral microbiota. In the present study, 16S ribosomal RNA (rRNA) gene sequencing was applied to saliva samples collected at 6-mo intervals for 24 mo from a subset of 56 initially caries-free children drawn from an ongoing cohort of 189 children aged 1 to 3 y. Over the 2-y study period, 36 of these children developed ECC and 20 remained caries free. Machine learning models of microbiota composition across the study period distinguished affected from nonaffected groups at their initial study visits with an area under the receiver operating characteristic curve (AUC) of 0.71, and discriminated ECC converts from healthy controls at the visit immediately preceding ECC diagnosis with an AUC of 0.89, as assessed by nested cross-validation. Rothia mucilaginosa, Streptococcus sp., and Veillonella parvula were selected as important discriminatory features in all models and represent biomarkers of risk for ECC onset. These findings indicate that the oral microbiota, as profiled by high-throughput 16S rRNA gene sequencing, is predictive of ECC onset.
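The abstract reports AUCs "as assessed by nested cross-validation." What follows is a minimal sketch of that evaluation scheme. It uses a deliberately trivial, hypothetical model: each sample is scored by a single feature whose sign is learned from training data, and the only hyperparameter is which feature to use. The inner loop tunes that choice on outer-training samples only. Each outer test fold is then scored once, and the held-out scores are pooled into a single rank-based (Mann-Whitney) AUC. The fold counts, data, and function names are illustrative assumptions. The study itself used random forest and gradient tree boosting models.

```python
# Toy nested cross-validation sketch. Hypothetical model: score each sample by
# one feature, with the sign learned on training data; the only hyperparameter
# is which feature to use. The study's models were random forests and
# gradient tree boosting, not this toy.


def auc(scores, labels):
    """Rank-based AUC: P(a positive scores higher than a negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(idx, labels, k):
    """Split sample indices into k folds, dealing each class out round-robin."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i in idx if labels[i] == cls]
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def fit_sign(X, y, train, f):
    """'Training' step: orient feature f so that higher scores mean higher risk."""
    a = auc([X[i][f] for i in train], [y[i] for i in train])
    return 1.0 if a >= 0.5 else -1.0


def score(X, idx, f, sign):
    return [sign * X[i][f] for i in idx]


def nested_cv_auc(X, y, k_outer=3, k_inner=2):
    everyone = list(range(len(y)))
    held_out_scores, held_out_labels = [], []
    for test in stratified_folds(everyone, y, k_outer):
        train = [i for i in everyone if i not in test]
        # Inner loop: choose the feature using outer-training samples only.
        best_f, best_mean = 0, -1.0
        for f in range(len(X[0])):
            fold_aucs = []
            for val in stratified_folds(train, y, k_inner):
                inner_train = [i for i in train if i not in val]
                sign = fit_sign(X, y, inner_train, f)
                fold_aucs.append(auc(score(X, val, f, sign), [y[i] for i in val]))
            mean = sum(fold_aucs) / len(fold_aucs)
            if mean > best_mean:
                best_f, best_mean = f, mean
        # Refit on the whole outer-training set, then score the untouched test fold.
        sign = fit_sign(X, y, train, best_f)
        held_out_scores += score(X, test, best_f, sign)
        held_out_labels += [y[i] for i in test]
    # Pool held-out predictions from all outer folds into one AUC.
    return auc(held_out_scores, held_out_labels)


# Hypothetical data: feature 0 is lower in future cases; feature 1 is noise.
y = [1] * 6 + [0] * 6
X = [[v, n] for v, n in zip([0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15],
                            [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8])]
print(nested_cv_auc(X, y))
```

The hyperparameter is chosen without ever seeing the outer test fold, so the pooled AUC is not inflated by tuning. That independence is the reason for nesting the loops.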

Keywords: 16S rRNA; biomarkers; dental caries; machine learning; receiver operating characteristic curve; risk assessment.


Conflict of interest statement

Declaration of Conflicting Interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Study overview. (A) Sampling scheme overview. Saliva samples were collected at enrollment and every 6 mo thereafter for the duration of the 24-mo study period. Samples were grouped as precaries, caries active, or caries-free healthy controls. (B) Summary of analysis pipeline. DNA was extracted from saliva samples, and the V1–V3 hypervariable regions of the 16S ribosomal RNA gene were amplified and sequenced to profile the composition of the oral microbiota. After quality control and taxonomic assignment, microbiota composition profiles were used to train machine learning models to predict caries risk.
Figure 2.
Sample composition and diversity. (A) Heatmap showing the relative abundance of the 12 most abundant genera across all samples. Individual samples (thin columns) are grouped by caries status (broad columns). The genera shown account for 94.2% of the overall composition of all samples. (B) Principal coordinate analysis (PCoA) plot based on the weighted UniFrac distance metric, with samples colored by caries status. The weighted UniFrac distance measures the dissimilarity between the compositions of 2 microbiota samples. The PCoA plot summarizes the overall structure of the data and the similarity/dissimilarity relationships between samples. There is no significant clustering or separation by caries status (P = 0.18, PERMANOVA). (C) Boxplots of the α diversity of the samples in each caries status group, showing no significant differences (P = 0.98, Kruskal-Wallis test). (D) α diversity of all samples plotted against age, revealing no significant relationship (P = 0.18, Spearman correlation). Samples are colored by subject group: children who developed caries during the study period (caries converts) and those who did not (healthy controls).
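The caption does not name the α diversity index used. As a hedged illustration, this sketch assumes the Shannon index as the α diversity metric and computes it per sample from taxon counts. It also computes a Spearman rank correlation (the Pearson correlation of average ranks) of the kind reported in panel D. Significance testing (P values, Kruskal-Wallis, PERMANOVA) is not implemented. The counts and ages are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts.

    Assumption: the caption does not say which alpha diversity index was used.
    """
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-sample taxon counts and ages in months.
samples = [[120, 30, 5], [80, 60, 20], [50, 50, 40], [90, 10, 0]]
ages = [12, 18, 24, 30]
alpha = [shannon_index(s) for s in samples]
print(alpha, spearman_rho(ages, alpha))
```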
Figure 3.
Differentially abundant microbes by caries status. (A) Enrollment visit samples were tested with the Wilcoxon rank-sum test for differential abundance of microbes between subjects who would go on to develop caries during the study period and those who would not. After adjustment for multiple testing across all microbes, no significant differences were observed. The bacteria shown here (rows) are differentially abundant with an unadjusted P value of ≤0.05; they may indicate weak trends but are not statistically robust on their own. The first column shows centered log-ratio (CLR) normalized abundances in each group. The second column shows the unadjusted significance of the difference between groups. The third and fourth columns show, respectively, the fold change between groups and the proportion of samples within each group in which the microbe was detected. The fifth column shows the area under the receiver operating characteristic curve (AU-ROC) for each bacterium as a nonparametric measure of enrichment. (B) Differential abundance of microbes among all 3 caries status groups across all visits was tested using a multivariate mixed-effects model controlling for age, sequencing reads, and repeated sampling of subjects. Normalized relative abundances of microbes that differ significantly after multiple test correction are plotted on the y-axes and grouped by caries status on the x-axes. Boxplots are centered on the median, with notches indicating an approximate 95% confidence interval for the median, boxes spanning the first to third quartiles, and whiskers extending to the largest and smallest values no further than 1.5 × the interquartile range (IQR) from the box. Points beyond the whiskers are outliers. If the notches of 2 boxes within the same panel do not overlap on the y-axis, there is strong evidence that the true medians differ. The black diamond indicates the mean of each group.
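Panel A reports centered log-ratio (CLR) normalized abundances, and panel B uses notched boxplots. The sketch below shows both. It rests on assumptions the caption does not state: the pseudocount added before taking logarithms (0.5 here, hypothetical), the quartile method (Python's `statistics.quantiles` with `method="inclusive"`), and the notch half-width 1.58 × IQR/√n, which is the convention documented for R's `boxplot.stats`. The function names are hypothetical.

```python
import math
import statistics


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: ln(x_i) minus the mean of ln(x_j) over all taxa.

    A pseudocount (hypothetical default 0.5) is added because ln(0) is undefined.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def notched_box_stats(values):
    """Quartiles, 1.5 x IQR whiskers, outliers, and a median notch (approx. 95% CI)."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    med = statistics.median(data)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    half_notch = 1.58 * iqr / math.sqrt(len(data))  # R boxplot.stats convention
    return {
        "median": med,
        "box": (q1, q3),
        "whiskers": (min(inside), max(inside)),  # most extreme points within fences
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
        "notch": (med - half_notch, med + half_notch),
    }


# Hypothetical read counts for four taxa in one sample, and a hypothetical group.
print(clr([120, 0, 35, 8]))
print(notched_box_stats([1.2, 0.4, 2.5, 1.9, 0.8, 1.1, 6.0, 1.5]))
```

CLR values sum to zero within each sample and (without a pseudocount) are unchanged by rescaling all counts. This is why CLR is commonly used for compositional data such as relative abundances.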
Figure 4.
Caries risk classifier performance. Two types of machine learning classifiers, random forest models (A, C) and gradient tree boosting models (B, D), were trained and tested using nested cross-validation to predict caries risk from saliva microbiota composition at enrollment (A, B) and at the visit immediately preceding caries development (or a corresponding caries-free visit in healthy control subjects), when caries risk is most urgent (C, D). Receiver operating characteristic (ROC) curves showing the sensitivity and specificity of the models on held-out test samples across discrimination thresholds are plotted with 95% confidence intervals. The total area under the curve (AUC) is given as a measure of each model’s predictive ability, with a P value for the null hypothesis that the AUC is ≤0.5 (i.e., the model predicts no better than random chance). All models discriminate significantly better than chance between children who will develop caries within the following 6 to 24 mo and those who will not, and both model types predict caries risk most accurately when children are within 6 mo of caries development.
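An ROC curve plots sensitivity against 1 − specificity (the false-positive rate) as the discrimination threshold sweeps over a model's scores. The sketch below shows that construction, with the AUC taken by the trapezoidal rule. The scores and labels are hypothetical. The 95% confidence bands and the P value for AUC ≤ 0.5 shown in the figure are not reproduced here.

```python
def roc_points(scores, labels):
    """ROC points (1 - specificity, sensitivity), sweeping the threshold high to low.

    A sample is called positive when its score is >= the threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        # x = false-positive rate = 1 - specificity; y = sensitivity
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical held-out risk scores; label 1 = later developed caries.
scores = [0.9, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
pts = roc_points(scores, labels)
print(pts, trapezoid_auc(pts))
```

An AUC of 0.5 corresponds to the diagonal line of a classifier that performs no better than chance.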
Figure 5.
Important microbes in caries risk prediction models. For each time point (enrollment and immediately prior to caries development) and each type of model (random forest and gradient tree boosting), feature importance was averaged across all cross-validation folds and normalized to relative importance. For each model–time point combination, features were ranked by importance from greatest to least, and the top features with a cumulative relative importance of 50% were selected for plotting. The first column indicates the model or models in which a given microbe (row) was important under these selection criteria: random forest at enrollment (RF1), random forest at urgent risk (RF2), gradient boosted at enrollment (GB1), and gradient boosted at urgent risk (GB2). The second column gives the most specific available taxonomic classification for the microbe. The following columns provide metrics of enrichment between precaries and healthy control groups across all time points. The final column gives the area under the receiver operating characteristic curve based on the rank abundance of the microbe; this metric is independent of the machine learning models.
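The feature selection rule in this caption works in four steps: average importance across folds, normalize, rank, and keep the top features up to 50% cumulative relative importance. The sketch below implements that rule. One assumption is made explicit: the feature whose importance makes the running total reach 50% is included, because the caption does not say how that boundary case was handled. Feature names and importance values are hypothetical.

```python
def top_features(fold_importances, target=0.5):
    """Select top-ranked features until cumulative relative importance reaches target.

    fold_importances maps feature name -> list of importances, one per CV fold.
    Assumption: the feature that crosses the target is included.
    """
    mean_imp = {f: sum(v) / len(v) for f, v in fold_importances.items()}
    total = sum(mean_imp.values())
    ranked = sorted(mean_imp.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for feature, importance in ranked:
        selected.append(feature)
        cumulative += importance / total  # normalized (relative) importance
        if cumulative >= target - 1e-12:  # small tolerance for float rounding
            break
    return selected


# Hypothetical importances for four taxa over three CV folds.
imps = {"taxon_a": [0.30, 0.25, 0.35], "taxon_b": [0.20, 0.30, 0.25],
        "taxon_c": [0.10, 0.05, 0.10], "taxon_d": [0.40, 0.40, 0.30]}
print(top_features(imps))
```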
