Predicting preterm birth through vaginal microbiota, cervical length, and WBC using a machine learning model

Sunwha Park et al. Front Microbiol. 2022 Aug 2;13:912853.

doi: 10.3389/fmicb.2022.912853. eCollection 2022.

Affiliations

¹ Department of Obstetrics and Gynecology, College of Medicine, Ewha Medical Research Institute, Ewha Womans University, Seoul, South Korea.
² Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea.
³ Department of Obstetrics and Gynecology, College of Medicine, Yonsei University, Seoul, South Korea.
⁴ Department of Statistics, Seoul National University, Seoul, South Korea.

PMID: 35983325
PMCID: PMC9378785
DOI: 10.3389/fmicb.2022.912853

Abstract

An association between the vaginal microbiome and preterm birth has been reported. In practice, however, it is difficult to predict preterm birth from the microbiome, because the vaginal microbial community varies widely between individuals and predictive performance has been very low. The purpose of this study was to use machine learning (ML) to select, from among various vaginal microbiota, markers that improve predictive power, and to develop a prediction algorithm with better predictive power by combining these markers with clinical information. In this multicenter case-control study of 150 Korean pregnant women (54 in the preterm delivery group and 96 in the full-term delivery group), cervicovaginal fluid was collected during mid-pregnancy. Demographic profiles (age, BMI, education level, and history of preterm birth), white blood cell (WBC) count, and cervical length were recorded, and the microbiome profiles of the cervicovaginal fluid were analyzed. The subjects were randomly divided into a training set (n = 101) and a test set (n = 49) in a two-to-one ratio. Markers were selected by univariate analysis using seven statistical tests, including the Wilcoxon rank-sum test, and five-fold cross-validation was performed on the training set when training ML models with the selected markers. Prediction models were built from the selected markers, including Lactobacillus spp., Gardnerella vaginalis, Ureaplasma parvum, Atopobium vaginae, Prevotella timonensis, and Peptoniphilus grossensis, using five ML methods: logistic regression, random forest, extreme gradient boosting, support vector machine, and GUIDE. Trained with the 17 selected markers, the logistic regression model achieved a test area under the curve (AUC) of 0.72. When WBC count and cervical length were combined with seven vaginal microbiome markers, the random forest model achieved the highest test AUC, 0.84.
GUIDE, a single-tree model, gave a more readily interpretable biological picture: using the 10 selected markers (A. vaginae, G. vaginalis, Lactobacillus crispatus, Lactobacillus fornicalis, Lactobacillus gasseri, Lactobacillus iners, Lactobacillus jensenii, Peptoniphilus grossensis, P. timonensis, and U. parvum) together with the covariates, it produced a tree with a test AUC of 0.77. In this tree, the association with preterm birth increased as P. timonensis and U. parvum increased (AUC = 0.77); similarly, in a tree built from forward-selected markers, a higher abundance of Peptoniphilus lacrimalis was associated with preterm birth (AUC = 0.77). Our study demonstrates that several candidate bacteria could be used as potential predictors of preterm birth, and that predictive performance can be improved by an ML model that also incorporates cervical length and WBC count.
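The split-then-screen step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses scikit-learn's `train_test_split` with the outcome as a stratification variable (the abstract only says the split was random), and it shows just one of the seven univariate tests, the Wilcoxon rank-sum test via SciPy's `ranksums`, applied to the training set only. The helper names, marker names, and simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import ranksums
from sklearn.model_selection import train_test_split


def split_train_test(X, y, test_size, seed=0):
    """Random train/test split; stratifying on the outcome is an assumption."""
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)


def wilcoxon_screen(X_train, y_train, names, alpha=0.05):
    """Keep markers whose abundance differs between PTB (y=1) and term (y=0) samples."""
    selected = {}
    for j, name in enumerate(names):
        _, p = ranksums(X_train[y_train == 1, j], X_train[y_train == 0, j])
        if p < alpha:
            selected[name] = p
    return selected


# Hypothetical data: 54 preterm and 96 term samples, 5 simulated markers.
rng = np.random.default_rng(0)
y = np.array([1] * 54 + [0] * 96)
X = rng.normal(size=(150, 5))
X[:, 0] += 2.0 * y  # only marker_0 carries signal
names = [f"marker_{j}" for j in range(5)]

# test_size=49 reproduces the 101/49 split sizes reported in the abstract.
X_tr, X_te, y_tr, y_te = split_train_test(X, y, test_size=49)
print(wilcoxon_screen(X_tr, y_tr, names))
```

Screening only on the training rows keeps the 49 test samples untouched until final evaluation.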

Keywords: 16S ribosomal RNA metagenome sequencing; cervicovaginal fluid; machine learning; microbial-marker; pregnancy; preterm birth; vaginal microbiome.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 1**
Flowchart of the study. CVF, cervicovaginal fluid; rRNA, ribosomal ribonucleic acid; OTUs, operational taxonomic units.

**Figure 2**
Flowchart of marker selection and evaluation in exhaustive search. The data were split into a training set and a test set in a two-to-one ratio. Markers with a frequency of more than 25% and a mean proportion of more than 0.001% were retained. Markers showing significant p values in two or more statistical tests were then selected. A Venn diagram of significant markers (p < 0.05) from the seven statistical methods (ZIG, ZIBSeq, ANCOM, CLR permutation, Wilcoxon rank-sum test, DESeq2, and edgeR) is shown. Additional filtering steps were applied to the selected markers to finalize the sets of 10 and 17 markers. For a given set of markers, exhaustive search with logistic regression (LR) was applied to every possible combination of markers. For each combination size, the best marker set was selected by AUC on the training set. Among these, the global best marker set was the one with the highest AUC in five-fold cross-validation (CV). The final marker set was then selected based on the test set. Lastly, the final marker sets were used to build machine learning (ML) models.
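The filtering and exhaustive-search steps in Figure 2 can be sketched with scikit-learn as below. This is a sketch under assumptions, not the authors' pipeline: "frequency" is interpreted as the fraction of samples in which a taxon is detected, 0.001% is written as the proportion 1e-5, logistic regression uses scikit-learn defaults apart from `max_iter`, and the multi-test vote and final test-set step are omitted. The function names and simulated data are hypothetical. The number of combinations grows as 2^n, so exhaustive search is only feasible for a short list such as the 10 or 17 markers mentioned here.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score


def prefilter(rel_abund, min_freq=0.25, min_mean=1e-5):
    """Indices of taxa detected in >25% of samples with mean proportion >0.001%."""
    freq = (rel_abund > 0).mean(axis=0)
    mean_prop = rel_abund.mean(axis=0)
    return np.flatnonzero((freq > min_freq) & (mean_prop > min_mean))


def exhaustive_search(X, y, cv=5):
    """Best LR marker set per size (training AUC), then global best by CV AUC."""
    n_markers = X.shape[1]
    best_per_size = []
    for k in range(1, n_markers + 1):
        best_auc, best_cols = -np.inf, None
        for combo in combinations(range(n_markers), k):
            cols = list(combo)
            model = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
            auc = roc_auc_score(y, model.predict_proba(X[:, cols])[:, 1])
            if auc > best_auc:
                best_auc, best_cols = auc, cols
        best_per_size.append(best_cols)
    # Mean k-fold cross-validated AUC for each size-specific winner.
    cv_auc = [
        cross_val_score(LogisticRegression(max_iter=1000), X[:, cols], y,
                        cv=cv, scoring="roc_auc").mean()
        for cols in best_per_size
    ]
    return best_per_size[int(np.argmax(cv_auc))], best_per_size


# Hypothetical training data: 4 markers, only marker 0 is informative.
rng = np.random.default_rng(1)
y = np.array([0] * 80 + [1] * 40)
X = rng.normal(size=(120, 4))
X[:, 0] += 3.0 * y
best, per_size = exhaustive_search(X, y)
print(best, per_size)
```

Ranking per-size winners by in-sample AUC and then choosing among them by CV AUC mirrors the two-stage choice in the caption and limits the optimism of picking on training fit alone.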

**Figure 3**
Differences in alpha- and beta-diversity between PTB and TB groups. **(A)** Shannon’s alpha diversity was significantly higher in the PTB group (PTB, n = 54; TB, n = 96). **(B)** Multidimensional scaling plot. Boxes show median and interquartile ranges, and whiskers extend from minimum to maximum values.
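Shannon's index, the alpha-diversity measure in panel (A), summarizes each sample's taxon proportions p_i as H' = -Σ p_i ln p_i. A minimal NumPy sketch follows; it assumes the natural logarithm (some pipelines use log base 2) and a per-sample count vector, the helper names are hypothetical, and it does not reproduce the group comparison shown in the figure.

```python
import numpy as np


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))


def shannon_per_sample(count_table):
    """Apply shannon_index to each row (sample) of a samples x taxa count table."""
    return [shannon_index(row) for row in np.asarray(count_table)]


# A sample dominated by one taxon has lower diversity than an even sample.
print(shannon_per_sample([[97, 1, 1, 1], [25, 25, 25, 25]]))
```

Zero counts are dropped before taking logarithms, since the limit of p ln p as p approaches 0 is 0.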

**Figure 4**
ROC curves and feature importance plots of the random forest (RF) models using covariates and selected markers. **(A)** ROC curve of the RF model on test data using 10 selected markers and WBC. **(B)** Feature importance plot of the RF model using 10 selected markers and WBC. **(C)** ROC curve of the RF model on test data using forward-selected markers, WBC, and cervical length. **(D)** Feature importance plot of the RF model using forward-selected markers, WBC, and cervical length.
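Panels (A) to (D) pair a test-set ROC curve with a feature-importance ranking. The sketch below shows one way to produce both with scikit-learn; it is not the authors' configuration. The number of trees, the seeds, the use of impurity-based `feature_importances_` (the caption does not say which importance measure was used), and the feature names `marker_A`, `marker_B`, `WBC`, and `cervical_length` with their simulated values are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Hypothetical features; only marker_A is made informative here.
feature_names = ["marker_A", "marker_B", "WBC", "cervical_length"]
rng = np.random.default_rng(2)
y = np.array([1] * 54 + [0] * 96)
X = rng.normal(size=(150, len(feature_names)))
X[:, 0] += 3.0 * y

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=49, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
scores = rf.predict_proba(X_te)[:, 1]           # predicted probability of preterm birth
fpr, tpr, thresholds = roc_curve(y_te, scores)  # points for the ROC plot
test_auc = roc_auc_score(y_te, scores)

# Impurity-based importances, sorted from most to least important.
ranking = sorted(zip(feature_names, rf.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
print(f"test AUC = {test_auc:.2f}")
for name, importance in ranking:
    print(f"{name:>16s} {importance:.3f}")
```

Impurity-based importances can favor continuous or high-cardinality features; permutation importance on the test set is a common cross-check.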

**Figure 5**
Decision trees built with the GUIDE algorithm using covariates, cervical length, and WBC with **(A)** ten pre-selected markers and **(B)** seven markers forward-selected from all markers. GUIDE v.38.0 classification tree for predicting Y using estimated priors and unit misclassification costs. Tree constructed with 109 observations. Pruning parameter α was 0.02 for **(A)** and 0.03 for **(B)**. At each split, an observation goes to the left branch if and only if the condition is satisfied. Predicted classes and sample sizes are printed below terminal nodes; the class sample proportion for Y = Preterm is shown beside nodes. In **(A)**, V1 stands for *Prevotella timonensis*. In **(B)**, V1 stands for *Peptoniphilus lacrimalis*.
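GUIDE is a standalone program, and its split-selection and pruning procedures are not available in scikit-learn. As a loosely analogous stand-in only, the sketch below fits a CART tree with scikit-learn's cost-complexity pruning (`ccp_alpha`) and prints its rules. The value 0.02 merely echoes the caption's α and is not equivalent to GUIDE's pruning parameter; the feature names (`V1`, `V2`, `WBC`, `cervical_length`), the class counts, and the data are hypothetical. As in the caption, an observation goes left when the printed condition (`feature <= threshold`) holds.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ["V1", "V2", "WBC", "cervical_length"]
rng = np.random.default_rng(3)
y = np.array(["Preterm"] * 36 + ["Term"] * 73)  # 109 hypothetical observations
X = rng.normal(size=(109, len(feature_names)))
X[:, 0] += 2.5 * (y == "Preterm")  # higher V1 in preterm samples

# CART with cost-complexity pruning; a stand-in, not GUIDE.
tree = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=feature_names)
print(rules)
```

A single pruned tree trades some predictive accuracy for rules that can be read directly, which is the interpretability argument the abstract makes for GUIDE.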



