Cell Rep Med. 2024 Jan 16;5(1):101350.

doi: 10.1016/j.xcrm.2023.101350. Epub 2023 Dec 21.

Microbiome preterm birth DREAM challenge: Crowdsourcing machine learning approaches to advance preterm birth research

Jonathan L Golob¹, Tomiko T Oskotsky², Alice S Tang³, Alennie Roldan³, Verena Chung⁴, Connie W Y Ha⁵, Ronald J Wong⁶, Kaitlin J Flynn⁴, Antonio Parraga-Leo⁷, Camilla Wibrand⁸, Samuel S Minot⁹, Boris Oskotsky¹⁰, Gaia Andreoletti³, Idit Kosti³, Julie Bletz⁴, Amber Nelson⁴, Jifan Gao¹¹, Zhoujingpeng Wei¹¹, Guanhua Chen¹¹, Zheng-Zheng Tang¹¹, Pierfrancesco Novielli¹², Donato Romano¹², Ester Pantaleo¹³, Nicola Amoroso¹⁴, Alfonso Monaco¹³, Mirco Vacca¹⁵, Maria De Angelis¹⁵, Roberto Bellotti¹³, Sabina Tangaro¹², Abigail Kuntzleman¹⁶, Isaac Bigcraft¹⁶, Stephen Techtmann¹⁶, Daehun Bae¹⁷, Eunyoung Kim¹⁷, Jongbum Jeon¹⁸, Soobok Joe¹⁸; Preterm Birth DREAM Community; Kevin R Theis¹⁹, Sherrianne Ng²⁰, Yun S Lee²⁰, Patricia Diaz-Gimeno²¹, Phillip R Bennett²⁰, David A MacIntyre²⁰, Gustavo Stolovitzky²², Susan V Lynch²³, Jake Albrecht⁴, Nardhy Gomez-Lopez²⁴, Roberto Romero²⁵, David K Stevenson²⁶, Nima Aghaeepour²⁷, Adi L Tarca²⁸, James C Costello²⁹, Marina Sirota³⁰

Affiliations

¹ Division of Infectious Disease, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA; March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA. Electronic address: golobj@umich.edu.
² March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA. Electronic address: tomiko.oskotsky@ucsf.edu.
³ March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA.
⁴ Sage Bionetworks, Seattle, WA, USA.
⁵ Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA.
⁶ Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA; March of Dimes Prematurity Research Center at Stanford University, Stanford, CA, USA.
⁷ Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, Obstetrics and Gynaecology, Universidad de Valencia, Valencia, Spain; IVIRMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Valencia, Spain.
⁸ Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA.
⁹ Data Core, Shared Resources, Fred Hutchinson Cancer Center, Seattle, WA, USA.
¹⁰ Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA.
¹¹ Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA.
¹² Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy; Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy.
¹³ Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy; Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari Aldo Moro, Bari, Italy.
¹⁴ Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy; Dipartimento di Farmacia - Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy.
¹⁵ Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy.
¹⁶ Department of Biological Sciences, Michigan Technological University, Houghton, MI, USA.
¹⁷ School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea.
¹⁸ Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, Republic of Korea.
¹⁹ Department of Biochemistry, Microbiology and Immunology, Wayne State University, Detroit, MI, USA.
²⁰ Imperial College Parturition Research Group, Division of the Institute of Reproductive and Developmental Biology, Imperial College London, London, UK; March of Dimes Prematurity Research Centre at Imperial College London, London, UK.
²¹ IVIRMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Valencia, Spain.
²² Center for Computational Biology and Bioinformatics, Columbia University, New York, NY, USA; Thomas J. Watson Research Center, IBM, Yorktown Heights, NY, USA; Sema4, Stamford, CT, USA.
²³ Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA; Division of Gastroenterology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA.
²⁴ Department of Biochemistry, Microbiology and Immunology, Wayne State University, Detroit, MI, USA; Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI, USA.
²⁵ Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI, USA; Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, MI, USA; Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA; Detroit Medical Center, Detroit, MI, USA; Department of Obstetrics and Gynecology, Florida International University, Miami, FL, USA.
²⁶ Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA; Center for Academic Medicine, Stanford University School of Medicine, Stanford, CA, USA.
²⁷ Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA; Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA; Department of Biomedical Data Sciences, Stanford University School of Medicine, Stanford, CA, USA.
²⁸ Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI, USA; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI, USA; Department of Computer Science, Wayne State University College of Engineering, Detroit, MI, USA.
²⁹ Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
³⁰ March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA. Electronic address: marina.sirota@ucsf.edu.

PMID: 38134931
PMCID: PMC10829755
DOI: 10.1016/j.xcrm.2023.101350

Jonathan L Golob et al. Cell Rep Med. 2024.

Abstract

Every year, 11% of infants are born preterm, with significant health consequences, and the vaginal microbiome is a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals. The top-performing models (among 148 and 121 submissions from 318 teams) achieve area under the receiver operating characteristic curve (AUROC) scores of 0.69 and 0.87 predicting PTB and ePTB, respectively. Alpha diversity, VALENCIA community state types, and composition are important features in the top-performing models, most of which are tree-based methods. This work is a model for translating microbiome data into clinically relevant predictive models and for better understanding preterm birth.

Keywords: 16S harmonization; DREAM challenge; crowdsourced; machine learning; microbiome; predictive modeling; preterm birth; vaginal microbiome.

Conflict of interest statement

Declaration of interests: S.V.L. is a board member at, holds stock in, and consults for Siolta Therapeutics. She also consults for the Atria Academy of Science and Medicine and for Sanofi. J.C.C. is co-founder of PrecisionProfile and OncoRx Insights. N. Aghaeepour is a member of the scientific advisory boards of January AI, Parallel Bio, Celine Therapeutics, and WellSim Biomedical Technologies and is a paid consultant for Mara BioSystems. J.G. and M.S. have filed a patent related to the phylotype generation process.

Figures

**Figure 1**
Study design and challenge overview and data harmonization. (A) Left: depiction of the assembled training and test datasets, harmonization of the data, transformation into feature tables, and the outcomes posed to the participating teams. Right: two sub-challenges, the global locations of the participating teams, the number of participants per sub-challenge, assessment process, and analysis of the better-performing models. (B) Uniform manifold approximation and projection (UMAP) ordination plots of the aggregated data before (left) and after (right) harmonization, where each dot represents one vaginal microbiome sample colored by study. (C) Violin plots of Shannon alpha diversity by trimester before (top) and after (bottom) harmonization, stratified by study.
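As a minimal sketch of the Shannon alpha diversity plotted in panel (C), the index for one sample can be computed from its taxon counts as H = -Σ p_i ln p_i. The function name, the toy counts, and the choice of natural logarithm are assumptions made for illustration; the caption does not state which log base or software was used.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    `counts` holds per-taxon read counts for a single sample.
    The natural log is an assumption; other bases only rescale H.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

# Hypothetical samples: one dominated by a single taxon, one perfectly even.
dominated = shannon_diversity([980, 10, 10])
even = shannon_diversity([250, 250, 250, 250])
print(dominated, even)
```

A sample dominated by one taxon scores close to zero, while an even community of four taxa reaches the maximum of ln(4) for that richness.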

**Figure 2**
Data visualization of microbiome features by outcome. (A) UMAP ordination plots of the vaginal microbiome colored by outcome. (B) Violin plot of diversity before (left) and after (right) harmonization, stratified and colored by outcome. (C) Alluvial plot of community state type (CST) frequencies across time, stratified by birth outcome.

**Figure 3**
Prediction accuracy of models against sequestered validation data from two independent studies not available to modeling teams. Bootstrapped area under the receiver operating characteristic (AUROC) curves and Bayes factors for (A) sub-challenge 1 and (B) sub-challenge 2 of the best-performing model of each team for each sub-challenge and the organizer’s baseline model (purple) against bootstrapped data (n = 1,000) with replacement from the two validation studies harmonized post hoc into the same feature sets. Bootstrapping was done by pregnancy, not specimen. Left column: box-and-whisker plots of the bootstrapped AUROC values; middle column: the Bayes factors when compared to the top-performing model; right column: Bayes factors when comparing against the organizer’s model. Yellow represents the two best-performing models for each sub-challenge. Blue represents models with a Bayes factor ≤20 when compared to the top-performing model.
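The resampling described in this caption can be sketched as follows: pregnancies (not specimens) are drawn with replacement, all specimens from each drawn pregnancy are kept together, and AUROC is recomputed on each resample. The Bayes factor here is taken to be the ratio of resamples in which one model beats the other, a convention often used in DREAM challenge scoring; that definition, the function names, and the toy data are assumptions, not details confirmed by the caption.

```python
import random
from collections import defaultdict

def auroc(labels, scores):
    """Rank-based AUROC: P(positive score > negative score), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_by_pregnancy(rows, n_boot=1000, seed=0):
    """rows: (pregnancy_id, label, score_a, score_b), one row per specimen."""
    by_preg = defaultdict(list)
    for preg_id, label, score_a, score_b in rows:
        by_preg[preg_id].append((label, score_a, score_b))
    ids = sorted(by_preg)
    rng = random.Random(seed)
    aucs_a, aucs_b = [], []
    for _ in range(n_boot):
        # Resample whole pregnancies so specimens from one person stay together.
        drawn = rng.choices(ids, k=len(ids))
        sample = [rec for pid in drawn for rec in by_preg[pid]]
        labels = [r[0] for r in sample]
        a = auroc(labels, [r[1] for r in sample])
        b = auroc(labels, [r[2] for r in sample])
        if a is not None:  # skip resamples containing a single class
            aucs_a.append(a)
            aucs_b.append(b)
    return aucs_a, aucs_b

def bayes_factor(aucs_a, aucs_b):
    """Assumed definition: (# resamples A beats B) / (# resamples B beats A)."""
    a_wins = sum(a > b for a, b in zip(aucs_a, aucs_b))
    b_wins = sum(b > a for a, b in zip(aucs_a, aucs_b))
    return float("inf") if b_wins == 0 else a_wins / b_wins

# Hypothetical toy data: (pregnancy_id, preterm_label, model_a, model_b)
rows = [
    ("p1", 1, 0.90, 0.40), ("p1", 1, 0.80, 0.70),
    ("p2", 0, 0.20, 0.60), ("p2", 0, 0.30, 0.10),
    ("p3", 1, 0.70, 0.30),
    ("p4", 0, 0.10, 0.50), ("p4", 0, 0.40, 0.20),
]
aucs_a, aucs_b = bootstrap_by_pregnancy(rows, n_boot=200, seed=1)
print(len(aucs_a), bayes_factor(aucs_a, aucs_b))
```

Resampling at the pregnancy level avoids treating repeated specimens from one individual as independent observations, which would make the bootstrap intervals look narrower than they should.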

**Figure 4**
Feature sets and individual compositional features used by top-performing models. Top-performing models here are defined as those with a bootstrapped AUROC greater than 0.64 or 0.8, respectively, for sub-challenge 1 or 2, further limited to models that could make a prediction in less than 10 s on a twelve-core AMD Ryzen 3900X processor. (A) Feature tables used by the top-performing models for sub-challenge 1 (left) and sub-challenge 2 (right) to make their predictions of preterm birth and early preterm birth, respectively. Filled-in blocks indicate that this feature table (by row) was used by a given model (columns) to make the prediction. Unfilled blocks are for feature tables that, when randomized, did not affect the prediction. (B) For the six sub-challenge 2 models evaluated by feature permutation that also made use of phylotypes at 0.1 distance, 32 of the phylotypes were used by all six models and 73 were used by five of the six models (right Venn diagram). The 32 phylotypes used by all six models are grouped by the closest species (left) for that phylotype.
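The randomization test in panel (A) can be illustrated with a small sketch: the rows of one feature table are shuffled across samples, and the table counts as used only if some prediction changes. The `predict` callable, the table names, and the toy model below are hypothetical stand-ins; the caption does not describe how the organizers implemented the check.

```python
import random

def table_is_used(predict, tables, name, n_shuffles=20, seed=0, tol=1e-12):
    """Return True if shuffling the rows of tables[name] changes any prediction.

    `predict` maps a dict of feature tables (lists of per-sample rows)
    to a list of per-sample scores.
    """
    rng = random.Random(seed)
    baseline = predict(tables)
    for _ in range(n_shuffles):
        shuffled = list(tables[name])
        rng.shuffle(shuffled)  # break the link between samples and this table
        perturbed = {**tables, name: shuffled}
        scores = predict(perturbed)
        if any(abs(a - b) > tol for a, b in zip(baseline, scores)):
            return True
    return False

# Hypothetical feature tables for four samples.
tables = {
    "alpha_diversity": [[0.1], [1.2], [0.4], [2.0]],
    "phylotype_0.1": [[5, 0], [1, 3], [0, 2], [4, 4]],
}

def toy_model(tabs):
    """Toy model that reads only the alpha diversity table."""
    return [min(1.0, row[0] / 2.0) for row in tabs["alpha_diversity"]]

used = {name: table_is_used(toy_model, tables, name) for name in tables}
print(used)
```

Shuffling a whole table at once tests whether the model depends on that table at all, which is coarser than per-feature permutation importance but matches the filled versus unfilled blocks in the panel.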

**Figure 5**
Ensemble model results. For (A) sub-challenge 1 and (B) sub-challenge 2, the AUROC curve (left) and area under the precision-recall curve (AUPRC; right) of three ensemble models (“ensemble_top2”: top two best-performing models, “ensemble_top2”: models with Bayes factor less than 20, and “ensemble_all”: all models), as well as first place, second place, and baseline models, colored by model.

Update of

Microbiome Preterm Birth DREAM Challenge: Crowdsourcing Machine Learning Approaches to Advance Preterm Birth Research.
Golob JL, Oskotsky TT, Tang AS, Roldan A, Chung V, Ha CWY, Wong RJ, Flynn KJ, Parraga-Leo A, Wibrand C, Minot SS, Andreoletti G, Kosti I, Bletz J, Nelson A, Gao J, Wei Z, Chen G, Tang ZZ, Novielli P, Romano D, Pantaleo E, Amoroso N, Monaco A, Vacca M, De Angelis M, Bellotti R, Tangaro S, Kuntzleman A, Bigcraft I, Techtmann S, Bae D, Kim E, Jeon J, Joe S; Preterm Birth DREAM Community; Theis KR, Ng S, Lee Li YS, Diaz-Gimeno P, Bennett PR, MacIntyre DA, Stolovitzky G, Lynch SV, Albrecht J, Gomez-Lopez N, Romero R, Stevenson DK, Aghaeepour N, Tarca AL, Costello JC, Sirota M. medRxiv [Preprint]. 2023 Apr 11:2023.03.07.23286920. doi: 10.1101/2023.03.07.23286920. Update in: Cell Rep Med. 2024 Jan 16;5(1):101350. doi: 10.1016/j.xcrm.2023.101350. PMID: 36945505
