Predicting preterm birth using auto-ML frameworks: a large observational study using electronic inpatient discharge data

Deming Kong^#¹, Ye Tao^#¹, Haiyan Xiao^#¹, Huini Xiong^#¹, Weizhong Wei¹, Miao Cai²

Affiliations

¹ Wuhan Children's Hospital (Wuhan Maternal and Child Healthcare Hospital), Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China.
² Department of Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, Guangdong, China.

^# Contributed equally.

PMID: 38362001
PMCID: PMC10867966
DOI: 10.3389/fped.2024.1330420

Deming Kong et al. Front Pediatr. 2024 Jan 31;12:1330420. doi: 10.3389/fped.2024.1330420. eCollection 2024.

Abstract

Background: This study aimed to develop and compare different AutoML frameworks and machine learning models for predicting preterm birth.

Methods: The study drew on a large electronic medical record database and included 715,962 participants whose principal diagnosis code was childbirth. Three automated machine learning (AutoML) frameworks were used to construct machine learning models, including tree-based models, ensemble models, and deep neural networks, on the training sample (N = 536,971). The area under the curve (AUC) and training time were used to assess the performance of the prediction models, and feature importance was computed via permutation-shuffling.
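The two evaluation ideas named in the Methods, AUC and permutation-shuffling feature importance, can be illustrated with a minimal sketch. It assumes that AUC refers to the area under the ROC curve, and it is not the implementation used by any of the AutoML frameworks in the study. The feature names (`prom`, `noise`), the toy scoring function `predict`, and the synthetic data are all hypothetical and chosen only so the example is self-contained.

```python
import random


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney rank statistic; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def permutation_importance(predict, rows, labels, n_repeats=5, seed=0):
    """Mean drop in AUC when one feature column at a time is shuffled."""
    rng = random.Random(seed)
    baseline = auc(labels, [predict(r) for r in rows])
    importances = {}
    for name in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            permuted = [{**r, name: v} for r, v in zip(rows, shuffled)]
            drops.append(baseline - auc(labels, [predict(r) for r in permuted]))
        importances[name] = sum(drops) / n_repeats
    return baseline, importances


# Hypothetical toy data: one informative binary feature and one irrelevant feature.
rows = [{"prom": int(i % 4 == 0), "noise": (i * 7 % 13) / 13} for i in range(200)]
labels = [int(i % 4 == 0 or i % 10 == 1) for i in range(200)]


def predict(row):
    # Hypothetical stand-in for a fitted model: the score depends only on "prom".
    return 0.1 + 0.9 * row["prom"]


baseline, importances = permutation_importance(predict, rows, labels)
print(f"baseline AUC: {baseline:.3f}")
for name, drop in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean AUC drop {drop:.3f}")
```

Shuffling a feature the model relies on breaks its link to the outcome and lowers the AUC, while shuffling a feature the model ignores leaves the AUC unchanged, which is why the mean drop serves as an importance score. Repeating the shuffle several times, as done here with `n_repeats`, is also what makes it possible to attach an interval to each score.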

Results: The H2O AutoML framework had the highest median AUC (0.846), followed by AutoGluon (median AUC: 0.840) and Auto-sklearn (median AUC: 0.820). The median training time was lowest for H2O AutoML (0.14 min), followed by AutoGluon (0.16 min) and Auto-sklearn (4.33 min). Among the different types of machine learning models, the Gradient Boosting Machines (GBM) or Extreme Gradient Boosting (XGBoost), stacked ensemble, and random forest models had better predictive performance, with median AUC scores of 0.846, 0.846, and 0.842, respectively. Important features related to preterm birth included premature rupture of membranes (PROM), incompetent cervix, occupation, and preeclampsia.

Conclusions: Our study highlights the potential of machine learning models to predict the risk of preterm birth using readily available electronic medical record data, which has significant implications for improving prenatal care and outcomes.

Keywords: China; administrative data; autoML; machine learning; preterm birth.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 1**
Area under the curve (AUC) for different AutoML frameworks and machine learning models. (A) Raincloud plot of the area under the curve (AUC) for three AutoML frameworks (Auto-sklearn, AutoGluon, and H2O AutoML). Each raincloud plot panel consists of three components: a jittered dot plot on the left side, a boxplot in the middle, and a cloud plot of the distribution of AUCs on the right side. (B) Boxplots of AUCs by machine learning models. GBM: Gradient Boosting Machines; GLM: Generalized Linear Models; KNN: K-Nearest Neighbors; LDA: Linear Discriminant Analysis.

**Figure 2**
Training time in minutes for different AutoML frameworks and machine learning models. (A) Raincloud plot of training time in minutes (training set sample size N = 536,971) for three AutoML frameworks (Auto-sklearn, AutoGluon, and H2O AutoML). Each raincloud plot panel consists of three components: a jittered dot plot on the left side, a boxplot in the middle, and a cloud plot of the distribution of training times on the right side. (B) Boxplots of training time in minutes by machine learning models. GBM: Gradient Boosting Machines; GLM: Generalized Linear Models; KNN: K-Nearest Neighbors; LDA: Linear Discriminant Analysis.

**Figure 3**
Overall feature importance (95% confidence intervals) plots for predicting preterm birth via permutation-shuffling in AutoGluon. PROM: premature rupture of membranes.
