Int J Gen Med. 2025 Aug 4;18:4195-4207.
doi: 10.2147/IJGM.S521763. eCollection 2025.

A Multi-Algorithm Machine Learning Model for Predicting the Risk of Preterm Birth in Patients with Early-Onset Preeclampsia

Yanhong Xu et al. Int J Gen Med.

Abstract

Purpose: To identify risk factors for preterm birth in patients with early-onset preeclampsia (EOPE) using multi-algorithm machine learning, and to construct a predictive model and evaluate its predictive value.

Methods: A retrospective analysis was conducted on 442 EOPE patients from a single tertiary center, divided into preterm birth (<37 weeks, n=358) and term birth (≥37 weeks, n=84) groups. Features were evaluated using univariate analysis, random forest importance assessment, and LASSO regression combined with multivariate regression analysis. Eight machine learning models were trained on 70% of the data and validated on the remaining 30%. A Stacking ensemble model was then constructed, and SHapley Additive exPlanations (SHAP) was used for feature interpretation.
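As an illustration of the 70%/30% partition described in the Methods, the following is a minimal sketch using only the Python standard library. The abstract does not say whether the split was stratified by outcome; stratification, the function name `stratified_split`, and the random seed are assumptions made here for illustration, chosen because the two groups are imbalanced (358 vs. 84).

```python
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split indices into train/validation sets, class by class.

    Hypothetical helper: the abstract reports a 70/30 split but does
    not state that it was stratified; that is an assumption here.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    train_idx, valid_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_frac)  # per-class training size
        train_idx += idxs[:cut]
        valid_idx += idxs[cut:]
    return sorted(train_idx), sorted(valid_idx)


# Group sizes taken from the abstract: 358 preterm (1), 84 term (0).
labels = [1] * 358 + [0] * 84
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))
```

Splitting within each class keeps the preterm-to-term ratio roughly equal in both partitions, which matters when the minority class has only 84 cases.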

Results: The areas under the receiver operating characteristic curve (AUROC) for predicting preterm birth in EOPE patients were 0.763 for Logistic Regression, 0.712 for Gaussian Naive Bayes, 0.821 for Extreme Gradient Boosting (XGBoost), 0.832 for Light Gradient Boosting Machine, 0.821 for Support Vector Machine (SVM), 0.842 for Gradient Boosting Decision Tree (GBDT), 0.784 for Multi-Layer Perceptron, and 0.763 for Elastic Net. The Stacking model (XGBoost+GBDT+SVM) achieved the highest AUROC (0.865). Three independent risk factors were identified: fetal growth restriction (aOR=3.50, p = 0.047), serum cystatin C (aOR=11.27, p = 0.018), and C-reactive protein (aOR=1.37, p < 0.001). SHAP analysis showed that GBDT was the top contributor to the Stacking predictions, with microalbuminuria (GBDT, XGBoost) and age (SVM) being the most influential features.
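For readers unfamiliar with the AUROC metric reported above, the following is a minimal sketch of how it can be computed from predicted scores, using the pairwise (Mann-Whitney) definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function name `auroc` and the toy labels and scores are hypothetical and are not data from the study.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (not study data): 1 = preterm birth, 0 = term birth.
y_true = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
print(round(auroc(y_true, scores), 3))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported values between 0.712 and 0.865 all indicate better-than-chance discrimination on this scale.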

Conclusion: Machine learning models can serve as reliable assessment tools for predicting the risk of preterm birth in patients with EOPE. The ensemble prediction model demonstrates the best predictive performance, helping obstetricians identify high-risk patients and perform early intervention to improve perinatal outcomes.

Keywords: clinical prediction model; early-onset preeclampsia; machine learning; preterm birth.

Conflict of interest statement

The authors declare that there are no conflicts of interest in this work.

Figures

Figure 1
Variable selection for the prediction model. (A) Importance ranking of feature variables in the random forest model for preterm birth risk in patients with early-onset preeclampsia. (B) Variables selected by LASSO regression, with the dashed line representing the lambda that yields the minimum mean of the target parameters, under which the model achieved the best performance. (C) Variation characteristics of LASSO regression coefficients. (D) Forest plot displaying adjusted odds ratios with 95% confidence intervals for independent risk factors associated with preterm birth in preeclampsia patients. *P<0.05.
Figure 2
Performance comparison of different machine learning models. (A) ROC curves of 9 models on the training set. (B) ROC curves of 9 models on the validation set. (C) Precision-recall curves (PRC) of 9 models on the training set. (D) Precision-recall curves (PRC) of 9 models on the validation set. (E) Heatmap of evaluation metrics for 9 models on the training set. (F) Heatmap of evaluation metrics for 9 models on the validation set.
Figure 3
SHAP values of different models on the selected feature set. (A) SHAP values of the best Stacking model. (B) SHAP values of GBDT on the selected feature set. (C) SHAP values of SVM on the selected feature set. (D) SHAP values of XGBoost on the selected feature set.
