Gradient boosted trees with individual explanations: An alternative to logistic regression for viability prediction in the first trimester of pregnancy
- PMID: 34808532
- PMCID: PMC8674730
- DOI: 10.1016/j.cmpb.2021.106520
Abstract
Background: Clinical models to predict first trimester viability are traditionally based on multivariable logistic regression (LR), which is not directly interpretable for non-statisticians such as physicians. Furthermore, LR requires complete datasets and pre-established variable specifications. In this study, we leveraged the internal non-linearity, feature selection and missing-value handling mechanisms of machine learning algorithms, along with a post-hoc interpretability strategy, as potential advantages over LR for clinical modeling.
Methods: The dataset included 1154 patients with 2377 individual scans and was obtained from a prospective observational cohort study conducted at a hospital in London, UK, from March 2014 to May 2019. The data were split into a training set (70%) and a test set (30%). Parsimonious and complete multivariable models were developed with two algorithms to predict first trimester viability at 11-14 weeks gestational age (GA): LR and light gradient boosted machine (LGBM). Missing values were handled by multiple imputation where appropriate. The SHapley Additive exPlanations (SHAP) framework was applied to derive individual explanations of the models.
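The idea behind SHAP's individual explanations can be illustrated with an exact Shapley value computation on a toy model. This is a minimal sketch, not the study's pipeline: the scoring function `toy_score`, its coefficients, the feature names and the all-zero baseline are hypothetical, and replacing absent features by a single baseline instance is a simplification of what the SHAP library does with a background dataset.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline instance.

    Features outside a coalition are set to their baseline value
    (an illustrative simplification, assumed here for clarity).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_score(features):
    # Hypothetical score: two main effects plus one interaction term
    heartbeat, sac_diameter, crl = features
    return 0.5 * heartbeat + 0.1 * crl + 0.2 * heartbeat * sac_diameter


x = [1.0, 2.0, 3.0]          # hypothetical scan
baseline = [0.0, 0.0, 0.0]   # hypothetical reference scan
phi = shapley_values(toy_score, x, baseline)
```

The attributions sum to the difference between the score for `x` and the score for `baseline`, and the interaction term is shared equally between the two features involved in it.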
Results: The parsimonious LGBM model had similar discriminative and calibration performance to the parsimonious LR model (AUC 0.885 vs 0.860; calibration slope: 1.19 vs 1.18). The complete models did not outperform the parsimonious models. LGBM was robust to the presence of missing values and, unlike LR, did not require multiple imputation. Decision path plots and feature importance analysis revealed different algorithm behaviors despite similar predictive performance. The main driving variable of the LR model was the pre-specified interaction between fetal heart presence and mean sac diameter. The two most important variables of LGBM were the crown-rump length and a proxy variable reflecting the difference between expected and observed GA. Finally, while variable interactions must be specified upfront with LR, several interactions were ranked by the SHAP framework among the most important features learned automatically by the LGBM algorithm.
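The two reported metrics can be sketched as follows. This is an illustrative implementation under common definitions, not the authors' code: AUC is taken as the probability that a positive case is scored above a negative one, and the calibration slope as the coefficient `b` in a logistic regression of the outcome on the logit of the predicted probability. The labels `y` and probabilities `p` are made-up values.

```python
import math


def auc(y_true, y_score):
    """Share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def calibration_fit(y_true, y_prob, iterations=25):
    """Fit y ~ a + b * logit(p) by Newton-Raphson; return (a, b)."""
    z = [math.log(q / (1 - q)) for q in y_prob]
    a, b = 0.0, 1.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for y, zi in zip(y_true, z):
            mu = 1 / (1 + math.exp(-(a + b * zi)))
            w = mu * (1 - mu)
            g_a += y - mu               # score for the intercept
            g_b += (y - mu) * zi        # score for the slope
            h_aa += w
            h_ab += w * zi
            h_bb += w * zi * zi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system H * delta = g
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Made-up outcomes and predicted probabilities, for illustration only
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
auc_value = auc(y, p)
intercept, slope = calibration_fit(y, p)
```

Under this definition, a slope of 1 means the spread of the predictions matches the observed outcomes, a slope below 1 indicates predictions that are too extreme, and a slope above 1 (as reported for both models) indicates predictions that are not extreme enough.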
Conclusions: Gradient boosted algorithms performed similarly to carefully crafted LR models in terms of discrimination and calibration for first trimester viability prediction. By handling multi-collinearity, missing values, feature selection and variable interactions internally, the gradient boosted trees algorithm, combined with SHAP, offers a serious alternative to traditional LR models.
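One way a tree-based learner can handle missing values internally is to learn, at each split, which branch missing values should follow. The sketch below illustrates that idea for a single split on one feature using squared error. It is a simplified illustration, not LightGBM's implementation; the function names and the data are hypothetical.

```python
import math


def sse(group):
    """Sum of squared errors around the group mean (0 for an empty group)."""
    if not group:
        return 0.0
    mean = sum(group) / len(group)
    return sum((t - mean) ** 2 for t in group)


def fit_split_with_missing(values, targets):
    """Return (loss, threshold, missing_goes_left) for the best single split.

    NaN marks a missing value; each candidate split is evaluated twice,
    once with missing values sent left and once with them sent right.
    """
    observed = sorted({v for v in values if not math.isnan(v)})
    thresholds = [(lo + hi) / 2 for lo, hi in zip(observed, observed[1:])]
    best = None
    for threshold in thresholds:
        for missing_goes_left in (True, False):
            left, right = [], []
            for v, t in zip(values, targets):
                goes_left = missing_goes_left if math.isnan(v) else v <= threshold
                (left if goes_left else right).append(t)
            loss = sse(left) + sse(right)
            if best is None or loss < best[0]:
                best = (loss, threshold, missing_goes_left)
    return best


# Hypothetical feature with two missing measurements
nan = float("nan")
values = [1.0, 2.0, nan, 8.0, 9.0, nan]
targets = [0, 0, 1, 1, 1, 1]
loss, threshold, missing_goes_left = fit_split_with_missing(values, targets)
```

In this toy data the missing rows share the outcome of the high-valued rows, so the best split sends missing values to the right branch and no imputation step is needed.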
Keywords: First trimester viability; Gradient boosted tree; Logistic regression; Machine learning; Post-hoc interpretability; Shapley value.
Copyright © 2021 The Authors. Published by Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: The authors declare that no conflict of interest exists.
Similar articles
- Predictive etiological classification of acute ischemic stroke through interpretable machine learning algorithms: a multicenter, prospective cohort study. BMC Med Res Methodol. 2024 Sep 10;24(1):199. doi: 10.1186/s12874-024-02331-1. PMID: 39256656 Free PMC article.
- Interpretable machine learning for allergic rhinitis prediction among preschool children in Urumqi, China. Sci Rep. 2024 Sep 27;14(1):22281. doi: 10.1038/s41598-024-73733-w. PMID: 39333659 Free PMC article.
- Predicting major adverse cardiac events in diabetes and chronic kidney disease: a machine learning study from the Silesia Diabetes-Heart Project. Cardiovasc Diabetol. 2025 Feb 15;24(1):76. doi: 10.1186/s12933-025-02615-w. PMID: 39955553 Free PMC article.
- Interpretable machine learning model to predict surgical difficulty in laparoscopic resection for rectal cancer. Front Oncol. 2024 Feb 6;14:1337219. doi: 10.3389/fonc.2024.1337219. eCollection 2024. PMID: 38380369 Free PMC article. Review.
- Comparison of Multivariable Logistic Regression and Other Machine Learning Algorithms for Prognostic Prediction Studies in Pregnancy Care: Systematic Review and Meta-Analysis. JMIR Med Inform. 2020 Nov 17;8(11):e16503. doi: 10.2196/16503. PMID: 33200995 Free PMC article. Review.
Cited by
- Risk prediction model based on machine learning for predicting miscarriage among pregnant patients with immune abnormalities. Front Pharmacol. 2024 Apr 22;15:1366529. doi: 10.3389/fphar.2024.1366529. eCollection 2024. PMID: 38711993 Free PMC article.
- Predicting Postoperative Anterior Chamber Angle for Phakic Intraocular Lens Implantation Using Preoperative Anterior Segment Metrics. Transl Vis Sci Technol. 2023 Jan 3;12(1):10. doi: 10.1167/tvst.12.1.10. PMID: 36607625 Free PMC article.
- A data-driven framework for fair and efficient organ transplantation using gradient boosting and adaptive genetic allocation. J Artif Organs. 2025 Jun 6. doi: 10.1007/s10047-025-01512-z. Online ahead of print. PMID: 40478423
- A review of evaluation approaches for explainable AI with applications in cardiology. Artif Intell Rev. 2024;57(9):240. doi: 10.1007/s10462-024-10852-w. Epub 2024 Aug 9. PMID: 39132011 Free PMC article.
- Predicting Leadership Status Through Trait Emotional Intelligence and Cognitive Ability. Behav Sci (Basel). 2025 Mar 11;15(3):345. doi: 10.3390/bs15030345. PMID: 40150239 Free PMC article.