Digit Health. 2023 Jul 20:9:20552076231187605.

doi: 10.1177/20552076231187605. eCollection 2023 Jan-Dec.

Cardiac surgery risk prediction using ensemble machine learning to incorporate legacy risk scores: A benchmarking study

Affiliations

¹ Translational Health Sciences, Bristol Heart Institute, University of Bristol, Bristol, UK.
² School of Computing Science, Northumbria University, Newcastle upon Tyne, UK.
³ Department of Cardiac Surgery, Rabindranath Tagore International Institute of Cardiac Sciences, Kolkata, India.

PMID: 37492033
PMCID: PMC10363892
DOI: 10.1177/20552076231187605

Tim Dong et al. Digit Health. 2023.

Abstract

Objective: The introduction of new clinical risk scores (e.g. European System for Cardiac Operative Risk Evaluation (EuroSCORE) II) superseding original scores (e.g. EuroSCORE I) with different variable sets typically results in disparate datasets, due to high levels of missingness for new score variables prior to the time of adoption. Little is known about the use of ensemble learning to incorporate disparate data from legacy scores. We tested the hypothesis that homogenous and heterogenous Machine Learning (ML) ensembles will have better performance than ensembles of Dynamic Model Averaging (DMA) for combining knowledge from EuroSCORE I legacy data with EuroSCORE II data to predict cardiac surgery risk.

Methods: Using the National Adult Cardiac Surgery Audit dataset, we trained 12 different base learner models, based on two different variable sets from either EuroSCORE I (LogES) or EuroSCORE II (ES II), partitioned by the time of score adoption (1996-2016 or 2012-2016), and evaluated them on a holdout set (2017-2019). These base learner models were ensembled using nine different combinations of six ML algorithms to produce homogenous or heterogenous ensembles. Performance was assessed using a consensus metric.
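
The time-based partitioning described in the Methods can be illustrated with a minimal sketch. The year ranges are taken from the abstract; the record layout (a dict with a `year` key) and the function name `partition_by_period` are hypothetical and are not drawn from the study's code.

```python
from typing import Dict, List

# Year ranges as stated in the Methods; the record layout is an assumption.
LOGES_TRAIN = range(1996, 2017)  # 1996-2016 inclusive
ESII_TRAIN = range(2012, 2017)   # 2012-2016 inclusive
HOLDOUT = range(2017, 2020)      # 2017-2019 inclusive


def partition_by_period(records: List[dict]) -> Dict[str, List[dict]]:
    """Assign each record to the training pools and the holdout set by year.

    A record from 2012-2016 lands in both training pools, because the two
    adoption periods overlap.
    """
    parts: Dict[str, List[dict]] = {
        "loges_train": [],
        "esii_train": [],
        "holdout": [],
    }
    for rec in records:
        year = rec["year"]
        if year in LOGES_TRAIN:
            parts["loges_train"].append(rec)
        if year in ESII_TRAIN:
            parts["esii_train"].append(rec)
        if year in HOLDOUT:
            parts["holdout"].append(rec)
    return parts


# Toy records for illustration only; these are not study data.
records = [{"id": i, "year": y} for i, y in enumerate([1998, 2010, 2013, 2016, 2018])]
parts = partition_by_period(records)
```

With these toy records, the two training pools share the 2013 and 2016 entries, while the 2018 entry appears only in the holdout set.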

Results: The XGBoost homogenous ensemble (HE) was the highest performing model (clinical effectiveness metric (CEM) 0.725), with an area under the curve (AUC) of 0.8327 (95% confidence interval (CI) 0.8323-0.8329), followed by the Random Forest HE (CEM 0.723; AUC 0.8325; 95% CI 0.8320-0.8326). Across different heterogenous ensembles, significantly better performance was obtained by combining siloed datasets across time (CEM 0.720) than by building ensembles of either the 1996-2011 (t-test adjusted, p = 1.67×10^-6) or 2012-2019 (t-test adjusted, p = 1.35×10^-193) datasets alone.
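
For readers unfamiliar with the comparison reported above, the following is a minimal sketch of a paired t statistic together with a multiplicity adjustment. The abstract does not name the adjustment method, so the Bonferroni correction here is an assumption, and the metric values are made up for illustration; they are not results from the study.

```python
import math
import statistics
from typing import List, Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the paired t statistic for matched samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni adjustment (assumed method): multiply by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Illustrative values only: one metric per resample for two hypothetical models.
combined = [0.720, 0.722, 0.719, 0.721]
single_period = [0.715, 0.716, 0.716, 0.714]
t_stat = paired_t_statistic(combined, single_period)
adjusted = bonferroni([0.01, 0.04, 0.5])
```

A positive `t_stat` indicates that the first model scored higher on average across the matched resamples; converting it to a p-value requires the t distribution with n - 1 degrees of freedom.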

Conclusions: Both homogenous and heterogenous ML ensembles performed significantly better than the DMA ensemble of Bayesian Update models. Time-dependent ensemble combination of variables, having differing qualities according to time of score adoption, enabled previously siloed data to be combined, leading to increased power, clinical interpretability of variables and usage of data.

Keywords: Cardiac surgery; dynamic model averaging; ensemble learning; legacy scores; machine learning; mortality; multi-modal data; risk prediction.

Conflict of interest statement

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

**Figure 1.**
Design overview of the study; homogenous ensembles (logES-ESII-P) and heterogenous ensembles logES-O, ESII-O and logES-ESII-A were built and evaluated using a consensus metric. Further details of each model are provided in Supplemental materials. Markov Chain Monte Carlo (MCMC) was used as the latent algorithm of Dynamic Model Averaging by Bayesian Update models; data were partitioned based on risk score adoption periods 1996–2016 (LogES) and 2012–2016 (ES II) and ensembled using the respective score variables; 2017–2019 data were used as hold-out data for evaluation.

**Figure 2.**
(a) LogES MCMC 2012–2016 kernel density plots showing distribution of coefficient estimate for 6 LogES coefficients; red dotted lines show original LogES values; coefficients updated based on coefficients estimated using 1996–2011 dataset as prior; three kernels for each coefficient represent the three chains of MCMC estimates; (b) ES II MCMC 2012–2016 kernel density plots showing distribution of coefficient estimate for 6 ES II coefficients; red dotted lines show original ES II values; three kernels for each coefficient represent the three chains of MCMC estimates; (c) Histogram of LogES values calculated for 2017–2019 dataset using coefficients estimated from 2012 to 2016, which was updated based on coefficients estimated using 1996–2011 dataset; red shows the estimated distribution; green shows distribution based on the original LogES coefficients; (d) Histogram of ES II values calculated for 2017–2019 dataset using coefficients estimated from 2012 to 2016; red shows the estimated distribution; green shows distribution based on the original ES II coefficients; (e) Forest plot of LogES MCMC estimated coefficients for each variable versus original LogES coefficients; MCMC coefficients were obtained using data from 2012 to 2016 and updated based on coefficients from 1996 to 2011; 95% CI are narrow and barely visible; (f) Forest plot of ES II MCMC estimated coefficients for each variable versus original ES II coefficients; MCMC coefficients were obtained using data from 2012 to 2016; 95% CI are narrow and barely visible.

**Figure 3.**
(a) Homogenous Ensembles: 5 LogES models are combined with the 5 ES II models using soft-voting for each corresponding ML model pair, for example, RF (LogES) + RF (ES II); the Bayesian Update Ensemble was built by using soft-voting to combine Bayesian updated LogES scores with Bayesian updated ES II scores; (b) multiple pairwise paired t-test for logES-O, ESII-O and logES-ESII-A; (c) ROC-AUC performances of logES-O, ESII-O and logES-ESII-A models; (d) logES-ESII-A results are compared against each of the logES-ESII-P models using multiple pairwise paired t-tests.
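
Soft voting, as named in panel (a), is commonly understood as averaging the predicted probabilities of the member models. The sketch below assumes equal weights for the LogES and ES II members, which the caption does not specify; the function name `soft_vote` and the probabilities are hypothetical.

```python
from typing import List, Sequence, Tuple


def soft_vote(
    prob_loges: Sequence[float],
    prob_esii: Sequence[float],
    weights: Tuple[float, float] = (0.5, 0.5),  # equal weighting is an assumption
) -> List[float]:
    """Combine two models' predicted mortality probabilities per patient."""
    if len(prob_loges) != len(prob_esii):
        raise ValueError("both models must score the same patients")
    w1, w2 = weights
    total = w1 + w2
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted mean of the two probabilities for each patient.
    return [(w1 * p + w2 * q) / total for p, q in zip(prob_loges, prob_esii)]


# Illustrative probabilities for two patients; these are not study outputs.
rf_loges = [0.02, 0.10]
rf_esii = [0.04, 0.30]
ensemble = soft_vote(rf_loges, rf_esii)
```

Pairing like with like, for example RF (LogES) with RF (ES II), keeps each ensemble homogenous in algorithm while drawing on both variable sets.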

**Figure 4.**
(a) Tree SHAP feature importance plot for Holdout (n = 69,891; 2017–2019); every patient is represented as a dot; the x position of the dot is the impact of that feature on the model's prediction for that patient in log-odds; red: high variable values; blue: low variable values; patients that do not fit on the row pile up to show regions of high case volume; (b) mean absolute magnitude of importance across all prediction outputs; (c) log-odds of mortality (y-axis) versus normalised New York Heart Association (NYHA) Functional Classification values (x-axis); interactions of Operative Urgency are colored with red having higher normalised urgency; four vertical streaks from left to right show NYHA Classes: I, II, III, IV; (d) log-odds of mortality (y-axis) versus normalised renal impairment (x-axis); interactions with weight of intervention are colored with red having higher number of normalised procedures; four vertical streaks from left to right show renal impairment statuses: normal, moderate, on dialysis, severe.
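
Because the SHAP values in this figure are expressed in log-odds, they add to a base value, and the sum maps to a probability through the logistic function. The sketch below illustrates that conversion; the base value and contributions are made-up numbers, and `probability_from_shap` is a hypothetical helper rather than part of any SHAP library.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping log-odds to probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def probability_from_shap(base_value: float, contributions: Sequence[float]) -> float:
    """Sum per-feature log-odds contributions onto the base value, then convert."""
    return sigmoid(base_value + sum(contributions))


# Illustrative numbers only: a base log-odds and three feature contributions.
p = probability_from_shap(-3.0, [0.5, 1.0, -0.2])
```

A contribution of +1.0 in log-odds does not add a fixed amount of probability; its effect depends on where the other terms place the patient on the logistic curve.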

Cited by

The robot butler: How and why should we study predictive algorithms and artificial intelligence (AI) in healthcare?
Gjødsbøl IM, Ringgaard AK, Holm PC, Brunak S, Bundgaard H. Digit Health. 2024 Mar 24;10:20552076241241674. doi: 10.1177/20552076241241674. eCollection 2024 Jan-Dec. PMID: 38528969
A review of evaluation approaches for explainable AI with applications in cardiology.
Salih AM, Galazzo IB, Gkontra P, Rauseo E, Lee AM, Lekadir K, Radeva P, Petersen SE, Menegaz G. Artif Intell Rev. 2024;57(9):240. doi: 10.1007/s10462-024-10852-w. Epub 2024 Aug 9. PMID: 39132011
Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis.
Dong T, Sinha S, Zhai B, Fudulu D, Chan J, Narayan P, Judge A, Caputo M, Dimagli A, Benedetto U, Angelini GD. JMIRx Med. 2024 Jun 12;5:e45973. doi: 10.2196/45973. PMID: 38889069
Enhancing Cardiovascular Risk Prediction: Development of an Advanced Xgboost Model with Hospital-Level Random Effects.
Dong T, Oronti IB, Sinha S, Freitas A, Zhai B, Chan J, Fudulu DP, Caputo M, Angelini GD. Bioengineering (Basel). 2024 Oct 18;11(10):1039. doi: 10.3390/bioengineering11101039. PMID: 39451414
Triglyceride index as a predictor of mortality after cardiac surgery.
Li H, Xiao F, Ren H, Xu F, Che H, Zhu H, Zhou C, Wang S. iScience. 2024 Oct 5;27(11):111107. doi: 10.1016/j.isci.2024.111107. eCollection 2024 Nov 15. PMID: 39620137

Grants and funding

CH/17/1/32804/BHF_/British Heart Foundation/United Kingdom
