Prediction of three-year all-cause mortality in patients with heart failure and atrial fibrillation using the CatBoost model

doi:10.1186/s12872-025-04928-w

BMC Cardiovasc Disord. 2025 Jul 4;25(1):466.

Jiacan Wu¹, Guanghong Tao¹, Siyuan Xie¹, Han Yang², Fenglin Qi¹, Naiyue Bao¹, Zhuo Li¹, Guanglei Chang³, Hua Xiao⁴

Affiliations

¹ Department of Cardiovascular Medicine, Cardiovascular Research Center, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China.
² School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing, 400074, China.
³ Department of Cardiovascular Medicine, Cardiovascular Research Center, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China. augustchang.2008@qq.com.
⁴ Department of Cardiovascular Medicine, Cardiovascular Research Center, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China. 202235@hospital.cqmu.edu.cn.

PMID: 40615809
PMCID: PMC12232015
DOI: 10.1186/s12872-025-04928-w

Jiacan Wu et al. BMC Cardiovasc Disord. 2025.

Abstract

Background: Heart failure and atrial fibrillation (HF-AF) frequently coexist, resulting in complex interactions that substantially elevate mortality risk. This study aimed to develop and validate a machine learning (ML) model predicting the 3-year all-cause mortality risk in HF-AF patients to support personalized risk stratification and management.

Method: This retrospective cohort study included 558 HF-AF patients admitted in 2018, with a median follow-up of 1,185 days. The cohort was randomly divided into training (70%) and test (30%) sets. Features were selected using the Boruta algorithm and least absolute shrinkage and selection operator (LASSO) regression. Six ML models were trained with tenfold cross-validation and tuned via grid search. Model performance was evaluated across 12 metrics, including the area under the receiver operating characteristic curve (AUC), to identify the best-performing model. Shapley Additive exPlanations (SHAP) analysis was then used to interpret the optimal model and investigate interactions between features.
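
The split and tuning steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say whether the 70/30 split was stratified, which random seeds were used, or which hyperparameter grid was searched. Here the split is simple non-stratified random sampling, the seeds are arbitrary, the grid is a hypothetical toy grid, and `fit_and_score` is a hypothetical callable standing in for "train a model (e.g. CatBoost) on the training folds and return a validation score such as AUC".

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def train_test_split_indices(n: int, test_frac: float = 0.3, seed: int = 42) -> Split:
    """Shuffle row indices 0..n-1 and hold out `test_frac` of them (not stratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def kfold_indices(idx: Sequence[int], k: int = 10, seed: int = 0) -> List[Split]:
    """Partition `idx` into k folds; each fold serves as the validation set exactly once."""
    shuffled = list(idx)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i, val in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, val))
    return splits


def grid_search(
    grid: Dict[str, list],
    splits: List[Split],
    fit_and_score: Callable[[dict, List[int], List[int]], float],
) -> Tuple[dict, float]:
    """Return the parameter combination with the highest mean validation score."""
    keys = sorted(grid)
    best_params, best_score = {}, float("-inf")
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = [fit_and_score(params, tr, va) for tr, va in splits]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Usage with the cohort size from the abstract and a dummy scorer (hypothetical).
train_idx, test_idx = train_test_split_indices(558)
splits = kfold_indices(train_idx, k=10)
toy_grid = {"depth": [4, 6, 8], "learning_rate": [0.03, 0.1]}
best, score = grid_search(
    toy_grid,
    splits,
    lambda params, tr, va: -abs(params["depth"] - 6) - abs(params["learning_rate"] - 0.1),
)
print(len(train_idx), len(test_idx), best)
```

In a real pipeline the test indices are touched only once, after tuning, so that the test-set metrics remain an honest estimate of out-of-sample performance.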

Results: Of the 558 patients, 215 reached the primary endpoint. Feature selection identified 14 key variables for model development. The best-performing model, CatBoost, achieved the highest test-set AUC (0.809) and performed robustly across multiple evaluation metrics. SHAP analysis identified New York Heart Association (NYHA) classification, absolute lymphocyte count (ALC), high-sensitivity C-reactive protein (hs-CRP), B-type natriuretic peptide (BNP), and age as key predictors. SHAP interaction analysis revealed several feature interactions, with relatively strong interactions between ALC and NYHA classification and between ALC and BNP.
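
SHAP interaction values decompose a prediction into pairwise effects. The sketch below computes the pairwise Shapley interaction index exactly for a small model by enumerating every feature coalition. Assumptions: features outside a coalition are replaced by a single hypothetical baseline vector (an interventional value function), and `toy_model` is an invented function. Tree-based implementations such as TreeSHAP estimate the same kind of quantity far more efficiently; exact enumeration is only practical for a handful of features. By convention the interaction is split evenly, so φ_ij = φ_ji and each holds half of the joint effect.

```python
import itertools
import math
from typing import Callable, Sequence, Set

Model = Callable[[Sequence[float]], float]


def coalition_value(f: Model, x: Sequence[float], baseline: Sequence[float], S: Set[int]) -> float:
    """Evaluate f with features in S taken from x and all others fixed at the baseline."""
    return f([x[k] if k in S else baseline[k] for k in range(len(x))])


def shap_interaction(f: Model, x: Sequence[float], baseline: Sequence[float], i: int, j: int) -> float:
    """Exact pairwise SHAP interaction value phi_ij (i != j) by enumerating all coalitions.

    phi_ij = sum over subsets S of the remaining features of
             |S|! (M - |S| - 2)! / (2 (M - 1)!) * [v(S+ij) - v(S+i) - v(S+j) + v(S)]
    """
    M = len(x)
    if i == j or M < 2:
        raise ValueError("need two distinct features")
    others = [k for k in range(M) if k not in (i, j)]
    total = 0.0
    for r in range(len(others) + 1):
        weight = math.factorial(r) * math.factorial(M - r - 2) / (2 * math.factorial(M - 1))
        for combo in itertools.combinations(others, r):
            S = set(combo)
            delta = (
                coalition_value(f, x, baseline, S | {i, j})
                - coalition_value(f, x, baseline, S | {i})
                - coalition_value(f, x, baseline, S | {j})
                + coalition_value(f, x, baseline, S)
            )
            total += weight * delta
    return total


# Hypothetical toy model: features 0 and 1 interact multiplicatively, feature 2 is purely additive.
def toy_model(z: Sequence[float]) -> float:
    return z[0] * z[1] + z[2]


x, baseline = [2.0, 3.0, 5.0], [0.0, 0.0, 0.0]
print(shap_interaction(toy_model, x, baseline, 0, 1))  # 3.0, half of the 2 * 3 product term
print(shap_interaction(toy_model, x, baseline, 0, 2))  # 0.0, no interaction
```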

Conclusions: CatBoost was identified as the optimal model for predicting three-year all-cause mortality in HF-AF patients, potentially aiding clinicians in risk stratification and individualized treatment planning to improve patient outcomes.

Keywords: All-cause mortality; Atrial fibrillation; Heart failure; Machine learning; Prediction model.

Conflict of interest statement

Declarations.

Ethics approval and consent to participate: The study involved human participants and was approved by the Ethics Committee of the First Hospital of Chongqing Medical University (reference number 2020-528); it adhered to the Declaration of Helsinki. Written informed consent was obtained from all individual participants.

Consent for publication: Informed consent was obtained from all individual participants included in the study.

Competing interests: The authors declare no competing interests.

Figures

**Fig. 1**
Flowchart of patient selection, data processing, model development, and validation. Abbreviations: HF, heart failure; AF, atrial fibrillation; HF-AF, heart failure with atrial fibrillation; LASSO, least absolute shrinkage and selection operator; CatBoost, Categorical Boosting; NN, Neural Networks; LR, logistic regression; RF, Random Forest; SVM, Support Vector Machines; GBDT, gradient boosting decision tree; AUC, area under the curve; PPV, Positive Predictive Value; NPV, Negative Predictive Value; ACC, accuracy; F1, the harmonic mean of precision and recall; MCC, Matthews Correlation Coefficient; BS, Brier Score; DCA, Decision Curve Analysis; SHAP, Shapley Additive exPlanations

**Fig. 2**
ROC curves for each model in the training and test sets. (A) ROC curves in the training set: RF had the highest AUC (0.935), followed by GBDT (0.930), CatBoost (0.869), NN (0.801), SVM (0.794), and Lasso-LR (0.760). (B) ROC curves in the test set: CatBoost had the highest AUC (0.809), followed by NN (0.802), Lasso-LR (0.793), RF (0.790), SVM (0.773), and GBDT (0.732). Abbreviations: ROC, receiver operating characteristic; AUC, area under the curve; CatBoost, Categorical Boosting; NN, Neural Networks; Lasso-LR, least absolute shrinkage and selection operator-penalized logistic regression; RF, Random Forest; SVM, Support Vector Machines; GBDT, gradient boosting decision tree
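
The AUC values in Fig. 2 can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. A minimal sketch of that rank-based (Mann-Whitney) definition, with tied scores counted as one half; the labels and scores in the example are made up.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up outcomes (1 = died) and predicted risks.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic, which is fine for a cohort of a few hundred patients; sorting-based rank formulas give the same value in O(n log n).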

**Fig. 3**
Evaluation metrics for each model in the training and test sets. (A) Evaluation metrics in the training set; (B) evaluation metrics in the test set. Abbreviations: CatBoost, Categorical Boosting; NN, Neural Networks; Lasso-LR, least absolute shrinkage and selection operator-penalized logistic regression; RF, Random Forest; SVM, Support Vector Machines; GBDT, gradient boosting decision tree; PPV, Positive Predictive Value; NPV, Negative Predictive Value; ACC, accuracy; F1, the harmonic mean of precision and recall; MCC, Matthews Correlation Coefficient
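
Several of the threshold-dependent metrics in Fig. 3 derive from a single confusion matrix. A minimal sketch, assuming a 0.5 probability cutoff (the abstract does not state which threshold the authors used) and hypothetical example data; ratios with a zero denominator are returned as NaN.

```python
import math
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Confusion-matrix metrics; a case is predicted positive if its probability >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: float, den: float) -> float:
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }


# Made-up labels (1 = died within 3 years) and predicted probabilities.
y = [1, 1, 1, 0, 0, 0, 0, 1]
p = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1, 0.4, 0.7]
print(classification_metrics(y, p))
```

Unlike AUC, every metric here changes with the chosen cutoff, which is one reason the threshold deserves to be reported alongside them.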

**Fig. 4**
Calibration and DCA curves for each model in the training and test sets. (A) Calibration curves in the training set. GBDT and RF were best calibrated, with curves closest to the ideal diagonal. CatBoost and NN showed moderate calibration, while SVM and Lasso-LR produced relatively lower predicted probabilities than the other models. (B) Calibration curves in the test set. NN and CatBoost were best calibrated, followed by RF and Lasso-LR, while GBDT and SVM produced relatively lower predicted probabilities. (C) Decision curve analysis in the training set. GBDT, RF, and CatBoost yielded the highest net benefit across most threshold probabilities, indicating superior clinical utility. (D) Decision curve analysis in the test set. All models yielded greater net benefit than the treat-all and treat-none strategies across a wide range of threshold probabilities. Abbreviations: CatBoost, Categorical Boosting; NN, Neural Networks; Lasso-LR, least absolute shrinkage and selection operator-penalized logistic regression; RF, Random Forest; SVM, Support Vector Machines; GBDT, gradient boosting decision tree; BS, Brier Score; DCA, Decision Curve Analysis
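
Calibration curves, the Brier score, and decision curve analysis (Fig. 4) are all computed from predicted probabilities. The sketch below uses the standard decision-curve net benefit, NB(p_t) = TP/n - (FP/n) * p_t / (1 - p_t), where patients with predicted risk at or above the threshold p_t are classed as "treat"; the treat-none strategy has a net benefit of 0 at every threshold. Equal-width binning for the calibration curve and the example data are assumptions for illustration, not details taken from the paper.

```python
from typing import List, Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_curve(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> List[Tuple[float, float]]:
    """(mean predicted risk, observed event rate) for each non-empty equal-width bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    return [
        (sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b))
        for b in bins
        if b
    ]


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """Net benefit of treating every patient whose predicted risk is >= pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Made-up outcomes and predicted risks.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.4]
print(brier_score(y, p))
print(calibration_curve(y, p, n_bins=2))
for pt in (0.1, 0.3, 0.5):
    print(pt, net_benefit(y, p, pt), net_benefit_treat_all(y, pt))
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all curve and zero (treat-none), which is the comparison described for panel D.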

**Fig. 5**
SHAP explanations for the CatBoost model. (A) Summary plot of SHAP values for CatBoost. Each point represents the SHAP value of one feature for an individual patient. Features are ranked by importance, based on mean absolute SHAP values. Orange points indicate higher feature values and blue points lower values. A positive SHAP value indicates that the feature raises the predicted risk, whereas a negative value indicates that it lowers the predicted risk. (B) Ranking of feature importance based on mean absolute SHAP values. The bar plot displays the mean absolute SHAP value for each feature, reflecting its average contribution to the model's predictions across all samples; features with higher values have a greater impact on the output of the CatBoost model. NYHA classification, ALC, and hs-CRP are the three most influential features. Abbreviations: SHAP, Shapley Additive exPlanations; CatBoost, Categorical Boosting; NYHA, New York Heart Association; ALC, absolute lymphocyte count; hs-CRP, high-sensitivity C-reactive protein; BNP, B-type natriuretic peptide; LVEDD, left ventricular end-diastolic dimension; BMI, body mass index; BUN, blood urea nitrogen; RAD, right atrial dimension; ALB, albumin
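
The quantities in Fig. 5 are Shapley values: each feature's weighted average marginal contribution to a prediction over all coalitions of the other features. Panel B then ranks features by the mean absolute value of those contributions across patients. A minimal sketch of both steps for a tiny hypothetical model, assuming a single baseline vector stands in for "absent" features; CatBoost models are normally explained with TreeSHAP rather than exact enumeration, whose cost grows exponentially with the number of features. The model, patients, and feature names below are invented.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def shapley_values(f: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x, replacing absent features with `baseline`."""
    M = len(x)

    def v(S) -> float:
        return f([x[k] if k in S else baseline[k] for k in range(M)])

    phi = []
    for i in range(M):
        others = [k for k in range(M) if k != i]
        total = 0.0
        for r in range(M):
            weight = math.factorial(r) * math.factorial(M - r - 1) / math.factorial(M)
            for combo in itertools.combinations(others, r):
                S = set(combo)
                total += weight * (v(S | {i}) - v(S))
        phi.append(total)
    return phi


def mean_abs_ranking(
    shap_rows: Sequence[Sequence[float]], names: Sequence[str]
) -> List[Tuple[str, float]]:
    """Rank features by mean |SHAP| across patients, largest first."""
    n = len(shap_rows)
    scores = {name: sum(abs(row[j]) for row in shap_rows) / n for j, name in enumerate(names)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical model and patients, for illustration only.
def toy_model(z: Sequence[float]) -> float:
    return 2.0 * z[0] + z[1] * z[2]


baseline = [0.0, 0.0, 0.0]
patients = [[1.0, 2.0, 3.0], [-1.0, 0.5, 1.0]]
rows = [shapley_values(toy_model, pt, baseline) for pt in patients]
print(rows[0])  # approximately [2.0, 3.0, 3.0]; sums to toy_model(x) - toy_model(baseline) = 8.0
print(mean_abs_ranking(rows, ["f0", "f1", "f2"]))
```

The "efficiency" property visible in the first row (Shapley values sum to the prediction minus the baseline prediction) is what lets each SHAP summary point be read as one feature's share of an individual patient's predicted risk.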

**Fig. 6**
SHAP dependence plot for each feature. Each panel (A-N) shows how a feature's value relates to its SHAP value, with higher SHAP values indicating a greater contribution to predicted risk. The features are: (A) NYHA classification, (B) ALC, (C) hs-CRP, (D) BNP, (E) age, (F) LVEDD, (G) BMI, (H) anticoagulation duration, (I) BUN, (J) anemia, (K) length of stay, (L) RAD, (M) ALB, and (N) digoxin. Abbreviations: SHAP, Shapley Additive exPlanations; NYHA, New York Heart Association; ALC, absolute lymphocyte count; hs-CRP, high-sensitivity C-reactive protein; BNP, B-type natriuretic peptide; LVEDD, left ventricular end-diastolic dimension; BMI, body mass index; BUN, blood urea nitrogen; RAD, right atrial dimension; ALB, albumin
