BMC Med Inform Decis Mak. 2023 May 22;23(1):98.

doi: 10.1186/s12911-023-02185-5.

Classification of imbalanced data using machine learning algorithms to predict the risk of renal graft failures in Ethiopia

Getahun Mulugeta¹, Temesgen Zewotir², Awoke Seyoum Tegegne³, Leja Hamza Juhar⁴, Mahteme Bekele Muleta⁴

Affiliations

¹ Department of Statistics, Bahir Dar University, Bahir Dar, Ethiopia. gech.marr@gmail.com.
² School of Mathematics, Statistics, and Computer Science, KwaZulu-Natal University, Durban, South Africa.
³ Department of Statistics, Bahir Dar University, Bahir Dar, Ethiopia.
⁴ St. Paul's Hospital Millennium Medical College, Addis Ababa, Ethiopia.

PMID: 37217892
PMCID: PMC10201495
DOI: 10.1186/s12911-023-02185-5

Getahun Mulugeta et al. BMC Med Inform Decis Mak. 2023.

Abstract

Introduction: The prevalence of end-stage renal disease has increased the need for renal replacement therapy over recent decades. Although a kidney transplant offers a better quality of life and a lower cost of care than dialysis, the graft can still fail after transplantation. This study therefore aimed to predict the risk of graft failure among post-transplant recipients in Ethiopia using selected machine learning prediction models.

Methodology: Data were extracted from a retrospective cohort of kidney transplant recipients at the Ethiopian National Kidney Transplantation Center, covering September 2015 to February 2022. To address the class imbalance in the data, we applied hyperparameter tuning, probability threshold moving, tree-based ensemble learning, stacking ensemble learning, and probability calibration to improve the prediction results. Probabilistic models (logistic regression, naive Bayes, and artificial neural network) and tree-based ensemble models (random forest, bagged tree, and stochastic gradient boosting), selected on merit, were applied. Models were compared in terms of discrimination and calibration performance. The best-performing model was then used to predict the risk of graft failure.
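The methodology lists probability threshold moving, but the abstract does not say which criterion was used to pick the cutoff. As a minimal sketch, assuming Youden's J statistic (sensitivity + specificity - 1) as the selection criterion and using made-up probabilities rather than study data, threshold moving could look like this:

```python
# Illustrative only: the labels and probabilities below are invented,
# and Youden's J is an assumed criterion, not the one confirmed by the paper.

def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when predicting 1 for probability >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_j(y_true, y_prob, threshold):
    """Youden's J = sensitivity + specificity - 1 at the given threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity + specificity - 1.0


def best_threshold(y_true, y_prob):
    """Try every distinct predicted probability as a cutoff; keep the best J."""
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: youden_j(y_true, y_prob, t))


# Toy imbalanced sample: 8 non-events, 2 events (hypothetical values).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.35, 0.30, 0.45]

chosen = best_threshold(y_true, y_prob)
print("J at 0.5:", youden_j(y_true, y_prob, 0.5))
print("chosen threshold:", chosen, "J:", youden_j(y_true, y_prob, chosen))
```

In this toy sample no predicted probability reaches 0.5, so the default cutoff flags no events at all, while the data-driven cutoff recovers both events at the cost of one false positive. In practice the cutoff would be chosen on validation data, not on the data used for final evaluation.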

Results: A total of 278 complete cases were analyzed, with 21 graft failures and 3 events per predictor. Of the recipients, 74.8% were male and 25.2% were female, and the median age was 37 years. Among the individual models, the bagged tree and random forest had the highest, and equal, discrimination performance (AUC-ROC = 0.84), while the random forest had the best calibration performance (Brier score = 0.045). When each individual model was tested as the meta-learner for stacking ensemble learning, stochastic gradient boosting achieved the best discrimination (AUC-ROC = 0.88) and calibration (Brier score = 0.048) performance. In terms of feature importance, chronic rejection, blood urea nitrogen, number of post-transplant admissions, phosphorus level, acute rejection, and urological complications were the top predictors of graft failure.
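The two metrics reported above can be computed from predicted probabilities alone. The sketch below is a plain-Python illustration, not the authors' code: the Brier score is the mean squared difference between predicted probability and outcome, and AUC-ROC is computed as the fraction of (event, non-event) pairs in which the event receives the higher probability. The toy labels and probabilities are invented; only the counts 21 and 278 come from the abstract.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc_roc(y_true, y_prob):
    """Probability that a random event is ranked above a random non-event.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for 8 non-events and 2 events.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.35, 0.30, 0.45]

print("Brier score:", brier_score(y_true, y_prob))
print("AUC-ROC:", auc_roc(y_true, y_prob))

# Reference point using the counts from the abstract (21 failures, 278 cases):
# a model that predicts the overall event rate p for everyone scores p * (1 - p).
prevalence = 21 / 278
no_skill_brier = prevalence * (1 - prevalence)
print("No-skill Brier at this prevalence:", round(no_skill_brier, 4))
```

With only about 7.6% events, a constant prediction already yields a Brier score near 0.070, so low Brier scores on imbalanced data should be read against that reference. The comparison with the reported 0.045 and 0.048 is only indicative, since the abstract does not state on which data split those scores were computed.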

Conclusions: Bagging, boosting, and stacking, combined with probability calibration, are good choices for clinical risk prediction on imbalanced data. A data-driven probability threshold is more beneficial than the default threshold of 0.5 for improving predictions from imbalanced data. Integrating these techniques in a systematic framework is an effective strategy for improving prediction results from such data. Clinical experts in kidney transplantation are encouraged to use the final calibrated model as a decision support system to predict the risk of graft failure for individual patients.
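Calibration is usually checked with a reliability plot, which compares the mean predicted probability with the observed event rate inside probability bins. The following is a rough sketch of how the points for such a plot could be computed, assuming equal-width bins and using invented data; the function name `reliability_bins` and the bin count are illustrative choices, not taken from the paper.

```python
def reliability_bins(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, observed event rate, count)
    for each non-empty equal-width probability bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # A probability of exactly 1.0 goes into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((y, p))

    rows = []
    for members in bins:
        if not members:
            continue  # skip empty bins instead of dividing by zero
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        rows.append((mean_pred, observed, len(members)))
    return rows


# Hypothetical outcomes and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.30, 0.35, 0.70, 0.90]

for mean_pred, observed, count in reliability_bins(y_true, y_prob):
    print(f"predicted={mean_pred:.3f} observed={observed:.3f} n={count}")
```

For a well-calibrated model the predicted and observed columns are close in every bin. With as few events as in this study, bins at higher probabilities contain very few patients, so their observed rates are noisy.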

Keywords: Graft failure; Imbalanced Data; Probabilistic models; Renal transplantation; Stacking ensemble; Tree-based ensembles.

Conflict of interest statement

The authors declared no conflict of interest between the authors and institutions.

Figures

**Fig. 1**
Top 20 Important Features from the Recursive Feature Elimination Method

**Fig. 2**
Experimental Procedures of the Study

**Fig. 3**
Reliability Plot for the Base Models

**Fig. 4**
Reliability Plot for the Tuned Models

**Fig. 5**
Bar Chart for the Discrimination and Calibration Performance of Individual Calibrated Models

**Fig. 6**
Reliability Plot for Individual Calibrated Models

**Fig. 7**
ROC Curve for the Final Calibrated Probabilities

**Fig. 8**
Reliability Plot for the Final Calibrated Probabilities

**Fig. 9**
Relative Importance of Features for Predicting Graft Failure
