Comparison Between Statistical Model and Machine Learning Methods for Predicting the Risk of Renal Function Decline Using Routine Clinical Data in Health Screening
- PMID: 35502445
- PMCID: PMC9056070
- DOI: 10.2147/RMHP.S346856
Abstract
Purpose: Using machine learning methods to make predictions on unseen data offers an opportunity to improve accuracy by exploring complex interactions between risk factors. We therefore evaluated the performance of machine learning (ML) algorithms and compared them with logistic regression for predicting the risk of renal function decline (RFD) using routine clinical data.
Patients and methods: This retrospective cohort study included data from 2166 subjects, aged 35-74 years, from an adult health screening follow-up program conducted between 2010 and 2020. Seven ML models were considered (random forest, gradient boosting, multilayer perceptron, support vector machine, K-nearest neighbors, adaptive boosting, and decision tree) and compared with standard logistic regression. There were 24 independent variables, and the baseline estimated glomerular filtration rate (eGFR) was used as the predictive variable.
Results: A total of 2166 participants (mean age 49.2±11.2 years, 63.3% male) were enrolled and randomly divided into a training set (n=1732) and a test set (n=434). The areas under the receiver operating characteristic curve (AUROC) for detecting RFD were above 0.85 for the different models during the training phase. The gradient boosting algorithm exhibited the best average prediction accuracy (AUROC: 0.914) among all algorithms validated in this study. Based on AUROC, the ML algorithms improved RFD prediction performance compared with the logistic regression model (AUROC: 0.882), except for the K-nearest neighbors and decision tree algorithms (AUROC: 0.854 and 0.824, respectively). However, the improvements over logistic regression were small (less than 4%) and nonsignificant.
Conclusion: Our results indicate that the proposed health screening dataset-based RFD prediction model using ML algorithms is readily applicable and produces validated results. However, logistic regression performs as well as the ML models in predicting the risk of RFD with simple clinical predictors.
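As an illustration of the evaluation described in the Results, the following minimal Python sketch reproduces the reported training/test split sizes and computes an AUROC from predicted risk scores. It is not the authors' code: the function names `split_train_test` and `auroc`, the 80/20 fraction, the seed, and the toy labels and scores are assumptions made for the example, and the abstract does not state how the split or the AUROC was actually implemented.

```python
import random


def split_train_test(n_subjects, train_fraction=0.8, seed=0):
    """Randomly split subject indices into training and test lists.

    The 80/20 fraction is an assumption chosen to match the reported
    sizes (1732 and 434 out of 2166); the abstract only says "randomly divided".
    """
    indices = list(range(n_subjects))
    random.Random(seed).shuffle(indices)
    n_train = int(n_subjects * train_fraction)
    return indices[:n_train], indices[n_train:]


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney) formulation.

    labels: 1 for an event (here, hypothetically, RFD) and 0 otherwise.
    scores: predicted risk, higher meaning more likely to be an event.
    Tied scores receive their average rank, so a tie counts as 0.5.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


train_idx, test_idx = split_train_test(2166)
print(len(train_idx), len(test_idx))  # 1732 434

# Toy labels and scores, not study data.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Under this formulation, an AUROC of 0.914 would mean that a randomly chosen subject with RFD receives a higher predicted risk than a randomly chosen subject without RFD about 91% of the time.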
Keywords: algorithm; chronic kidney disease; deep learning; health examination.
© 2022 Cao et al.
Conflict of interest statement
The authors report no conflicts of interest in this work.
Similar articles
- A systematic comparison of machine learning algorithms to develop and validate prediction model to predict heart failure risk in middle-aged and elderly patients with periodontitis (NHANES 2009 to 2014). Medicine (Baltimore). 2023 Aug 25;102(34):e34878. doi: 10.1097/MD.0000000000034878. PMID: 37653785. Free PMC article.
- Which supervised machine learning algorithm can best predict achievement of minimum clinically important difference in neck pain after surgery in patients with cervical myelopathy? A QOD study. Neurosurg Focus. 2023 Jun;54(6):E5. doi: 10.3171/2023.3.FOCUS2372. PMID: 37283449.
- Impact of Intraoperative Data on Risk Prediction for Mortality After Intra-Abdominal Surgery. Anesth Analg. 2022 Jan 1;134(1):102-113. doi: 10.1213/ANE.0000000000005694. PMID: 34908548. Free PMC article.
- Predicting hospitalization following psychiatric crisis care using machine learning. BMC Med Inform Decis Mak. 2020 Dec 10;20(1):332. doi: 10.1186/s12911-020-01361-1. PMID: 33302948. Free PMC article.
- Prediction of Acute Kidney Injury after Liver Transplantation: Machine Learning Approaches vs. Logistic Regression Model. J Clin Med. 2018 Nov 8;7(11):428. doi: 10.3390/jcm7110428. PMID: 30413107. Free PMC article.
Cited by
- Comparing AI/ML approaches and classical regression for predictive modeling using large population health databases: Applications to COVID-19 case prediction. Glob Epidemiol. 2024 Oct 4;8:100168. doi: 10.1016/j.gloepi.2024.100168. eCollection 2024 Dec. PMID: 39435397. Free PMC article.
- CAREUP: An Integrated Care Platform with Intrinsic Capacity Monitoring and Prediction Capabilities. Sensors (Basel). 2025 Feb 3;25(3):916. doi: 10.3390/s25030916. PMID: 39943555. Free PMC article.
- Automated prognosis of renal function decline in ADPKD patients using deep learning. Z Med Phys. 2024 May;34(2):330-342. doi: 10.1016/j.zemedi.2023.08.001. Epub 2023 Aug 21. PMID: 37612178. Free PMC article.
- Classification and Regression Trees analysis identifies patients at high risk for kidney function decline following hospitalization. PLoS One. 2025 Jan 31;20(1):e0317558. doi: 10.1371/journal.pone.0317558. eCollection 2025. PMID: 39888928. Free PMC article.