Front Med (Lausanne). 2022 Jul 28;9:911737.

doi: 10.3389/fmed.2022.911737. eCollection 2022.

Using random forest algorithm for glomerular and tubular injury diagnosis

Wenzhu Song¹, Xiaoshuang Zhou², Qi Duan³, Qian Wang³, Yaheng Li³, Aizhong Li³, Wenjing Zhou⁴, Lin Sun⁵, Lixia Qiu¹, Rongshan Li²,³, Yafeng Li²,³,⁶,⁷

Affiliations

¹ School of Public Health, Shanxi Medical University, Taiyuan, China.
² Department of Nephrology, Shanxi Provincial People's Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China.
³ Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, China.
⁴ School of Medical Sciences, Shanxi University of Chinese Medicine, Jinzhong, China.
⁵ College of Traditional Chinese Medicine and Food Engineering, Shanxi University of Chinese Medicine, Jinzhong, China.
⁶ Core Laboratory, Shanxi Provincial People's Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China.
⁷ Academy of Microbial Ecology, Shanxi Medical University, Taiyuan, China.

PMID: 35966858
PMCID: PMC9366016
DOI: 10.3389/fmed.2022.911737

Wenzhu Song et al. Front Med (Lausanne). 2022.


Abstract

Objectives: Chronic kidney disease (CKD) is a common chronic condition with high incidence and insidious onset. Glomerular injury (GI) and tubular injury (TI) represent early manifestations of CKD and could indicate the risk of its development. In this study, we aimed to classify GI and TI using three machine learning algorithms to promote their early diagnosis and slow the progression of CKD.

Methods: Demographic information, physical examination data, and blood and morning urine samples were collected from 13,550 subjects in 10 counties in Shanxi province for classification of GI and TI. LASSO regression was used for feature selection among the explanatory variables, and the SMOTE (synthetic minority over-sampling technique) algorithm was used to balance the two target datasets, GI and TI. Random Forest (RF), Naive Bayes (NB), and logistic regression (LR) models were then constructed to classify GI and TI, respectively.
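The oversampling step can be illustrated with a minimal sketch of the SMOTE idea. This is not the authors' implementation: it uses only the Python standard library, Euclidean distance on numeric features, and a hypothetical helper `smote` with toy data. The `C.perc = "balance"` and `dist = "Overlap"` settings reported in the Figure 1 caption are not reproduced here.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (Euclidean)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 4 minority samples with two numeric features.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=3)
```

Each synthetic point lies on the line segment between two existing minority samples, so the minority class grows without simply duplicating rows.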

Results: A total of 12,330 participants were enrolled in this study, with 20 explanatory variables. The numbers of patients with GI and TI were 1,587 (12.8%) and 1,456 (11.8%), respectively. After feature selection by LASSO, 14 and 15 explanatory variables remained in the GI and TI datasets, respectively. After SMOTE, the numbers of patients and normal subjects were 6,165 and 6,165 for GI, and 6,165 and 6,164 for TI. RF outperformed NB and LR in terms of accuracy (78.14% for GI, 80.49% for TI), sensitivity (82.00%, 84.60%), specificity (74.29%, 76.09%), and AUC (0.868, 0.885); the four variables contributing most to classification were SBP, DBP, sex, and age for GI, and age, SBP, FPG, and GHb for TI.
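The accuracy, sensitivity, and specificity reported above all derive from a confusion matrix. The sketch below shows the standard definitions on hypothetical toy labels (1 = injury, 0 = normal); the function name and the data are illustrative assumptions, not taken from the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), and specificity
    (true negative rate) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

On these toy labels the function returns an accuracy of 0.70, a sensitivity of 0.75, and a specificity of about 0.667. When the two classes are exactly balanced, accuracy equals the mean of sensitivity and specificity, which is consistent with the GI figures above: (82.00 + 74.29) / 2 ≈ 78.14.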

Conclusion: RF performs well in classifying GI and TI, which allows for early auxiliary diagnosis of both conditions, may help slow the progression of CKD, and shows promise for clinical practice.

Keywords: auxiliary diagnosis; glomerular injury; machine learning; random forest; tubular injury.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
Distribution of the response variables for GI and TI before and after SMOTE (Synthetic Minority Over-Sampling Technique), a widely used method for handling imbalanced data. SMOTE was run with the parameters k = 5, C.perc = "balance", dist = "Overlap". **(A)** GI before SMOTE; **(B)** TI before SMOTE; **(C)** GI after SMOTE; **(D)** TI after SMOTE.

**FIGURE 2**
Workflow of the model construction.

**FIGURE 3**
Results of feature selection using LASSO. The features retained when lambda was at its minimum were used for model construction (14 features for GI and 15 features for TI). **(A)** Feature selection for GI; **(B)** feature selection for TI.
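How an L1 penalty removes features can be sketched with a small coordinate-descent LASSO for a linear model. Everything here is an assumption for illustration: the helper names, the toy data, and the two penalty values. The study's own fitting procedure and lambda selection are not described on this page.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return exactly 0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 without an
    intercept, assuming the columns of X and y are already centred."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta


# Hypothetical centred toy data: y depends strongly on feature 0, weakly on feature 1.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [3.0 * a + 0.1 * b for a, b in X]

beta_strong = lasso_coordinate_descent(X, y, lam=0.5)   # heavier penalty
beta_weak = lasso_coordinate_descent(X, y, lam=0.01)    # lighter penalty
```

With the heavier penalty the weak feature's coefficient is set exactly to zero, while the lighter penalty keeps both features. Features whose coefficients remain non-zero are the ones that would be passed on to the classifiers.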

**FIGURE 4**
Comparison of the areas under the ROC curves of the three classifiers. For model construction, 70% of the samples were randomly assigned to the training set and the remaining 30% to the testing set. AUC (area under the curve) was used to evaluate the performance of the three classifiers. **(A)** AUC of GI in the training set; **(B)** AUC of GI in the testing set; **(C)** AUC of TI in the training set; **(D)** AUC of TI in the testing set.
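The 70/30 split and the AUC calculation can be sketched as follows. This is a standard-library illustration with hypothetical helper names and toy data, not the authors' code; the AUC is computed through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auc(y_true, scores):
    """Fraction of positive/negative pairs in which the positive case has the
    higher score; tied pairs count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 10 sample identifiers and 4 scored cases.
rows = list(range(10))
train, test = train_test_split(rows)

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
example_auc = auc(y_true, scores)
```

In the toy example three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75. Evaluating the AUC separately on the training and testing portions, as in panels (A) to (D), shows how much performance drops on unseen data.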

**FIGURE 5**
Contributions of explanatory variables to the random forest model. "%IncMSE" is the increase in mean squared error of the model's predictions when the values of a predictor variable are randomly permuted. A larger value therefore indicates that the variable is more important.
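The permutation idea behind "%IncMSE" can be sketched in a few lines. This is a simplified illustration: random forest implementations typically compute the measure on out-of-bag samples and may normalise it, whereas the sketch uses a fixed hypothetical model (`predict`), toy data, and the raw increase in MSE.

```python
import random


def mse(predict, X, y):
    """Mean squared error of predict over the rows of X."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j replaced by its shuffled copy.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(predict, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical model that depends only on feature 0.
def predict(row):
    return 2.0 * row[0]


X = [[float(i), float((i * 7) % 5)] for i in range(10)]
y = [2.0 * row[0] for row in X]
importances = permutation_importance(predict, X, y)
```

Shuffling feature 0 breaks its link to the outcome and raises the error, whereas shuffling feature 1, which the model ignores, leaves the error unchanged at zero increase.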

Cited by

Using Bayesian networks with Tabu-search algorithm to explore risk factors for hyperhomocysteinemia.
Song W, Qin Z, Hu X, Han H, Li A, Zhou X, Li Y, Li R. Sci Rep. 2023 Jan 28;13(1):1610. doi: 10.1038/s41598-023-28123-z. PMID: 36709366. Free PMC article.
A Random Forest Algorithm for Assessing Risk Factors Associated With Chronic Kidney Disease: Observational Study.
Liu P, Liu Y, Liu H, Xiong L, Mei C, Yuan L. Asian Pac Isl Nurs J. 2024 Jun 3;8:e48378. doi: 10.2196/48378. PMID: 38830204. Free PMC article.
Machine learning-based warning model for chronic kidney disease in individuals over 40 years old in underprivileged areas, Shanxi Province.
Song W, Liu Y, Qiu L, Qing J, Li A, Zhao Y, Li Y, Li R, Zhou X. Front Med (Lausanne). 2023 Jan 9;9:930541. doi: 10.3389/fmed.2022.930541. eCollection 2022. PMID: 36698845. Free PMC article.
Gut microbiota landscape and potential biomarker identification in female patients with systemic lupus erythematosus using machine learning.
Song W, Wu F, Yan Y, Li Y, Wang Q, Hu X, Li Y. Front Cell Infect Microbiol. 2023 Dec 19;13:1289124. doi: 10.3389/fcimb.2023.1289124. eCollection 2023. PMID: 38169617. Free PMC article.
Using elastography-based multilayer perceptron model to evaluate renal fibrosis in chronic kidney disease.
Chen Z, Ying TC, Chen J, Wu C, Li L, Chen H, Xiao T, Huang Y, Chen X, Jiang J, Wang Y, Lu W, Su Z. Ren Fail. 2023 Dec;45(1):2202755. doi: 10.1080/0886022X.2023.2202755. PMID: 37073623. Free PMC article. Clinical Trial.
