2022 Jul 28:9:911737.
doi: 10.3389/fmed.2022.911737. eCollection 2022.

Using random forest algorithm for glomerular and tubular injury diagnosis

Wenzhu Song et al. Front Med (Lausanne).

Abstract

Objectives: Chronic kidney disease (CKD) is a common chronic condition with high incidence and insidious onset. Glomerular injury (GI) and tubular injury (TI) represent early manifestations of CKD and could indicate the risk of its development. In this study, we aimed to classify GI and TI using three machine learning algorithms to promote their early diagnosis and slow the progression of CKD.

Methods: Demographic information, physical examination data, and blood and morning urine samples were collected from 13,550 subjects in 10 counties in Shanxi province for classification of GI and TI. LASSO regression was used to select explanatory variables, and the SMOTE (synthetic minority over-sampling technique) algorithm was used to balance the two target datasets, GI and TI. Random Forest (RF), Naive Bayes (NB), and logistic regression (LR) models were then built to classify GI and TI.
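The SMOTE step can be illustrated with a minimal sketch: each synthetic minority row is placed at a random point on the segment between a minority row and one of its k nearest minority neighbours. The function name `smote`, the toy data, and the use of Euclidean distance are assumptions made for illustration. In particular, the distance differs from the dist = "Overlap" setting reported in the Figure 1 caption; only k = 5 is taken from there.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (Euclidean distance,
    which is an assumption of this sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class (values are made up).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5], [1.8, 2.9], [2.4, 2.0]]
new_rows = smote(minority, n_new=10, k=5, seed=42)
print(len(new_rows))
```

Because every synthetic row is a convex combination of two real minority rows, it always lies within the range of the observed minority values.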

Results: A total of 12,330 participants were enrolled in this study, with 20 explanatory variables. The numbers of patients with GI and TI were 1,587 (12.8%) and 1,456 (11.8%), respectively. After feature selection by LASSO, 14 and 15 explanatory variables remained in the GI and TI datasets, respectively. After SMOTE, the numbers of patients and normal subjects were 6,165 and 6,165 for GI, and 6,165 and 6,164 for TI. RF outperformed NB and LR for both GI and TI in terms of accuracy (78.14% and 80.49%), sensitivity (82.00% and 84.60%), specificity (74.29% and 76.09%), and AUC (0.868 and 0.885). The four variables contributing most to classification were SBP, DBP, sex, and age for GI, and age, SBP, FPG, and GHb for TI.
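For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, and specificity follow from a binary confusion matrix. The function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: diseased subjects classified correctly/incorrectly.
    tn/fp: healthy subjects classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # share of all subjects classified correctly
        "sensitivity": tp / (tp + fn),      # share of diseased subjects detected
        "specificity": tn / (tn + fp),      # share of healthy subjects cleared
    }


# Hypothetical counts for a balanced test set of 200 subjects.
metrics = classification_metrics(tp=82, fn=18, tn=74, fp=26)
print(metrics)
```

When the two classes are the same size, as in this toy example, accuracy equals the mean of sensitivity and specificity.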

Conclusion: RF performs well in classifying GI and TI, which allows for early auxiliary diagnosis of GI and TI, may help slow the progression of CKD, and shows promise for clinical practice.

Keywords: auxiliary diagnosis; glomerular injury; machine learning; random forest; tubular injury.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Response variables for GI and TI before and after SMOTE. SMOTE (Synthetic Minority Over-Sampling Technique) is a method for handling imbalanced data; it was run with the parameters k = 5, C.perc = "balance", and dist = "Overlap". (A) GI before SMOTE; (B) TI before SMOTE; (C) GI after SMOTE; (D) TI after SMOTE.
FIGURE 2
Workflow of the model construction.
FIGURE 3
Results of feature selection using LASSO. With lambda at its minimum, the corresponding features were entered into model construction (14 features for GI and 15 features for TI). (A) Feature selection for GI; (B) feature selection for TI.
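How LASSO drops features can be sketched with cyclic coordinate descent and soft-thresholding: any coefficient whose correlation with the residual falls within the penalty is set exactly to zero. This is a simplified illustration under stated assumptions, not the study's procedure: it uses squared-error loss without an intercept, a fixed penalty `lam`, and made-up centred data, and all function names are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][m] * beta[m] for m in range(p) if m != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta


# Made-up centred data with orthogonal columns: y depends strongly on
# feature 0, weakly on feature 1, and not at all on feature 2.
X = [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
y = [2.1, 1.9, -1.9, -2.1]
beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

In this toy case only feature 0 survives the penalty; a smaller `lam` would keep more features.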
FIGURE 4
Comparison of the areas under the ROC curves of the three classifiers. For model construction, 70% of the samples were randomly assigned to the training set and the remaining 30% to the testing set. AUC (area under the curve) was used to evaluate the performance of the three classifiers. (A) AUC of GI in the training set; (B) AUC of GI in the testing set; (C) AUC of TI in the training set; (D) AUC of TI in the testing set.
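The split and the evaluation metric described in this caption can be sketched as follows. The helper names, the seed, and the toy labels and scores are assumptions for illustration. AUC is computed here from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
import random


def train_test_split(rows, train_frac=0.7, seed=0):
    """Shuffle rows and split them into training and testing parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Area under the ROC curve as the share of positive/negative pairs
    in which the positive case scores higher (ties count as half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: ten sample indices split 70/30, and six scored test cases.
train, test = train_test_split(list(range(10)), train_frac=0.7, seed=1)
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(len(train), len(test), auc(labels, scores))
```

The pairwise loop is quadratic in the number of cases; it is written this way for clarity rather than speed.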
FIGURE 5
Contributions of explanatory variables to the random forest model. “%IncMSE” is the increase in mean squared error: the values of each predictor variable are randomly replaced (permuted), and the prediction error of the model increases more when the variable is more important. A larger value therefore indicates that the variable is more important.
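The idea behind this importance measure can be sketched with a generic permutation test: shuffle one predictor column at a time and record how much the mean squared error rises. This is a simplified illustration, not the exact computation behind “%IncMSE”, which random forest implementations typically derive from out-of-bag samples. The toy model, data, and function names are hypothetical.

```python
import random


def mse(model, X, y):
    """Mean squared error of model predictions on rows X against targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling each predictor column."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    increases = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += mse(model, X_perm, y) - baseline
        increases.append(total / n_repeats)
    return increases


def toy_model(row):
    """Hypothetical fitted model that uses only the first predictor."""
    return 3.0 * row[0]


X = [[float(i), float((i * 7) % 5)] for i in range(10)]
y = [3.0 * row[0] for row in X]
importance = permutation_importance(toy_model, X, y)
print(importance)
```

Shuffling the first column breaks the link between predictor and target, so the error rises; shuffling the second column changes nothing because the toy model ignores it.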
