Front Med (Lausanne). 2023 Jan 9;9:930541.
doi: 10.3389/fmed.2022.930541. eCollection 2022.

Machine learning-based warning model for chronic kidney disease in individuals over 40 years old in underprivileged areas, Shanxi Province

Wenzhu Song et al.

Abstract

Introduction: Chronic kidney disease (CKD) is a progressive disease with a high incidence, but its early symptoms are barely perceptible. Because medical check-ups in China's rural areas are inadequate and screening programmes target single diseases, CKD there can easily progress to end-stage renal failure. This study aimed to construct an early warning model for CKD tailored to impoverished areas by applying machine learning (ML) algorithms to easily accessible parameters from ten rural areas in Shanxi Province, thereby enabling earlier treatment and improving patients' quality of life.

Methods: From April to November 2019, opportunistic screening for CKD was carried out in 10 rural areas in Shanxi Province. First, general information, physical examination data, and blood and urine specimens were collected from 13,550 subjects. Next, explanatory variables were selected using LASSO regression, and the two target datasets, i.e., albuminuria-to-creatinine ratio (ACR) outcomes and α1-microglobulin-to-creatinine ratio (MCR) outcomes, were balanced using the SMOTE (synthetic minority over-sampling technique) algorithm. Finally, Bagging, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) were used to classify ACR outcomes and MCR outcomes separately.
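The balancing step can be illustrated with a minimal SMOTE sketch. It is not the authors' implementation: it assumes purely numeric features and Euclidean distance (the figure caption reports dist = “Overlap”), it takes "balancing" to mean oversampling the minority class up to the majority size, and the function name `smote` and the toy rows are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base row (excluding the row itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Toy data (made-up values): 8 majority rows and 3 minority rows, two features each.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 1.0], [11.0, 2.0], [12.0, 1.5]]

# Assumption: oversample the minority class until both classes have the same size.
new_rows = smote(minority, n_new=len(majority) - len(minority), k=5)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

Each synthetic row lies on a line segment between two real minority rows, so the oversampled class stays inside the region already occupied by observed cases.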

Results: A total of 12,330 rural residents were included in this study, with 20 explanatory variables. Increased ACR and increased MCR were found in 1,587 (12.8%) and 1,456 (11.8%) residents, respectively. After LASSO, 14 and 15 explanatory variables remained in the two datasets, respectively. Bagging, RF, and XGBoost performed well in classification, with AUCs of 0.74, 0.87, 0.87, and 0.89 for ACR outcomes and 0.75, 0.88, 0.89, and 0.90 for MCR outcomes. The five variables contributing most to classification were SBP, TG, TC, Hcy, and DBP for ACR outcomes, and age, TG, SBP, Hcy, and FPG for MCR outcomes. Overall, the machine learning algorithms could serve as a warning model for CKD.
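For readers less familiar with the AUC, the sketch below computes it as the share of positive-negative pairs in which the positive case receives the higher score (ties count half). The labels and scores are made-up illustration values, not data from the study, and the function name `auc` is hypothetical.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = increased ratio, 0 = normal; scores are model outputs.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
print(auc(labels, scores))  # 8 of the 9 positive-negative pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values should be read.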

Conclusion: ML algorithms combined with indexes that are accessible in rural areas perform well in classification, which makes an early warning model for CKD feasible. This model could help achieve large-scale population screening for CKD in poverty-stricken areas and should be promoted to improve quality of life and reduce mortality.

Keywords: albuminuria-to-creatinine ratio; auxiliary diagnosis; chronic kidney disease; machine learning; warning model; α1-microglobulin-to-creatinine ratio.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Workflow of the model construction.
FIGURE 2
Response variables for ACR and MCR outcomes before and after SMOTE (synthetic minority over-sampling technique), a method for handling imbalanced data. SMOTE was run with the parameters k = 5, C.perc = “balance”, dist = “Overlap”. (A) ACR outcomes (before SMOTE); (B) MCR outcomes (before SMOTE); (C) ACR outcomes (after SMOTE); and (D) MCR outcomes (after SMOTE).
FIGURE 3
Results of feature selection using LASSO. When lambda is at its minimum, the corresponding features were used for model construction, that is, 14 features for ACR outcomes (A) and 15 features for MCR outcomes (B).
FIGURE 4
Contributions of explanatory variables to the XGBoost algorithm. “Gain” is the relative contribution of a feature to the model, calculated from the contribution of that feature to each tree in the model. A higher Gain than that of other features means the variable is more important for generating predictions. (A) ACR outcomes; (B) MCR outcomes.
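The Gain metric described for Figure 4 can be sketched as follows, assuming it is each feature's share of the total split gain summed over all trees. The split list and its gain values are invented for illustration, and `relative_gain` is a hypothetical helper, not a function of the XGBoost library.

```python
from collections import defaultdict


def relative_gain(splits):
    """Sum split gains per feature and normalise so the shares add up to 1."""
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {feature: gain / grand_total for feature, gain in ranked}


# Invented (feature, gain) pairs, one per split across all trees of a model.
splits = [("SBP", 40.0), ("TG", 25.0), ("SBP", 20.0), ("age", 10.0), ("TG", 5.0)]
importance = relative_gain(splits)
for feature, share in importance.items():
    print(f"{feature}: {share:.2f}")
```

Because the shares are normalised, a feature's Gain is only meaningful relative to the other features in the same model.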
