Comparative Study
2019 Mar 1;26(3):242-253.
doi: 10.1093/jamia/ocy165.

Robust clinical marker identification for diabetic kidney disease with ensemble feature selection


Xing Song et al. J Am Med Inform Assoc.

Erratum in

Abstract

Objective: Diabetic kidney disease (DKD) is one of the most frequent complications of diabetes and is associated with substantial morbidity and mortality. To accelerate DKD risk factor discovery, we present an ensemble feature selection approach that identifies a robust set of discriminant factors from electronic medical records (EMRs).

Material and methods: We identified a retrospective cohort of 15 645 adult patients with type 2 diabetes, excluding those with pre-existing kidney disease, and utilized all available clinical data types in modeling. We compared 3 machine-learning-based embedded feature selection methods in conjunction with 6 feature ensemble techniques for selecting top-ranked features in terms of robustness to data perturbations and predictability for DKD onset.
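As an illustration of what "robustness to data perturbations" can mean in practice, the following is a minimal sketch that scores how consistently a selection method picks the same top-ranked features across perturbed resamples. The abstract does not name the stability metric used in the study; mean pairwise Jaccard similarity is an assumption made here for illustration, and the function names and feature labels are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(selected_sets):
    """Mean pairwise Jaccard similarity across feature sets from different resamples.

    Assumption: this is one common stability measure, not necessarily the
    one used in the paper.
    """
    pairs = list(combinations(selected_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical top-3 selections from three perturbed resamples of one cohort.
runs = [
    {"feature_a", "feature_b", "feature_c"},
    {"feature_a", "feature_b", "feature_d"},
    {"feature_a", "feature_b", "feature_c"},
]
stability = selection_stability(runs)
```

A value near 1 means the method returns nearly the same features regardless of the resample; a value near 0 means the selection is driven largely by sampling noise.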

Results: The gradient boosting machine (GBM) with the weighted mean rank feature ensemble technique achieved the best performance, with an AUC of 0.82 [95%-CI, 0.81-0.83] on internal validation and 0.71 [95%-CI, 0.68-0.73] on external temporal validation. The ensemble model identified a set of 440 features from 84 872 unique clinical features that are both predictive of DKD onset and robust against data perturbations, including 191 labs, 51 visit details (mainly vital signs), 39 medications, 34 orders, 30 diagnoses, and 95 other clinical features.
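The idea behind a weighted mean rank ensemble can be sketched as follows: each resample or model yields a best-first feature ranking, and the rankings are combined by averaging each feature's rank with per-ranking weights. The abstract does not state how the weights were chosen or how features missing from a ranking were handled; weighting by a validation score and assigning missing features the rank just after the last entry are both assumptions, and `weighted_mean_rank` is a hypothetical helper rather than the authors' implementation.

```python
def weighted_mean_rank(rankings, weights):
    """Aggregate best-first feature rankings into one ordering.

    rankings: list of lists, each ordered from most to least important.
    weights:  one non-negative weight per ranking (assumed here to be
              something like a validation AUC).
    Returns (ordered_features, mean_rank_by_feature); lower rank is better.
    """
    if len(rankings) != len(weights):
        raise ValueError("one weight is required per ranking")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    features = sorted({f for ranking in rankings for f in ranking})
    scores = {}
    for f in features:
        acc = 0.0
        for ranking, w in zip(rankings, weights):
            # Assumption: an absent feature is ranked just after the last entry.
            rank = ranking.index(f) + 1 if f in ranking else len(ranking) + 1
            acc += w * rank
        scores[f] = acc / total
    # Ties are broken alphabetically so the ordering is deterministic.
    ordered = sorted(features, key=lambda f: (scores[f], f))
    return ordered, scores


# Hypothetical rankings from three resamples, weighted by made-up scores.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
weights = [0.8, 0.7, 0.5]
order, scores = weighted_mean_rank(rankings, weights)
```

Keeping the first k entries of `order` gives the ensemble's top-k feature set; rankings with higher weights pull the aggregate ordering toward their own.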

Discussion: Many of the top-ranked features have not been included in state-of-the-art DKD prediction models, but their relationships with kidney function have been suggested in the existing literature.

Conclusion: Our ensemble feature selection framework provides an option for identifying a robust and parsimonious feature set from EMR data in an unbiased manner, which effectively aids knowledge discovery for DKD risk factors.


Figures

Figure 1. Study cohort inclusion and exclusion.
Figure 2. Flowchart for the experimental design.
Figure 3. AUC vs. number of selected features for different feature selection combos.
Figure 4. Stability vs. number of selected features for different feature selection combos.
Figure 5. Stability vs. AUC tradeoff for different feature selection combos.
Figure 6. Top demographic, lab and vital features and their partial dependence effects on predicting DKD risk.
