Multicenter Study
J Am Soc Nephrol. 2022 Feb;33(2):375-386.
doi: 10.1681/ASN.2021040538. Epub 2022 Jan 11.

Using Machine Learning to Identify Metabolomic Signatures of Pediatric Chronic Kidney Disease Etiology

Arthur M Lee et al. J Am Soc Nephrol. 2022 Feb.

Abstract

Background: Untargeted plasma metabolomic profiling combined with machine learning (ML) may lead to discovery of metabolic profiles that inform our understanding of pediatric CKD causes. We sought to identify metabolomic signatures in pediatric CKD based on diagnosis: FSGS, obstructive uropathy (OU), aplasia/dysplasia/hypoplasia (A/D/H), and reflux nephropathy (RN).

Methods: Untargeted metabolomic quantification (GC-MS/LC-MS, Metabolon) was performed on plasma from 702 Chronic Kidney Disease in Children study participants (n: FSGS=63, OU=122, A/D/H=109, and RN=86). Lasso regression was used for feature selection, adjusting for clinical covariates. Four methods were then applied to stratify significance: logistic regression, support vector machine, random forest, and extreme gradient boosting. ML training was performed on 80% total cohort subsets and validated on 20% holdout subsets. Important features were selected based on being significant in at least two of the four modeling approaches. We additionally performed pathway enrichment analysis to identify metabolic subpathways associated with CKD cause.
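
The selection rule described above (keep a metabolite only if it is significant in at least two of the four modeling approaches) can be sketched as a simple vote count. This is a minimal illustration, not the authors' code: the function name `consensus_features`, the dictionary layout, and the metabolite names are hypothetical placeholders, and how each model defines "significant" is left outside the sketch.

```python
from collections import Counter


def consensus_features(significant_by_model, min_models=2):
    """Return features flagged as significant by at least `min_models` models."""
    counts = Counter()
    for features in significant_by_model.values():
        # set() guarantees each model contributes at most one vote per feature
        counts.update(set(features))
    return sorted(name for name, votes in counts.items() if votes >= min_models)


# Hypothetical placeholder names, not results from the study.
flagged = {
    "logistic_regression": {"metabolite_A", "metabolite_B"},
    "support_vector_machine": {"metabolite_A", "metabolite_C"},
    "random_forest": {"metabolite_B", "metabolite_D"},
    "extreme_gradient_boosting": {"metabolite_A"},
}

selected = consensus_features(flagged)
print(selected)
```

Requiring agreement across models with different inductive biases is one way to reduce the chance that a signal is an artifact of a single algorithm.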

Results: ML models were evaluated on holdout subsets with receiver-operator and precision-recall area-under-the-curve, F1 score, and Matthews correlation coefficient. ML models outperformed no-skill prediction. Metabolomic profiles were identified based on cause. FSGS was associated with the sphingomyelin-ceramide axis. FSGS was also associated with individual plasmalogen metabolites and the subpathway. OU was associated with gut microbiome-derived histidine metabolites.
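
For readers unfamiliar with the evaluation metrics named above, the following sketch computes the F1 score and Matthews correlation coefficient from confusion-matrix counts, along with the usual no-skill reference for precision-recall AUC (the positive-class prevalence). The confusion-matrix counts are made up for illustration. The prevalence example uses the cohort counts from the Methods (63 FSGS of 702 participants) and assumes a one-versus-rest framing, which the abstract does not state explicitly.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def no_skill_pr_auc(n_positive, n_total):
    """A classifier with no skill has expected precision equal to prevalence."""
    return n_positive / n_total


# Hypothetical holdout confusion matrix, not taken from the study.
tp, fp, fn, tn = 8, 4, 5, 123
f1 = f1_score(tp, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)

# Assumed one-vs-rest prevalence of FSGS in the full cohort (63 of 702).
fsgs_baseline = no_skill_pr_auc(63, 702)

print(f"F1={f1:.3f} MCC={mcc:.3f} no-skill PR AUC={fsgs_baseline:.3f}")
```

The no-skill reference for receiver-operator AUC is 0.5 regardless of prevalence, which is why precision-recall AUC and MCC are often reported alongside it for imbalanced classes.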

Conclusion: ML models identified metabolomic signatures based on CKD cause. Using ML techniques in conjunction with traditional biostatistics, we demonstrated that sphingomyelin-ceramide and plasmalogen dysmetabolism are associated with FSGS and that gut microbiome-derived histidine metabolites are associated with OU.

Keywords: chronic kidney disease; machine learning; machine learning collection; metabolomics; pediatric nephrology.

Figures

Figure 1.
The analytic flow plan for identifying metabolites associated with CKD etiology. The flow of the analytic approach used to identify individual metabolites associated with each CKD cause. Lasso generated panels of metabolites for subsequent analysis while accounting for metabolite and clinical covariate multicollinearity. Second-pass ML classifiers additionally accounted for dimensionality and multicollinearity among the Lasso-selected metabolites.
Figure 2.
Volcano plots demonstrate ML stratification of important metabolite signals. These plots visualize the stratification of Lasso-selected metabolites for FSGS and OU. Implication by Benjamini–Hochberg is the least restrictive. Implication by ML modeling is the most restrictive. The signals detected by ML would meet both Benjamini–Hochberg and Bonferroni thresholds.
Figure 3.
Lasso feature selection improved ML model performance. In addition to the SVM with Lasso feature selection, we performed SVM for FSGS with no feature selection (842 metabolites) and with forward feature selection (122 metabolites on the basis of LR P<0.05). All three iterations demonstrated better performance than no-skill selection in 20% holdout validation subsets. The SVM with Lasso feature selection outperformed both the no-selection and forward-selection models. There were no significant differences in the metabolite subpathway signals detected. 95% CI, 95% confidence interval.
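
The Figure 2 caption contrasts the Benjamini–Hochberg and Bonferroni thresholds. The sketch below shows why the former is the less restrictive of the two. It is a generic textbook illustration with made-up p-values, not the study's analysis code, and the function names and the 0.05 level are assumptions.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject when p <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Step-up procedure: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= k * alpha / m (controls false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= rank * alpha / m:
            largest_rank = rank
    reject = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            reject[index] = True
    return reject


# Made-up p-values for illustration only.
pvals = [0.001, 0.012, 0.02, 0.045, 0.3]
bonf = bonferroni_reject(pvals)
bh = benjamini_hochberg_reject(pvals)
print("Bonferroni:", bonf)
print("Benjamini-Hochberg:", bh)
```

Every hypothesis rejected by Bonferroni is also rejected by Benjamini–Hochberg at the same level, so a signal that clears the Bonferroni threshold clears both.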
