A machine learning-based diagnosis modelling of type 2 diabetes mellitus with environmental metal exposure
- PMID: 37037162
- DOI: 10.1016/j.cmpb.2023.107537
Abstract
Background and objective: A growing body of compelling evidence indicates that urinary and dietary metal exposures are underappreciated but potentially modifiable biomarkers for type 2 diabetes mellitus (T2DM). The aims of this study were (1) to identify the key potential biomarkers that contribute to T2DM using an effective and parsimonious feature set and (2) to assess the utility of baseline variables and metal exposure in the diagnosis of T2DM.
Methods: From the National Health and Nutrition Examination Survey (NHANES), we selected 9822 screening records with 82 significant variables covering demographics, lifestyle, anthropometric measures, diet and metal exposure. We proposed a soft voting ensemble model that combines extreme gradient boosting (XGBoost), random forest and light gradient boosting machine (LightGBM) to measure the importance of the 82 features. Using this ensemble together with the variance inflation factor (VIF), strongly multicollinear features with low importance scores were removed from the candidate biomarkers. A soft voting ensemble classifier was then used to demonstrate the efficiency of the proposed feature selection method.
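The two ideas named in the Methods, soft voting and VIF-based screening, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `soft_vote` and `pairwise_vif`, the equal weighting of the three models, the 0.5 threshold and all numbers are assumptions made for illustration. Soft voting is shown here as averaging class probabilities; the VIF is shown only for the two-feature case, where it reduces to 1 / (1 - r^2) with r the Pearson correlation.

```python
import math


def soft_vote(model_probs, threshold=0.5):
    """Average positive-class probabilities across models, then threshold.

    model_probs: one list of P(class = 1) per model, all over the same samples.
    Returns (averaged probabilities, predicted labels).
    Equal model weights are an assumption of this sketch.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    if any(len(p) != n_samples for p in model_probs):
        raise ValueError("every model must score the same samples")
    avg = [sum(p[i] for p in model_probs) / n_models for i in range(n_samples)]
    labels = [1 if a >= threshold else 0 for a in avg]
    return avg, labels


def pairwise_vif(x, y):
    """VIF of either feature in a two-feature design: 1 / (1 - r^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant feature: correlation is undefined")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r2 = sxy * sxy / (sxx * syy)
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)


# Hypothetical per-model probabilities for three samples (not study data).
xgb_probs = [0.9, 0.2, 0.6]
rf_probs = [0.8, 0.4, 0.3]
lgbm_probs = [0.7, 0.3, 0.5]
avg_probs, labels = soft_vote([xgb_probs, rf_probs, lgbm_probs])
# labels -> [1, 0, 0]

# Two nearly collinear hypothetical features give a large VIF.
vif = pairwise_vif([1, 2, 3, 4, 5], [1.1, 2.0, 2.9, 4.2, 5.0])
```

In a screening step of this kind, a feature with a high VIF and a low importance score would be the candidate for removal; the cut-off values used in the study are not stated in the abstract.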
Results: With the novel feature selection method, 12 baseline variables and 3 metal variables were selected to detect patients at risk for T2DM. Among the metal variables, dietary copper (Cu), urinary cadmium (Cd) and urinary mercury (Hg) were selected as the most notable metal exposures, with corresponding P-values all below 0.05. In a T2DM classification model built on the 12 baseline biomarkers, adding the 3 metal exposures improved the area under the receiver operating characteristic (ROC) curve (AUC) from 0.792 to 0.847.
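For readers unfamiliar with the metric reported above, the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 0/1 ground truth; scores: model outputs, higher = more positive.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3])
# toy_auc -> 0.75
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; rank-based formulations give the same value more efficiently on large datasets.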
Conclusions: This was the first demonstration of T2DM classification with machine learning under urinary and dietary metal exposure. The improved predictive performance illustrated the effectiveness of the proposed machine learning-based diagnosis model, which may facilitate lifestyle/dietary intervention for T2DM prevention.
Keywords: Environmental metal exposure; Machine learning; Statistical analysis; Type 2 diabetes mellitus.
Copyright © 2023 Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.