PLoS One. 2017 Jul 24;12(7):e0179805.
doi: 10.1371/journal.pone.0179805. eCollection 2017.

Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project

Manal Alghamdi et al. PLoS One. 2017.

Abstract

Machine learning is becoming a popular and important approach in medical research. In this study, we investigate the relative performance of several machine learning methods (Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree, and Random Forest) for predicting incident diabetes from medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data from 32,555 patients free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. By the end of the fifth year, 5,099 of those patients had developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an ensemble-based predictive model using 13 attributes selected on the basis of clinical importance, Multiple Linear Regression, and Information Gain Ranking. The negative effect of class imbalance on the constructed model was handled with the Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the classifier was improved by an ensemble approach using the Vote method with three decision-tree classifiers (Naïve Bayes Tree, Random Forest, and Logistic Model Tree), which achieved high prediction accuracy (AUC = 0.92). The study shows the potential of ensembling and SMOTE for predicting incident diabetes using cardiorespiratory fitness data.
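The two techniques named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function names `smote` and `soft_vote`, the toy data points, the neighbour count `k`, and the example probabilities are all hypothetical. The abstract also does not say which combination rule the Vote method used, so averaging the classifiers' probabilities is an assumption chosen for illustration. Only the class counts (32,555 patients, 5,099 with incident diabetes) come from the abstract.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority-class
    sample and one of its k nearest minority-class neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def soft_vote(probabilities):
    """Average the positive-class probabilities from several classifiers.
    Assumption: probability averaging; the abstract does not name the rule."""
    return sum(probabilities) / len(probabilities)


# Class counts reported in the abstract.
n_total, n_diabetes = 32555, 5099
n_no_diabetes = n_total - n_diabetes    # majority class size
minority_share = n_diabetes / n_total   # roughly 16% of patients

# Hypothetical minority-class points with two features each.
toy_minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(toy_minority, n_new=6, k=2)

# Hypothetical probabilities from three base classifiers for one patient.
combined = soft_vote([0.9, 0.6, 0.75])
```

Because every synthetic point lies on a segment between two real minority samples, oversampling this way adds variety within the minority region instead of duplicating records, which is the usual motivation for preferring it to plain replication.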

Conflict of interest statement

Competing Interests: The authors declare no conflict of interest.

Figures

Fig 1. ROC performance of classification models on the imbalanced dataset using G1.
Fig 2. ROC performance of classification models on the imbalanced dataset using G2.
Fig 3. Performance of classification models on the balanced dataset using Random Under-Sampling.
Fig 4. Performance of classification models on the balanced dataset using SMOTE.
