JMIRx Med. 2024 Sep 11:5:e56993.
doi: 10.2196/56993.

Machine Learning-Based Hyperglycemia Prediction: Enhancing Risk Assessment in a Cohort of Undiagnosed Individuals

Kolapo Oyebola et al. JMIRx Med. 2024.

Abstract

Background: Noncommunicable diseases continue to pose a substantial health challenge globally, with hyperglycemia serving as a prominent indicator of diabetes.

Objective: This study used machine learning algorithms to predict hyperglycemia in a cohort of asymptomatic individuals and to identify the key predictors that contribute to early risk identification.

Methods: The dataset comprised clinical and demographic data obtained from 195 asymptomatic adults residing in a suburban community in Nigeria. We compared multiple machine learning algorithms to determine the most effective model for predicting hyperglycemia, and we examined feature importance to identify correlates of high blood glucose levels within the cohort.

Results: Elevated blood pressure and prehypertension were recorded in 8 (4.1%) and 18 (9.2%) of the 195 participants, respectively. A total of 41 (21%) participants presented with hypertension, of whom 34 (83%) were female. Within each sex, however, 34 of 118 (28.8%) female participants and 7 of 77 (9%) male participants had hypertension. Age-based analysis revealed an inverse relationship between normotension and age (r=−0.88; P=.02). Conversely, hypertension increased with age (r=0.53; P=.27), peaking between 50-59 years. Of the 195 participants, isolated systolic hypertension and isolated diastolic hypertension were recorded in 16 (8.2%) and 15 (7.7%) participants, respectively. Female participants had a higher prevalence of isolated systolic hypertension (11/16, 69%), and male participants had a higher prevalence of isolated diastolic hypertension (11/15, 73%). Following class rebalancing, the random forest classifier gave the best performance among the 26 classifiers evaluated (accuracy score 0.89; receiver operating characteristic-area under the curve score 0.89; F1-score 0.89). The feature selection model identified uric acid and age as important variables associated with hyperglycemia.
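The two sets of hypertension percentages reported in the Results use different denominators, which is why they look so different. The short sketch below recomputes them from the counts given in the abstract. The variable and function names are illustrative only and do not come from the study's code.

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole

# Counts taken from the Results paragraph of the abstract.
n_total = 195
n_female, n_male = 118, 77
htn_female, htn_male = 34, 7
htn_total = htn_female + htn_male  # 41 participants with hypertension

# Overall prevalence: hypertensive participants over the whole cohort.
overall_prevalence = percent(htn_total, n_total)

# Share of hypertensive participants who are female (denominator: all cases).
female_share_of_cases = percent(htn_female, htn_total)

# Sex-specific prevalence (denominator: all participants of that sex).
female_prevalence = percent(htn_female, n_female)
male_prevalence = percent(htn_male, n_male)

print(f"Overall prevalence:      {overall_prevalence:.1f}%")
print(f"Female share of cases:   {female_share_of_cases:.1f}%")
print(f"Prevalence among women:  {female_prevalence:.1f}%")
print(f"Prevalence among men:    {male_prevalence:.1f}%")
```

The share of cases who are female (about 83%) partly reflects that women make up most of the cohort. The within-sex prevalences (about 29% versus 9%) remove that effect.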

Conclusions: The random forest classifier identified significant clinical correlates associated with hyperglycemia, offering valuable insights for the early detection of diabetes and informing the design and deployment of therapeutic interventions. However, to achieve a more comprehensive understanding of each feature's contribution to blood glucose levels, modeling additional relevant clinical features in larger datasets could be beneficial.

Keywords: diabetes; hyperglycemia; hypertension; machine learning; random forest.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. Pipeline for model development. SMOTE: synthetic minority oversampling technique.
Figure 2. Participant recruitment and screening.
Figure 3. Age-based analysis of BP. Percentage of participants with normal BP reduced with increases in age (r=−0.88; P=.02). Prevalence of HTN increased with age (r=0.53; P=.27), peaking between 50-59 years. BP: blood pressure; HTN: hypertension.
Figure 4. Age-based analysis of ISH and IDH. ISH increased with participants’ age (r=0.86; P=.03), unlike IDH (r=−0.71; P=.11). IDH: isolated diastolic hypertension; ISH: isolated systolic hypertension.
Figure 5. Age-based ECG analysis. Age-dependent increase in the percentage of participants with abnormal ECG values peaking between ages 60-69 years. ECG: electrocardiogram.
Figure 6. Correlation matrix of independent variables with the outcome variable. BP: blood pressure; ECG: electrocardiogram; HTN: hypertension.
Figure 7. Accuracy scores of machine learning classifiers (A) before class rebalancing with the synthetic minority oversampling technique and (B) after class rebalancing with the synthetic minority oversampling technique. CV: cross-validation; LGBM: light gradient boosting machine; NB: naive Bayes; SGD: stochastic gradient descent; SVC: support vector classification; XGB: extreme gradient boosting.
Figure 8. Random forest confusion matrix showing the true versus predicted labels. True positives are cases that were positive and were predicted positive: the model correctly predicted 31 cases of hyperglycemia. False positives are cases that were negative but were predicted positive: there were 3. False negatives are cases that were positive but were predicted negative: there were 4. True negatives are cases that were negative and were predicted negative: there were 28. The weighted averages of the accuracy score and F1-score were 0.89 and 0.89, respectively. Precision is the number of correctly identified members of a class divided by all instances where the model predicted that class. For hyperglycemia prediction, precision is the count of correct predictions of hyperglycemia divided by the total number of instances where the classifier predicted “hyperglycemia,” regardless of correctness. Recall is the number of correctly identified members of a class divided by the total number of actual members of that class. For hyperglycemia, recall is the number of participants with hyperglycemia correctly identified by the classifier divided by the total number of participants who actually had hyperglycemia. The F1-score combines precision and recall into a single value (their harmonic mean). A high F1-score indicates that both precision and recall are high, while a low F1-score suggests that one or both are low. This metric is useful for quickly assessing whether a classifier effectively identifies members of a class or resorts to shortcuts, such as indiscriminately assigning everything to the larger class. avg: average.
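As a worked check of the definitions in the Figure 8 caption, the sketch below derives accuracy, precision, recall, and F1 from the four confusion matrix counts reported there. It is a minimal illustration rather than the authors' evaluation code. The helper name `f1_from_counts` is hypothetical, and the support-weighted average is assumed to follow the common convention of weighting each class's F1 by its number of true instances.

```python
def f1_from_counts(tp, fp, fn):
    """F1 for one class: harmonic mean of precision and recall,
    written directly in terms of the class's TP, FP, and FN counts."""
    return 2 * tp / (2 * tp + fp + fn)

# Counts reported in the Figure 8 caption (positive class = hyperglycemia).
tp, fp, fn, tn = 31, 3, 4, 28
total = tp + fp + fn + tn

accuracy = (tp + tn) / total   # correct predictions over all predictions
precision = tp / (tp + fp)     # correct positives over predicted positives
recall = tp / (tp + fn)        # correct positives over actual positives
f1_pos = f1_from_counts(tp, fp, fn)

# For the negative class the roles swap: its "true positives" are tn,
# its "false positives" are fn, and its "false negatives" are fp.
f1_neg = f1_from_counts(tn, fn, fp)

# Support-weighted average F1 (assumed convention: weight by true class size).
support_pos = tp + fn
support_neg = tn + fp
weighted_f1 = (support_pos * f1_pos + support_neg * f1_neg) / total

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} weighted_f1={weighted_f1:.2f}")
```

Under these assumptions, the accuracy and the weighted F1-score both round to 0.89, which agrees with the values stated in the caption.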

Update of

  • doi: 10.1101/2023.11.22.23298939
  • doi: 10.2196/56693
