BMC Bioinformatics. 2024 Jan 11;25(1):18.
doi: 10.1186/s12859-024-05633-9.

The effect of data balancing approaches on the prediction of metabolic syndrome using non-invasive parameters based on random forest

Sahar Mohseni-Takalloo et al. BMC Bioinformatics.

Abstract

Background: Metabolic syndrome (MetS) is a cluster of metabolic abnormalities (including obesity, insulin resistance, hypertension, and dyslipidemia) that can be used to identify populations at risk for diabetes and cardiovascular diseases, the main causes of morbidity and mortality worldwide. A simple approach for diagnosing MetS without biochemical tests would be valuable. The present study aimed to predict MetS from non-invasive features using the random forest learning algorithm. In addition, to deal with the class imbalance that naturally exists in this type of data, the effect of two data balancing approaches, the Synthetic Minority Over-sampling Technique (SMOTE) and Random Splitting data balancing (SplitBal), on model performance was investigated.
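The two balancing ideas named above can be illustrated with a minimal sketch. This is not the authors' code. The function names, the toy rows (waist circumference in cm, BMI), and the parameter `k` are hypothetical. The SplitBal part shows only the random splitting step, and it assumes that each majority bin is paired with the full minority class and that a classifier trained on each balanced set is combined afterwards.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority rows by interpolating between neighbours."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority rows to the base row, excluding the base itself.
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def split_balance(majority, minority, rng=None):
    """Randomly split the majority class into bins of roughly minority size
    and pair every bin with the full minority class."""
    if rng is None:
        rng = random.Random(0)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    n_bins = max(1, round(len(majority) / len(minority)))
    bins = [shuffled[i::n_bins] for i in range(n_bins)]
    return [(bin_rows, minority) for bin_rows in bins]


# Hypothetical toy data: (waist circumference in cm, BMI).
minority = [(100.0, 31.0), (104.0, 33.0), (98.0, 30.0)]
majority = [(80.0 + i, 22.0 + 0.5 * i) for i in range(9)]

synthetic_rows = smote(minority, n_new=6, k=2)
balanced_sets = split_balance(majority, minority)
```

SMOTE enlarges the minority class with interpolated rows, whereas the splitting approach leaves every real row unchanged and instead produces several balanced training sets.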

Results: The most important determinant for MetS prediction was waist circumference. When the random forest learning algorithm was applied to imbalanced data, the trained models reached accuracies of 86.9% and 79.4% and sensitivities of 37.1% and 38.2% in men and women, respectively. The best results were obtained with the SplitBal data balancing technique: although the accuracy of the trained models decreased by 7.8% and 11.3%, their sensitivity improved significantly to 82.3% and 73.7% in men and women, respectively.
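The following short sketch shows why high accuracy can coexist with low sensitivity on imbalanced data. The confusion-matrix counts are hypothetical and are not taken from the study. They were chosen only so that the positive class is a small share of the sample.

```python
def accuracy_and_sensitivity(tp, fn, tn, fp):
    """Return (accuracy, sensitivity) from confusion-matrix counts."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total      # share of all cases classified correctly
    sensitivity = tp / (tp + fn)      # share of true positives that were detected
    return accuracy, sensitivity


# Hypothetical counts: 1000 people, 150 of them with the condition.
# A model trained on imbalanced data that rarely predicts the positive class:
imbalanced = accuracy_and_sensitivity(tp=56, fn=94, tn=815, fp=35)
# A model trained on balanced data that predicts the positive class more often:
balanced = accuracy_and_sensitivity(tp=123, fn=27, tn=668, fp=182)

for name, (acc, sens) in [("imbalanced", imbalanced), ("balanced", balanced)]:
    print(f"{name}: accuracy={acc:.3f}, sensitivity={sens:.3f}")
```

Because negatives dominate the sample, correctly labelling them keeps accuracy high even when most positives are missed. For a screening tool, sensitivity is therefore the more informative figure.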

Conclusions: The random forest learning method, along with data balancing techniques, especially SplitBal, could create MetS prediction models with promising results that can be applied as a useful prognostic tool in health screening programs.

Keywords: Data balancing; Machine learning; Metabolic syndrome; Random forest; SMOTE; SplitBal.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Flowchart of data processing for metabolic syndrome classification with RF. RF: Random Forest; SMOTE: Synthetic Minority Over-sampling Technique; SplitBal: Random Splitting data balancing
Fig. 2
MetS prediction ROC curves based on different data balancing methods by sex. (A) ROC curve in men using imbalanced data; (B) ROC curve in women using imbalanced data; (C) ROC curve in men using the SMOTE method; (D) ROC curve in women using the SMOTE method; (E) ROC curve in men using the SplitBal method; (F) ROC curve in women using the SplitBal method
Fig. 3
Feature importance in the MetS prediction model based on different data balancing methods. (A) Feature importance on imbalanced data in men; (B) feature importance on imbalanced data in women; (C) feature importance based on the SMOTE method in men; (D) feature importance based on the SMOTE method in women; (E) feature importance based on the SplitBal method in men; (F) feature importance based on the SplitBal method in women. BMI: body mass index; SBP: systolic blood pressure; DBP: diastolic blood pressure; FH1: family history in a first-degree relative
