Sci Rep. 2024 Oct 5;14(1):23234.
doi: 10.1038/s41598-024-73799-6.

The risk factors determined by four machine learning methods for the change of difference of bone mineral density in post-menopausal women after three years follow-up


Ching-Yao Chang et al. Sci Rep. 2024.

Abstract

The prevalence of osteoporosis has increased drastically in recent years. It is not only very common but also a major global public health problem because of its high morbidity. Many risk factors associated with osteoporosis have been identified. However, most studies have used traditional multiple linear regression (MLR) to explore these relationships. Recently, machine learning (Mach-L) has become a new modality for data analysis because it enables machines to learn from past data or experience without being explicitly programmed and can capture nonlinear relationships better. These methods have the potential to outperform conventional MLR in disease prediction. In the present study, we enrolled a Chinese postmenopausal cohort followed up for 4 years. The difference in T-score (δ-T score) was the dependent variable. Demographic, biochemical, and lifestyle data were the independent variables. Our goals were to: (1) compare the prediction accuracy of Mach-L and traditional MLR for δ-T score; and (2) rank the importance of the risk factors (independent variables) for predicting δ-T score. In total, 1698 postmenopausal women were enrolled from the MJ Health Database. Four Mach-L methods, namely random forest (RF), eXtreme Gradient Boosting (XGBoost), Naïve Bayes (NB), and stochastic gradient boosting (SGB), were used to construct models for predicting δ-BMD after four years of follow-up. The dataset was randomly divided into an 80% training dataset for model building and a 20% testing dataset for model testing. Ten-fold cross-validation was used for hyperparameter tuning. For each Mach-L method, the model with the lowest root mean square error on the validation dataset was taken as the best model. The averaged metrics of the RF, SGB, NB, and XGBoost models were compared with those of the benchmark MLR model, which used the same training and testing datasets as the Mach-L methods.
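The evaluation protocol described in this paragraph (an 80/20 split, 10-fold cross-validation for hyperparameter tuning, selection by lowest validation RMSE, and comparison against a linear benchmark on the same split) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, there is a single predictor, and a k-nearest-neighbour regressor stands in for the RF, SGB, NB, and XGBoost models. All function names are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_linear(xs, ys):
    """Ordinary least squares with one predictor (the linear benchmark)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k):
    """k-nearest-neighbour regression; a stand-in for the paper's nonlinear models."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def cv_rmse(xs, ys, k, n_folds=10):
    """Mean validation RMSE of a k-NN model over n_folds disjoint folds."""
    scores = []
    for fold in range(n_folds):
        val_idx = list(range(fold, len(xs), n_folds))  # every n_folds-th sample
        val = set(val_idx)
        tr_x = [x for i, x in enumerate(xs) if i not in val]
        tr_y = [y for i, y in enumerate(ys) if i not in val]
        model = fit_knn(tr_x, tr_y, k)
        scores.append(rmse([ys[i] for i in val_idx], [model(xs[i]) for i in val_idx]))
    return sum(scores) / n_folds


# Synthetic, deliberately nonlinear data (an assumption; not the study data).
random.seed(0)
xs = [random.uniform(-3, 3) for _ in range(300)]
ys = [math.sin(2 * x) + random.gauss(0, 0.1) for x in xs]

# 80% training / 20% testing; the samples are already in random order.
n_train = int(0.8 * len(xs))
train_x, test_x = xs[:n_train], xs[n_train:]
train_y, test_y = ys[:n_train], ys[n_train:]

# Tune the hyperparameter k by 10-fold cross-validation on the training set only.
candidates = [1, 3, 5, 10, 20, 40]
cv_scores = {k: cv_rmse(train_x, train_y, k) for k in candidates}
best_k = min(cv_scores, key=cv_scores.get)

# Refit on the full training set and compare both models on the held-out test set.
knn_model = fit_knn(train_x, train_y, best_k)
linear_model = fit_linear(train_x, train_y)
knn_test_rmse = rmse(test_y, [knn_model(x) for x in test_x])
linear_test_rmse = rmse(test_y, [linear_model(x) for x in test_x])

print(f"best k: {best_k}")
print(f"k-NN test RMSE:   {knn_test_rmse:.3f}")
print(f"linear test RMSE: {linear_test_rmse:.3f}")
```

The test set plays no part in tuning, so the final comparison uses data neither model has seen. The nonlinear model wins here only because the synthetic target was built to be nonlinear; on real data the gap could be smaller or absent.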
In each model, risk factors were ranked by priority, with 1 denoting the most critical risk factor and 22 the last selected one. In the Pearson correlation analysis, age, education, BMI, HDL-C, and TSH were positively correlated, and plasma calcium level and baseline T-score were negatively correlated, with δ-T score. All four Mach-L methods yielded lower prediction errors than MLR, and all were considered convincing Mach-L models. From our results, education level was the most important factor for δ-T score, followed by DBP, smoking, SBP, UA, age, and LDL-C. All four Mach-L methods outperformed traditional MLR. Using Mach-L, the six most important risk factors, from most to least important, were DBP, SBP, UA, education level, TG, and sleeping hours. In postmenopausal Chinese women, δ-T score was positively related to SBP, education level, UA, and TG and negatively related to DBP and sleeping hours.
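The per-model ranking described above (1 for the most critical factor) could be combined across models in several ways. The abstract does not state the aggregation rule, so the sketch below assumes a simple mean of ranks. The model names, feature names, and importance scores are all invented for illustration and are not results from the study.

```python
def rank_features(importance):
    """Map each feature to its rank, where rank 1 has the largest importance."""
    ordered = sorted(importance, key=importance.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical importance scores from three hypothetical models.
importances = {
    "model_a": {"x1": 0.30, "x2": 0.25, "x3": 0.20, "x4": 0.15, "x5": 0.10},
    "model_b": {"x1": 0.22, "x2": 0.28, "x3": 0.18, "x4": 0.12, "x5": 0.20},
    "model_c": {"x1": 0.35, "x2": 0.20, "x3": 0.25, "x4": 0.05, "x5": 0.15},
}

# Rank features within each model, then average the ranks across models
# (assumed aggregation rule; lower mean rank = more important overall).
ranks = {model: rank_features(scores) for model, scores in importances.items()}
features = list(importances["model_a"])
mean_rank = {f: sum(r[f] for r in ranks.values()) / len(ranks) for f in features}
overall = sorted(features, key=mean_rank.get)

for feature in overall:
    print(f"{feature}: mean rank {mean_rank[feature]:.2f}")
```

Averaging ranks rather than raw importance scores avoids comparing scales that differ between model families, at the cost of discarding how large the differences in importance are.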

Keywords: Longitudinal study; Machine learning; Osteoporosis.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Patient selection scheme.
Fig. 2. Proposed machine learning prediction scheme.
Fig. 3. The ranks of the risk factors derived from three different machine learning methods.
Fig. 4. Visualize the impact of input features on output using SHAP values from models based on Random Forest (RF), Extreme Gradient Boosting (XGBoost), Naive Bayes (NB), and Stochastic Gradient Boosting (SGB).
