Comparative Analysis of Feature Extraction Methods and Machine Learning Models for Predicting Osteoporosis Prevalence
- PMID: 40439990
- DOI: 10.1007/s10916-025-02203-1
Abstract
This study systematically examined the impact of three feature selection techniques (Boruta, extreme gradient boosting (XGBoost), and Lasso) on the performance of four machine learning models (random forest (RF), XGBoost, logistic regression (LR), and support vector machine (SVM)) in predicting osteoporosis prevalence from bone mineral density. Our findings showed that varying the data partitioning ratio (training:test = 0.6:0.4, 0.7:0.3, 0.8:0.2, and 0.9:0.1) had minimal impact on prediction accuracy for all four models, a conclusion reinforced by 10-fold cross-validation. In contrast, principal component analysis (PCA) substantially degraded accuracy (to the 0.6 to 0.8 range), suggesting that it is unsuitable for this task, presumably because its linear projection does not preserve the complex decision boundaries inherent in the original high-dimensional data. Comparative analysis showed that the Boruta-XGBoost combination achieved superior performance (accuracy: 0.9083 ± 0.0146), significantly outperforming the Lasso-LR combination (0.7480 ± 0.0157) across all evaluation frameworks. In terms of discrimination, the RF model achieved area under the receiver operating characteristic curve (AUROC) values of 0.85, 0.81, and 0.80 under the three feature selection approaches, exceeding those of the SVM model (0.78, 0.76, and 0.76). This advantage likely stems from RF's native ability to capture non-linear relationships and feature interactions.
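The evaluation protocol summarized above (fixed train:test ratios, 10-fold cross-validation, and AUROC as the discrimination metric) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `holdout_split`, `k_fold_indices`, and `auroc` are hypothetical, the classifiers and feature selectors are omitted (any model that outputs a score per case could be plugged in), and AUROC is computed in its rank-based Mann-Whitney form, i.e. the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
import random
from typing import List, Sequence, Tuple


def holdout_split(n: int, train_ratio: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and cut them at train_ratio (e.g. 0.7 for a 0.7:0.3 split)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_ratio))
    return idx[:cut], idx[cut:]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; every sample appears in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    pairs = []
    for i, test in enumerate(folds):
        # Training set for fold i is the union of all other folds.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        pairs.append((train, test))
    return pairs


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score_pos > score_neg), with ties counted as 0.5 (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Training shares of the four train:test splits examined in the abstract.
    for ratio in (0.6, 0.7, 0.8, 0.9):
        train, test = holdout_split(100, ratio)
        print(ratio, len(train), len(test))
    # 10-fold CV on 100 samples gives ten test folds of 10 samples each.
    print([len(test) for _, test in k_fold_indices(100, k=10)])
    # Toy scores: 3 of the 4 positive/negative pairs are ranked correctly, so AUROC = 0.75.
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))
```

Because each sample lands in exactly one test fold, averaging accuracy or AUROC over the ten folds yields mean ± spread summaries of the kind reported above. The quadratic pairwise loop in `auroc` is kept for readability; a sort-and-rank version would scale better to large cohorts.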
Keywords: Feature extraction; Machine learning; Multi-model assessment; Prevalence of bone mineral density.
© 2025. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
Conflict of interest statement
Declarations. Ethical approval: Not applicable. Conflicts of interest or competing interests: No conflict of interest exists in the submission of this manuscript, and the manuscript has been approved by all authors for publication. I would like to declare on behalf of my co-author that the work described is original research that has not been published previously and is not under consideration for publication elsewhere, in whole or in part. All the authors listed have approved the enclosed manuscript. Clinical Trial Number: Not applicable.