Enhancing body fat prediction with WGAN-GP data augmentation and XGBoost algorithm
- PMID: 40770941
- PMCID: PMC12332371
- DOI: 10.1177/00368504251366850
Abstract
Background and Objective: Machine learning models offer a practical approach for estimating body fat percentage from simple anthropometric data. However, the scarcity of biomedical data frequently leads to model overfitting, compromising predictive accuracy. Generative data augmentation presents a promising strategy to address this limitation. This study develops and evaluates a generative data augmentation framework to enhance body fat prediction from limited anthropometric data.

Methods: A public dataset comprising 249 male subjects was partitioned into development (80%) and test (20%) sets. The fidelity of synthetic data produced by Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), random noise injection, and mixup was compared to select the optimal augmentation method. Subsequently, XGBoost, Support Vector Regression, and Multi-layer Perceptron models were trained and validated, comparing performance with and without the selected augmentation. Final model generalization was assessed on the independent test set using the coefficient of determination (R²), Mean Absolute Error, and Root Mean Squared Error.

Results: Among the evaluated augmentation techniques, the WGAN-GP generated synthetic data with the highest fidelity. On the original data, the baseline XGBoost model achieved an R² of 0.67; this performance increased to 0.77 on the test set when using WGAN-GP augmentation. Feature importance analysis of the final model identified abdominal circumference as the most significant predictor of body fat percentage.

Conclusion: The WGAN-GP is a highly effective method for generating realistic synthetic anthropometric data. Integrating these synthetic samples into the training pipeline substantially improves the generalization and predictive accuracy of machine learning models. This methodology offers a robust solution for developing more accurate and accessible predictive health models in data-scarce environments.
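As an illustration of the evaluation setup described in the Methods, the following is a minimal sketch in plain Python of an 80/20 development/test split, mixup augmentation (one of the baseline techniques compared, not the WGAN-GP itself), and the three reported metrics. All function names, the seed, and the mixup parameter `alpha=0.4` are assumptions made for this example; the abstract does not state the authors' implementation details.

```python
import math
import random


def split_dev_test(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and hold out a fraction as an independent test set."""
    rng = random.Random(seed)  # seed is an assumption, not from the paper
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def mixup(rows, n_new, alpha=0.4, seed=0):
    """Create synthetic rows as convex combinations of random pairs.

    Each row is a tuple of numbers (features followed by the target).
    alpha=0.4 is an illustrative choice, not a value from the paper.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(rows, 2)
        lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
        synthetic.append(tuple(lam * x + (1 - lam) * y for x, y in zip(a, b)))
    return synthetic


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared prediction error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


if __name__ == "__main__":
    # 249 placeholder subject indices, matching the dataset size in the abstract.
    dev, test = split_dev_test(list(range(249)))
    print(len(dev), len(test))
```

In a pipeline like the one described, augmentation would be applied only to the development set, so that the held-out test set remains an independent check on generalization.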
Keywords: Body fat percentage; XGBoost; anthropometry; data augmentation; generative adversarial network.
Conflict of interest statement
Declaration of conflicting interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Similar articles
- A holistic framework for intradialytic hypotension prediction using generative adversarial networks-based data balancing. BMC Med Inform Decis Mak. 2025 Jul 10;25(1):257. doi: 10.1186/s12911-025-03094-5. PMID: 40635002. Free PMC article.
- A novel ensemble Wasserstein GAN framework for effective anomaly detection in industrial internet of things environments. Sci Rep. 2025 Jul 23;15(1):26786. doi: 10.1038/s41598-025-07533-1. PMID: 40701989. Free PMC article.
- Enhancing buckwheat maturity classification with generative adversarial networks for spectroscopy data augmentation. Front Plant Sci. 2025 Jul 8;16:1604088. doi: 10.3389/fpls.2025.1604088. eCollection 2025. PMID: 40697874. Free PMC article.
- Artificial intelligence for diagnosing exudative age-related macular degeneration. Cochrane Database Syst Rev. 2024 Oct 17;10(10):CD015522. doi: 10.1002/14651858.CD015522.pub2. PMID: 39417312.
- Approaches for predicting dairy cattle methane emissions: from traditional methods to machine learning. J Anim Sci. 2024 Jan 3;102:skae219. doi: 10.1093/jas/skae219. PMID: 39123286. Free PMC article.