Sci Prog. 2025 Jul-Sep;108(3):368504251366850.
doi: 10.1177/00368504251366850. Epub 2025 Aug 6.

Enhancing body fat prediction with WGAN-GP data augmentation and XGBoost algorithm


Xiangyu Wang et al. Sci Prog. 2025 Jul-Sep.

Abstract

Background and Objective: Machine learning models offer a practical approach for estimating body fat percentage from simple anthropometric data. However, the scarcity of biomedical data frequently leads to model overfitting, compromising predictive accuracy. Generative data augmentation is a promising strategy to address this limitation. This study develops and evaluates a generative data augmentation framework to enhance body fat prediction from limited anthropometric data.

Methods: A public dataset comprising 249 male subjects was partitioned into development (80%) and test (20%) sets. Three augmentation methods, Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), random noise injection, and mixup, were compared on the fidelity of their synthetic data to select the best one. Subsequently, XGBoost, Support Vector Regression, and Multi-layer Perceptron models were trained and validated, comparing performance with and without the selected augmentation. Final model generalization was assessed on the independent test set using the coefficient of determination (R²), Mean Absolute Error, and Root Mean Squared Error.

Results: Among the evaluated augmentation techniques, WGAN-GP generated synthetic data with the highest fidelity. On the original data, the baseline XGBoost model achieved an R² of 0.67; with WGAN-GP augmentation, R² increased to 0.77 on the test set. Feature importance analysis of the final model identified abdominal circumference as the most significant predictor of body fat percentage.

Conclusion: WGAN-GP is a highly effective method for generating realistic synthetic anthropometric data. Integrating these synthetic samples into the training pipeline substantially improves the generalization and predictive accuracy of machine learning models. This methodology offers a robust solution for developing more accurate and accessible predictive health models in data-scarce environments.
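The two simpler baselines named in the Methods, random noise injection and mixup, can be illustrated with a minimal Python sketch. The abstract does not give the authors' settings, so the noise scale `sigma`, the mixup parameter `alpha`, the function names, and the toy rows are all assumptions made for illustration. The target column is treated as just another column, so features and label are perturbed or mixed together. WGAN-GP itself is not shown.

```python
import random


def noise_injection(rows, sigma=0.05, rng=random):
    """Return one jittered copy of every row.

    Each value receives zero-mean Gaussian noise whose standard deviation is
    `sigma` times the standard deviation of its column. `sigma` is an assumed
    setting, not one taken from the paper.
    """
    n_cols = len(rows[0])
    col_std = []
    for j in range(n_cols):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        col_std.append(var ** 0.5)
    return [
        [v + rng.gauss(0.0, sigma * col_std[j]) for j, v in enumerate(r)]
        for r in rows
    ]


def mixup(rows, n_new, alpha=0.4, rng=random):
    """Return `n_new` rows, each a convex combination of two distinct rows.

    The mixing weight is drawn from Beta(alpha, alpha); `alpha` is an assumed
    setting, not one taken from the paper.
    """
    out = []
    for _ in range(n_new):
        a, b = rng.sample(rows, 2)
        lam = rng.betavariate(alpha, alpha)
        out.append([lam * x + (1.0 - lam) * y for x, y in zip(a, b)])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up rows: [abdomen_cm, weight_kg, body_fat_pct]
    development_rows = [
        [85.0, 70.0, 12.0],
        [100.0, 90.0, 25.0],
        [92.0, 80.0, 19.0],
    ]
    print(noise_injection(development_rows, rng=rng))
    print(mixup(development_rows, n_new=2, rng=rng))
```

In a pipeline like the one described, augmentation of this kind would be applied to the development set only, so that the test set stays made up of real subjects.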

Keywords: Body fat percentage; XGBoost; anthropometry; data augmentation; generative adversarial network.

Conflict of interest statement

Declaration of conflicting interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1. Overall study workflow diagram.

Figure 2. Detailed predictive model development and validation pipeline.

Figure 3. KDE plots of selected feature distributions. Original (red line), Mixup (green line), NoiseInjection (blue line), and WGAN-GP (purple line). Each subplot represents one feature, with values on the x-axis and probability density on the y-axis.

Figure 4. Discriminator-based evaluation of data augmentation realism. ROC curves for LightGBM classifiers trained to distinguish between original data and data generated by three different augmentation techniques. The AUC score quantifies the distinguishability of each augmented dataset. A lower AUC score indicates that the synthetic data is more realistic and harder for the classifier to differentiate from the original data, signifying a higher-fidelity augmentation method. The dashed line represents the performance of a random-guess classifier (AUC = 0.5).

Figure 5. Distribution of original data.

Figure 6. Scatter plots of predicted versus actual BodyFat values for the baseline models on the original development set. The dashed line in each plot represents a perfect prediction (y = x). Subplots display the performance for (a) the XGBoost model, (b) the SVR model, and (c) the MLP model.

Figure 7. Scatter plots of predicted versus actual values for the final models on the hold-out test set, trained on WGAN-GP augmented data. The dashed line represents a perfect prediction (y = x). Subplots show the performance for (a) the XGBoost model, (b) the SVR model, and (c) the MLP model.

Figure 8. Top 13 feature importances for the final XGBoost model trained with WGAN-GP augmentation, calculated using the Gain metric.
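The AUC reading described in the Figure 4 caption can be made concrete with a small Python sketch. It does not train the LightGBM discriminator; it only shows how an AUC is obtained from scores that some discriminator has already assigned, using the rank interpretation of AUC (the probability that a randomly chosen synthetic row receives a higher "synthetic" score than a randomly chosen original row, with ties counted as one half). The function name and the score values are hypothetical.

```python
def roc_auc(original_scores, synthetic_scores):
    """AUC of a discriminator whose score means 'this row looks synthetic'.

    Computed as P(synthetic score > original score) + 0.5 * P(tie),
    taken over all (synthetic, original) pairs.
    """
    wins = 0.0
    for s in synthetic_scores:
        for o in original_scores:
            if s > o:
                wins += 1.0
            elif s == o:
                wins += 0.5
    return wins / (len(original_scores) * len(synthetic_scores))


if __name__ == "__main__":
    # Hypothetical scores: the synthetic rows are easy to spot.
    print(roc_auc([0.1, 0.2, 0.3], [0.8, 0.9, 0.95]))
    # Hypothetical scores: both groups receive the same scores.
    print(roc_auc([0.4, 0.5, 0.6], [0.4, 0.5, 0.6]))
```

Under the caption's reading, an augmentation method whose rows push this value toward 0.5 is harder to tell apart from the original data, while a value near 1.0 means the discriminator separates the two groups almost perfectly.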
