Observational Study
BMC Med Inform Decis Mak. 2025 Jul 1;25(1):240.
doi: 10.1186/s12911-025-03064-x.

Machine learning for detection of diffusion abnormalities-related respiratory changes among normal, overweight, and obese individuals based on BMI and pulmonary ventilation parameters: an observational study


Xin-Yue Song et al. BMC Med Inform Decis Mak. 2025.

Abstract

Background: Machine learning (ML) algorithms enable the detection of diffusion abnormality-related respiratory changes in individuals with normal body mass index (BMI), overweight, and obesity, using BMI and pulmonary ventilation parameters. We evaluated the effectiveness of several supervised ML algorithms and identified their optimal configurations for this application.

Methods: We conducted a retrospective analysis of data from 440 individuals who underwent pulmonary function tests between January 1, 2021, and April 1, 2024. The cohort comprised 287 individuals with normal diffusion capacity (DN) and 153 with diffusion abnormalities (DA). Demographic characteristics and spirometry results were compared using statistical tests such as the independent-samples t-test and the chi-square test. Piecewise regression was used to evaluate the relationship between BMI and carbon monoxide diffusing capacity (DLCO). Pulmonary ventilation parameters included forced vital capacity (FVC), forced expiratory volume in one second (FEV1), FEV1/FVC, peak expiratory flow (PEF), maximum mid-expiratory flow (MMEF), and vital capacity (VC). To distinguish DN from DA, we applied five supervised ML algorithms, namely Support Vector Machine (SVM), Random Forest (RF), Adaptive Boosting (AdaBoost), Naive Bayes (BAYES), and K-Nearest Neighbors (KNN), together with three feature selection strategies: SelectKBest, Recursive Feature Elimination with Cross-Validation (RFECV), and SelectFromModel. Finally, we assessed the contribution of individual parameters to classification using SHapley Additive exPlanations (SHAP) and permutation importance.
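Among the feature selection strategies above, SelectKBest keeps the k features with the highest univariate scores. The abstract does not state which score was used, so the plain-Python sketch below assumes an ANOVA F score (scikit-learn's `SelectKBest` uses `f_classif` by default, although the abstract does not confirm that scikit-learn was used). For two groups such as DN and DA, this F score equals the square of the pooled-variance independent-samples t statistic, which ties it to the group comparisons described above. The helpers `f_score_two_groups` and `select_k_best` and the toy data are hypothetical, not the authors' code or data.

```python
from statistics import mean, variance


def f_score_two_groups(a, b):
    """One-way ANOVA F for two groups (equal to the squared pooled-variance t statistic).

    Assumes at least one group has non-zero variance.
    """
    n1, n2 = len(a), len(b)
    # Pooled within-group variance, as in the independent-samples (Student's) t-test
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / (pooled * (1 / n1 + 1 / n2)) ** 0.5
    return t * t


def select_k_best(rows, labels, names, k):
    """Score each feature column by F between label 0 (DN) and label 1 (DA); keep the top k."""
    scores = {}
    for j, name in enumerate(names):
        dn = [row[j] for row, y in zip(rows, labels) if y == 0]
        da = [row[j] for row, y in zip(rows, labels) if y == 1]
        scores[name] = f_score_two_groups(dn, da)
    ranked = sorted(names, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data (not from the study): column 0 separates the groups, column 1 barely does
rows = [[1, 5], [2, 6], [3, 5], [4, 6], [5, 5], [6, 6]]
labels = [0, 0, 0, 1, 1, 1]
selected, scores = select_k_best(rows, labels, ["feature_a", "feature_b"], k=1)
print(selected, scores)
```

In practice k would be tuned (for example by cross-validation), and a library implementation would also report p-values alongside the F scores.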

Results: Individuals in the DA group had lower PEF and DLCO than those in the DN group. BMI showed a cubic relationship with DLCO for 18.5 kg/m² < BMI < 40 kg/m² (R² = 0.498, P < 0.01) and a negative linear correlation for BMI ≥ 40 kg/m² (r = -0.253, P < 0.05). The RF algorithm was the most effective classifier for distinguishing DN from DA, achieving an area under the curve (AUC) of 0.983 and significantly outperforming BAYES, SVM, AdaBoost, and KNN (P < 0.01). In subsequent experiments, the feature selection strategies identified BMI, FEV1/FVC, and VC as the optimal parameters, in agreement with the feature importance analysis and with pulmonary physiology. Feature selection improved the diagnostic accuracy of KNN but had minimal impact on the performance of BAYES.
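The two-segment BMI-DLCO result above (a cubic fit summarized by R² below 40 kg/m², a linear correlation r at or above it) can be illustrated with a minimal plain-Python sketch. It assumes the breakpoint is fixed at 40 kg/m² as reported rather than estimated from the data, fits the cubic by ordinary least squares, and omits the P-values; the helper names and toy data are hypothetical, and the abstract does not state which software the authors used.

```python
from statistics import mean


def fit_cubic(xs, ys):
    """Ordinary least-squares cubic fit; returns a prediction function.

    x is centred first so the normal equations stay well conditioned.
    """
    c = mean(xs)
    us = [x - c for x in xs]
    m = 4  # coefficients for u**0 .. u**3
    # Normal equations (U^T U) b = U^T y
    a = [[sum(u ** (i + j) for u in us) for j in range(m)] for i in range(m)]
    rhs = [sum(y * u ** i for u, y in zip(us, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        p = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[p] = a[p], a[col]
        rhs[col], rhs[p] = rhs[p], rhs[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for k in range(col, m):
                a[r][k] -= f * a[col][k]
            rhs[r] -= f * rhs[col]
    # Back substitution
    b = [0.0] * m
    for i in reversed(range(m)):
        b[i] = (rhs[i] - sum(a[i][k] * b[k] for k in range(i + 1, m))) / a[i][i]
    return lambda x: sum(bi * (x - c) ** i for i, bi in enumerate(b))


def r_squared(xs, ys, predict):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def piecewise_summary(bmi, dlco, low=18.5, cut=40.0):
    """Cubic R^2 for low < BMI < cut and Pearson r for BMI >= cut (breakpoint fixed, not estimated)."""
    lower = [(x, y) for x, y in zip(bmi, dlco) if low < x < cut]
    upper = [(x, y) for x, y in zip(bmi, dlco) if x >= cut]
    lx, ly = [p[0] for p in lower], [p[1] for p in lower]
    ux, uy = [p[0] for p in upper], [p[1] for p in upper]
    return r_squared(lx, ly, fit_cubic(lx, ly)), pearson_r(ux, uy)


# Hypothetical toy data (not from the study)
low_bmi = [19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39]
low_dlco = [80 + 0.5 * (x - 29) - 0.02 * (x - 29) ** 3 for x in low_bmi]
high_bmi, high_dlco = [41, 43, 45, 47], [100, 95, 90, 85]
r2, r = piecewise_summary(low_bmi + high_bmi, low_dlco + high_dlco)
print(f"cubic R^2 = {r2:.3f}, linear r = {r:.3f}")
```

Fixing the breakpoint keeps the sketch short; a full piecewise regression could instead search for the breakpoint that minimizes the residual sum of squares.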

Conclusion: The results indicate that for individuals with a BMI between 18.5 kg/m² and 40 kg/m², diffusion capacity improves with increasing BMI, whereas it decreases for those with a BMI of 40 kg/m² or higher. This study highlights the potential of combining BMI and pulmonary ventilation parameters with ML algorithms as a practical approach to diagnosing diffusion abnormalities in normal-weight, overweight, and obese individuals, particularly in settings that rely on portable spirometers.

Trial registration: Not applicable.

Keywords: Body mass index; Diffusing capacity; Machine learning.


Conflict of interest statement

Declarations. Ethics approval and consent to participate: This study conformed to the Declaration of Helsinki and was approved by the Ethics Committee of West China Hospital, Sichuan University, China. The study received ethical approval with a waiver of patient informed consent. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Association between BMI (kg/m²) and DLCO (% predicted) in the study population. The scatter plot shows the relationship between BMI and DLCO, with a segmented regression curve (solid line). The coefficient of determination (R² = 0.498) is shown for BMI between 18.5 kg/m² and 40 kg/m² (left), and the correlation coefficient (r = -0.253) for BMI ≥ 40 kg/m² (right).
Fig. 2
Results of Experiment 2: diagnostic accuracy of five ML methods using BMI and pulmonary ventilation parameters in subjects with normal BMI, overweight, and obesity. Asterisks denote a statistically significant difference compared with RF: * P < 0.05; ** P < 0.01. More detailed graphs are provided in the Additional files (Figure S1).
Fig. 3
Summary of Experiments 2 and 3 (using SelectKBest as the feature selector). The figure shows the best ML algorithm for each case. Asterisks denote a statistically significant difference compared with RF in Experiment 2: * P < 0.05; ** P < 0.01. More detailed graphs are provided in the Additional files (Figures S2-S4).
Fig. 4
Summary of Experiments 2 to 5, highlighting the best ML algorithms from Experiments 4 and 5. The figure shows the top ML algorithm for each experiment. Asterisks denote a statistically significant difference compared with RF in Experiment 2: * P < 0.05; ** P < 0.01. More detailed graphs are provided in the Additional files (Figures S5-S6).
Fig. 5
Interpretability analysis of the RF model in Experiment 6 using SHAP values and permutation feature importance. (A) SHAP summary plot: each dot represents a sample, and the x-axis shows its SHAP value. Feature values are color-coded, with blue for lower and red for higher values. (B) Permutation importance of the features, sorted from highest to lowest importance.
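Permutation importance, as in panel B, measures how much a fitted model's performance drops when one feature column is randomly shuffled. The sketch below is a minimal plain-Python illustration, not the authors' code: it assumes AUC as the performance metric (the abstract does not state which metric was used), computes AUC from its pairwise (Mann-Whitney) definition, and uses a hypothetical `toy_model` in place of the study's RF model. SHAP values are not sketched here.

```python
import random


def auc(scores, labels):
    """ROC AUC as the probability a DA case (label 1) outscores a DN case (label 0); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in AUC when one feature column is shuffled, breaking its link to the label."""
    rng = random.Random(seed)
    baseline = auc([model(row) for row in rows], labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            permuted = []
            for row, value in zip(rows, column):
                new_row = list(row)
                new_row[j] = value  # only feature j is permuted; the others keep their values
                permuted.append(new_row)
            drops.append(baseline - auc([model(row) for row in permuted], labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a fitted classifier: it scores rows using feature 0 only
def toy_model(row):
    return row[0]


rows = [[1, 5], [2, 1], [3, 7], [4, 2], [5, 9], [6, 3], [7, 8], [8, 4]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(auc([toy_model(r) for r in rows], labels))  # → 1.0
print(permutation_importance(toy_model, rows, labels))
```

Because `toy_model` ignores feature 1, shuffling that column leaves every score unchanged and its importance is exactly zero, while shuffling feature 0 lowers the AUC. Importances are usually computed on held-out data so that they reflect generalization rather than memorization.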
