Machine learning for prediction of Helicobacter pylori infection based on basic health examination data in adults: a retrospective study
- PMID: 40584706
- PMCID: PMC12202361
- DOI: 10.3389/fmed.2025.1587540
Abstract
Objective: This study aimed to investigate the feasibility of developing machine learning models for non-invasive prediction of Helicobacter pylori (H pylori) infection using routinely collected adult health screening data, including demographic characteristics and clinical biomarkers, to establish a potential decision-support tool for clinical practice.
Methods: The data were sourced from adult health examination records at the health management centers of the hospital. Least Absolute Shrinkage and Selection Operator (LASSO) regression was employed for feature selection. Six distinct machine learning algorithms were used to construct the predictive models, and their performance was comprehensively evaluated. Additionally, the SHapley Additive exPlanations (SHAP) method was adopted to visualize the model features and the prediction results of individual cases.
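The workflow described in the Methods (L1-based feature selection followed by a tree-ensemble classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn, a synthetic dataset in place of the health examination records, a plain `Lasso` fitted on the binary label as a simple stand-in for the LASSO step, and arbitrary hyperparameters (`alpha`, `n_estimators`) that the abstract does not report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the study's data): 30 candidate features,
# roughly one third positive cases to mimic the reported class balance.
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=8,
    n_clusters_per_class=1, weights=[0.68, 0.32], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Assumption: an L1-penalised linear model drops features whose
# coefficients shrink to zero; alpha=0.02 is an illustrative value.
selector = SelectFromModel(Lasso(alpha=0.02))

# Scale, select features, then fit an Extra Trees classifier.
model = make_pipeline(
    StandardScaler(),
    selector,
    ExtraTreesClassifier(n_estimators=300, random_state=0),
)
model.fit(X_train, y_train)

n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"features kept: {n_selected}, test AUC: {auc:.3f}")
```

Putting the selector inside the pipeline means it is refit on the training split only, which avoids leaking test information into the feature selection step.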
Results: A total of 10,393 subjects were included in the dataset, of whom 3,278 (31.54%) had H pylori infection. After feature screening, 10 factors were selected for the prediction model. Among the six machine learning models, the Extra Trees model performed best, with an AUC of 0.827, an accuracy of 0.744, and a recall of 0.736. The Random Forest model also performed well, with an AUC of 0.810. XGBoost attained an AUC of 0.801, indicating moderate predictive capability. SHAP analysis showed that age, white blood cell count (WBC), albumin (ALB), gender, and waist were the top five factors affecting H pylori infection. Higher age, WBC, and waist and lower ALB were linked to a higher infection probability. These results offer insights into H pylori infection risk factors and model performance.
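For readers less familiar with the reported metrics, the short sketch below reproduces the prevalence figure from the counts given above and shows how accuracy, recall, and AUC are computed with scikit-learn. The labels and scores are hypothetical toy values chosen for illustration; they are not taken from the study, and the 0.5 decision threshold is an assumption.

```python
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

# Prevalence from the counts reported in the abstract.
prevalence = 3278 / 10393
print(f"prevalence: {prevalence:.2%}")

# Hypothetical toy data: true infection status and model scores.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [int(s >= 0.5) for s in y_score]

accuracy = accuracy_score(y_true, y_pred)  # share of correct predictions
recall = recall_score(y_true, y_pred)      # share of true positives detected
auc = roc_auc_score(y_true, y_score)       # threshold-free ranking quality
print(accuracy, recall, auc)
```

Accuracy and recall depend on the chosen threshold, whereas AUC summarizes how well the scores rank infected above non-infected subjects across all thresholds.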
Conclusion: The Extra Trees classifier exhibited the optimal performance in predicting H pylori infections among the evaluated models. Additionally, the SHAP analysis enhanced the interpretability of the model, which offers valuable insights for early-stage clinical prediction and intervention strategies.
Keywords: H pylori infection; SHAP analysis; basic health examination; health examination; machine learning.
Copyright © 2025 Wang, Liang, Li, Zhou and Liu.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.