Front Med (Lausanne). 2025 Jun 13;12:1587540.
doi: 10.3389/fmed.2025.1587540. eCollection 2025.

Machine learning for prediction of Helicobacter pylori infection based on basic health examination data in adults: a retrospective study

Qiaoli Wang et al. Front Med (Lausanne).

Abstract

Objective: This study aimed to investigate the feasibility of developing machine learning models for non-invasive prediction of Helicobacter pylori (H pylori) infection using routinely collected adult health screening data, including demographic characteristics and clinical biomarkers, to establish a potential decision-support tool for clinical practice.

Methods: Data were drawn from adult health examination records held by the hospital's health management centers. Least Absolute Shrinkage and Selection Operator (LASSO) regression was used for feature selection. Six machine learning algorithms were used to build predictive models, and their performance was compared. The SHapley Additive exPlanations (SHAP) method was used to visualize feature contributions and to explain the predictions for individual cases.
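The feature-selection step can be illustrated with a minimal sketch of LASSO fitted by cyclic coordinate descent. Everything in it is an assumption made for illustration: the data are synthetic, the squared-error form of the LASSO objective is used, and the penalty `alpha` and the function names are hypothetical. The abstract does not state the loss, the penalty strength, or the software the authors used.

```python
import random

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything inside [-lam, lam] becomes exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept).

    X is a list of rows. Features are assumed to be on comparable scales.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = w[j] - new_wj
            if delta != 0.0:
                for i in range(n):
                    residual[i] += X[i][j] * delta
            w[j] = new_wj
    return w

# Synthetic data (not the study's): only columns 0 and 2 influence the outcome.
random.seed(0)
n_samples, n_features = 300, 6
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [2.0 * row[0] - 1.5 * row[2] + random.gauss(0.0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.2)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("weights:", [round(wj, 3) for wj in weights])
print("selected feature indices:", selected)
```

The L1 penalty sets the weights of uninformative columns to exactly zero, which is what lets LASSO act as a feature screen. In practice a library implementation with a cross-validated penalty would normally be preferred over a hand-written loop.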

Results: The dataset included 10,393 subjects, of whom 3,278 (31.54%) had H pylori infection. After feature screening, 10 factors were selected for the prediction model. Of the six machine learning models, the Extra Trees model performed best, with an AUC of 0.827, accuracy of 0.744, and recall of 0.736. The Random Forest model followed with an AUC of 0.810. XGBoost attained an AUC of 0.801, indicating moderate predictive capability. SHAP analysis identified age, white blood cell count (WBC), albumin (ALB), gender, and waist circumference as the five most influential predictors of H pylori infection. Higher age, WBC, and waist circumference and lower ALB were associated with a higher predicted probability of infection.
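For readers less familiar with the reported metrics, the following sketch shows how AUC, accuracy, and recall are computed from labels and predicted scores, and rechecks the stated prevalence. The labels, scores, and the 0.5 decision threshold are made up for illustration; only the counts 3,278 and 10,393 come from the abstract, and the threshold the authors used is not stated.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def accuracy_and_recall(y_true, scores, threshold=0.5):
    """Accuracy and recall (sensitivity) after thresholding the scores."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn)
    return accuracy, recall

# Toy labels and scores (hypothetical, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]

auc = roc_auc(y_true, scores)
accuracy, recall = accuracy_and_recall(y_true, scores)
print(f"AUC={auc:.3f} accuracy={accuracy:.3f} recall={recall:.3f}")

# Prevalence reported in the abstract: 3,278 infected of 10,393 subjects.
prevalence_pct = round(100 * 3278 / 10393, 2)
print("prevalence (%):", prevalence_pct)
```

AUC is threshold-free, whereas accuracy and recall depend on the chosen cut-off, so the three reported numbers describe different aspects of the same model.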

Conclusion: Among the evaluated models, the Extra Trees classifier performed best in predicting H pylori infection. SHAP analysis improved the interpretability of the model, offering insights for early clinical prediction and intervention strategies.
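SHAP attributions are built on Shapley values, which can be computed exactly for a very small model by averaging each feature's marginal contribution over all feature orderings. The sketch below does this for a toy risk score; the function `toy_risk`, its coefficients, and the feature values are hypothetical and are not the study's model or data. Real SHAP tooling uses faster algorithms for tree ensembles, so this shows only the underlying idea.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values for one case.

    Features are switched from their baseline value to the case's value
    one at a time; each feature is credited with the resulting change in
    model output, averaged over every possible ordering.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous_output = model(current)
        for name in order:
            current[name] = x[name]
            output = model(current)
            totals[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_risk(f):
    """Hypothetical risk score with an age-by-WBC interaction (illustration only)."""
    return 0.02 * f["age"] + 0.05 * f["wbc"] - 0.03 * f["alb"] + 0.001 * f["age"] * f["wbc"]

baseline = {"age": 45.0, "wbc": 6.0, "alb": 45.0}  # assumed reference case
case = {"age": 60.0, "wbc": 8.0, "alb": 48.0}      # assumed individual

phi = shapley_values(toy_risk, case, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.4f}")
print("sum of attributions:", round(sum(phi.values()), 4))
print("output difference:  ", round(toy_risk(case) - toy_risk(baseline), 4))
```

The attributions sum to the difference between the case's output and the baseline output, which is the property that makes per-case explanations additive.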

Keywords: H pylori infection; SHAP analysis; basic health examination; health examination; machine learning.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1. LASSO coefficients of all 27 features.
FIGURE 2. Plots of absolute values of LASSO coefficients for the remaining 10 features after feature selection.
FIGURE 3. Receiver operating characteristic (ROC) curve of the predictive model.
FIGURE 4. Calibration curves of all models.
FIGURE 5. Decision curve analysis (DCA).
FIGURE 6. Extra Trees model SHAP feature importance.
FIGURE 7. Radar plot for the top 5 predictors importance of H pylori infection.
FIGURE 8. The SHAP summary plot.
FIGURE 9. Effect of age on Helicobacter pylori infection probability.
