Nat Commun. 2025 Jul 14;16(1):6472.
doi: 10.1038/s41467-025-61329-5.

The Helicobacter pylori AI-clinician harnesses artificial intelligence to personalise H. pylori treatment recommendations

Kyle Higgins et al. Nat Commun. 2025.

Abstract

Helicobacter pylori (H. pylori) is the most common carcinogenic pathogen globally and the leading cause of gastric cancer. Here, we develop a reinforcement learning-based AI Clinician system to personalise treatment selection and evaluate its ability to improve eradication success compared to clinician-prescribed therapies. The model is trained and internally validated on 38,049 patients from the retrospective European Registry on Helicobacter pylori Management (Hp-EuReg), using independent state deep Q-learning (isDQN) to recommend optimal therapies based on patient characteristics such as age, sex, antibiotic allergies, country, and pre-treatment indication. In internal validation using real-world Hp-EuReg data, AI-recommended therapies achieve a 94.1% success rate (95% CI: 93.2-95.0%) versus 88.1% (95% CI: 87.7-88.4%) for clinician-prescribed therapies not aligned with AI suggestions, an improvement of 6.0 percentage points. Results are replicated in an external validation cohort (n = 7186), confirming generalisability. The AI system identifies optimal treatment strategies in key subgroups: 65% (n = 24,923) are recommended bismuth-based therapies, and 15% (n = 5898) non-bismuth quadruple therapies. Random forest modelling identifies region and concurrent medications as patient-specific drivers of AI recommendations. With nearly half the global population likely to contract H. pylori, this approach lays the foundation for future prospective clinical validation and shows the potential of AI to support clinical decision-making, enhance outcomes, and reduce gastric cancer burden.
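
The comparison reported above (success among therapies aligned with the AI recommendation versus those not aligned) can be illustrated with a minimal Python sketch. The record fields `prescribed`, `recommended`, and `eradicated` are hypothetical names, and the normal-approximation (Wald) interval is an assumption: the abstract does not state how its confidence intervals were computed.

```python
import math


def success_rate_with_ci(outcomes, z=1.96):
    """Proportion of successes with a normal-approximation (Wald) interval.

    Assumption: the source does not say which interval method was used.
    """
    n = len(outcomes)
    p = sum(outcomes) / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1] because the Wald interval can overshoot for small n.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def compare_by_concordance(records):
    """Split records by whether the prescribed therapy matched the AI recommendation."""
    aligned = [r["eradicated"] for r in records if r["prescribed"] == r["recommended"]]
    not_aligned = [r["eradicated"] for r in records if r["prescribed"] != r["recommended"]]
    return success_rate_with_ci(aligned), success_rate_with_ci(not_aligned)
```

Each record is a dictionary with a 0/1 `eradicated` outcome; `compare_by_concordance` returns one `(rate, lower, upper)` tuple for each group, and the difference between the two rates is the improvement in percentage points.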

Conflict of interest statement

Competing interests: Javier P. Gisbert has served as speaker, consultant, and advisory member for, or has received research funding from, Mayoly Spindler, Allergan, Diasorin, Richen, Biocodex and Juvisé. Olga P. Nyssen received research funding from Allergan, Mayoly Spindler, Richen, Biocodex and Juvisé. Drs Kirill Veselkov, Ivan Laponogov, and Dennis Veselkov are affiliated with Intelligify Ltd, an AI consultancy company, which was not involved in the research, analysis, or interpretation of the results presented in this study. Tania Fleitas Kanonnikoff discloses advisory role honoraria from Amgen, AstraZeneca, Beigene, BMS and MSD; institutional research funding from Gilead; and speaker honoraria from Amgen, Servier, BMS, MSD, Lilly, Roche and Bayer. The remaining authors declare no conflicts of interest.

POLICY DISCLOSURE: USE OF CLINICAL DATA. This study involves the secondary analysis of de-identified clinical data obtained from the European Registry on Helicobacter pylori Management (Hp-EuReg). The data were originally collected by the Hp-EuReg consortium across multiple centres in Europe under appropriate ethical approvals and patient consent at the time of collection. No new data were collected for the purposes of this analysis, and the authors were not involved in direct recruitment or interaction with study participants. All analyses were conducted on anonymised data in accordance with applicable data protection and ethical guidelines.

Figures

Fig. 1. Hp-EuReg dataset and AI clinician overview.
(a) Helicobacter pylori (H. pylori) infects the stomach of around one in two individuals worldwide. It does so by infiltrating the gastric mucosa, aided by a highly motile flagellum. The infection is characterized by an overall increase in the acidity of the gastric fluid, initiated by urease production, and by increased inflammation in the gastric epithelia, often spanning decades before diagnosis.
(b) H. pylori-induced pathology most commonly includes gastric cancer (at least one per one hundred infected individuals) and peptic ulcer disease (around one in ten infected individuals).
(c) The Hp-EuReg project is an international, multicenter prospective registry that collects H. pylori treatment management strategies across Europe and to date includes over 75,000 patient records.
(d) The registry data include patient metadata, the treatment strategy employed by the clinician, and the result of this treatment in terms of eradication.
(e) The H. pylori AI-clinician is trained on the Hp-EuReg dataset and designed to provide patient-specific optimal treatment recommendations for H. pylori eradication.
Created in BioRender. Higgins, K. (2025) https://BioRender.com/qi85m5d.
Fig. 2. H. pylori AI-clinician training and performance on real-world data.
(a) The mean Q scores of each treatment category in the testing phase (n = 3805 samples total) are compared, showing which treatments the AI Clinician prefers on average. Pylera (yellow) has the highest overall Q score (mean = 0.92, SD = 0.04), followed by quadruple bismuth therapies (dark purple; mean = 0.90, SD = 0.05), quadruple non-bismuth therapies (medium purple; mean = 0.89, SD = 0.04), and sequential therapies (red; mean = 0.89, SD = 0.05). Triple therapies have the lowest Q scores on average (clarithromycin + metronidazole, orange: mean = 0.86, SD = 0.04; clarithromycin + amoxicillin, dark pink: mean = 0.85, SD = 0.05). All treatments include the prescription of a PPI.
(b) Mean Q scores by PPI dose show that high-dose PPI (pink; mean = 0.88, SD = 0.04) has a higher average Q score than standard- or low-dose PPI (blue; mean = 0.86, SD = 0.04).
(c) Mean Q scores by duration of treatment show that 10-day (light grey; mean = 0.89, SD = 0.04) and 14-day (khaki; mean = 0.88, SD = 0.04) durations outperform 7-day durations (dark grey; mean = 0.85, SD = 0.04), while yielding similar Q scores to one another.
Fig. 3. Personalized recommendations.
(a) Recommendations for each individual were generated across 50 repeated training-testing cycles on different splits of the data. The modal treatment category was tabulated for each patient, provided it was recommended in more than half of the repeats. On average, 65.5% of patients were recommended a bismuth therapy consisting of either Pylera® or clarithromycin, amoxicillin, and bismuth salts (with PPI) (teal), 15.5% were recommended non-bismuth quadruple therapy with clarithromycin, amoxicillin, and metronidazole (with PPI) (light blue), and 19.0% were recommended variable treatments, with no majority recommendation (grey).
(b) When Pylera® and quadruple therapy with bismuth salts are distinguished, 30.4% of patients are recommended Pylera® more than half of the time (pink) and 18.1% are recommended quadruple therapy with bismuth salts (dark blue), suggesting that, for most patients who are routinely recommended a bismuth therapy, there is no strong preference between the two.
(c) Random forest (RF) models are generated for each of the therapy categories discussed above to assess the relevance of patient variables in determining the recommended therapy. Balanced accuracy measures how well the RF model predicts which treatment a patient will be recommended. Variables are ranked by mean decrease in impurity (MDI) to identify the three variables most associated with a particular treatment recommendation.
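
The majority rule described in panel a of Fig. 3 can be sketched in a few lines of Python. This is an illustrative reading of the caption, not the authors' code: the function names and the `"variable"` label for patients without a majority recommendation are assumptions.

```python
from collections import Counter


def majority_recommendation(recommendations):
    """Return the modal treatment if it wins more than half of the repeats, else None."""
    if not recommendations:
        return None
    treatment, count = Counter(recommendations).most_common(1)[0]
    return treatment if count > len(recommendations) / 2 else None


def tabulate(per_patient):
    """Share of patients per majority category; 'variable' marks no majority."""
    labels = [majority_recommendation(recs) or "variable" for recs in per_patient.values()]
    total = len(labels)
    return {label: n / total for label, n in Counter(labels).items()}
```

With 50 repeats, a treatment must be recommended at least 26 times to count as a patient's majority recommendation; an exact 25/25 split falls into the variable group.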
Fig. 4. H. pylori AI clinician methods overview.
(a) The H. pylori AI-Clinician is a recommendation system trained via reinforcement learning (RL), which learns from the results of clinical decision-making. Mathematically, the AI learns the quality of state-action pairs by observing the reward obtained by each. In practice, states are represented by patient data, and actions are represented by clinical treatment decisions. Reward is measured by the success or failure of treatment. Once trained, the AI-Clinician returns an optimal treatment for individual patients.
(b) Hp-EuReg data, comprising 73,313 patient records and 30 pre-treatment patient variable categories, are preprocessed prior to model training. Only first-line treatments and patients who complied with treatment are considered, resulting in 52,801 samples. Patient/treatment pairs with fewer than 500 samples for a given treatment are removed to ensure sufficient training data. Actions (treatments) are encoded to include treatment category, antibiotics included, doses of antibiotics, and PPI dose category (Low, Standard, High, or Other). State (patient) variables are one-hot encoded, resulting in 39,049 samples containing 77 patient variables in total.
(c) Deep Quality Network (DQN) analysis is implemented to train the recommendation system to identify optimal treatments. One-hot encoded patient data are fed into the network, pass through two hidden layers, and reach an output layer that represents the quality of each possible treatment. The treatment with the highest quality (Q score) is the optimal treatment for a given patient. The network is trained via gradient descent at selected optimization intervals (every 100 patients, optimizing on the previous 10,000 patient states, which are retained in the model's memory for re-sampling at all times). During testing, patient state information is fed into the model to obtain optimal treatment recommendations, but no further optimization is performed.
Created in BioRender. Higgins, K. (2025) https://BioRender.com/z5uh3k3.
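
The network shape and training schedule in panel c of Fig. 4 can be sketched with the Python standard library. This is a minimal illustration, not the authors' implementation: the hidden-layer width, the number of treatments, the weight initialisation, and the synthetic patient stream are all assumptions, and the gradient-descent update itself is left as a placeholder. Only the two hidden layers, the 100-patient interval, and the 10,000-state memory come from the caption.

```python
import random
from collections import deque

MEMORY_SIZE = 10_000   # previous patient states retained for re-sampling (from the caption)
OPTIMIZE_EVERY = 100   # optimization interval in patients (from the caption)


def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def q_scores(state, layers):
    """Forward pass: ReLU hidden layers, then a linear output with one Q score per treatment."""
    hidden = state
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    out_weights, out_biases = layers[-1]
    return dense(hidden, out_weights, out_biases)


def recommend(state, layers):
    """Return the index of the treatment with the highest Q score."""
    scores = q_scores(state, layers)
    return max(range(len(scores)), key=scores.__getitem__)


def run_schedule(n_patients, n_features, rng):
    """Stream synthetic patients into a bounded memory and count optimization points."""
    memory = deque(maxlen=MEMORY_SIZE)  # oldest entries are dropped automatically
    optimization_steps = 0
    for i in range(1, n_patients + 1):
        state = [float(rng.random() < 0.1) for _ in range(n_features)]  # synthetic binary state
        action = rng.randrange(3)            # synthetic clinician-chosen treatment
        reward = float(rng.random() < 0.9)   # 1.0 = eradication success (synthetic)
        memory.append((state, action, reward))
        if i % OPTIMIZE_EVERY == 0:
            # Placeholder: a gradient-descent update on samples from memory would go here.
            optimization_steps += 1
    return memory, optimization_steps
```

A usage example under the same assumptions (77 input variables as in panel b, a hypothetical width of 16 units per hidden layer, and 4 hypothetical treatments):

```python
rng = random.Random(0)
layers = [init_layer(77, 16, rng), init_layer(16, 16, rng), init_layer(16, 4, rng)]
patient = [0.0] * 77
patient[3] = 1.0  # one active one-hot variable
print(recommend(patient, layers))
```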
