Review
Therap Adv Gastroenterol. 2025 Apr 1;18:17562848251328577. doi: 10.1177/17562848251328577. eCollection 2025.

Utilizing large language models for gastroenterology research: a conceptual framework


Parul Berry et al. Therap Adv Gastroenterol. 2025.

Abstract

Large language models (LLMs) are transforming healthcare by assisting clinicians with decision-making, research, and patient management. In gastroenterology, LLMs have shown potential in clinical decision support, data extraction, and patient education. However, challenges such as bias, hallucinations, integration with clinical workflows, and regulatory compliance must be addressed for safe and effective implementation. This manuscript presents a structured framework for integrating LLMs into gastroenterology, using Hepatitis C treatment as a real-world application. The framework outlines key steps to ensure accuracy, safety, and clinical relevance while mitigating the risks associated with artificial intelligence (AI)-driven healthcare tools: defining clinical goals, assembling a multidisciplinary team, collecting and preparing data, selecting and fine-tuning a model, calibration, hallucination mitigation, user interface development, integration with electronic health records, real-world validation, and continuous improvement. Retrieval-augmented generation and fine-tuning are evaluated as approaches for optimizing model adaptability. Bias detection, reinforcement learning from human feedback, and structured prompt engineering are incorporated to enhance reliability. Ethical and regulatory considerations, including the Health Insurance Portability and Accountability Act, the General Data Protection Regulation, and AI-specific reporting guidelines (DECIDE-AI, SPIRIT-AI, CONSORT-AI), are addressed to ensure responsible AI deployment. LLMs have the potential to enhance decision-making, research efficiency, and patient care in gastroenterology, but responsible deployment requires bias mitigation, transparency, and ongoing validation. Future research should focus on multi-institutional validation and AI-assisted clinical trials to establish LLMs as reliable tools in gastroenterology.
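The retrieval-augmented generation and structured prompting steps mentioned in the abstract can be pictured with a short sketch. The Python example below is illustrative only and is not the authors' implementation: the toy guideline passages, the call_llm() placeholder, and the prompt wording are assumptions, and a real deployment would index actual HCV guidance and call a validated, compliance-reviewed model endpoint.

```python
# Minimal retrieval-augmented generation (RAG) sketch for guideline-grounded
# Hepatitis C questions. Illustrative only: the guideline snippets, the
# call_llm() placeholder, and the prompt wording are assumptions, not the
# framework's actual implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for indexed guideline passages (assumed content).
GUIDELINE_PASSAGES = [
    "Pan-genotypic direct-acting antiviral regimens are recommended for most "
    "treatment-naive adults with chronic HCV infection.",
    "Assess hepatic fibrosis stage before treatment; cirrhosis may change "
    "regimen choice and duration.",
    "Check for drug-drug interactions between antivirals and the patient's "
    "current medications before prescribing.",
]

_vectorizer = TfidfVectorizer().fit(GUIDELINE_PASSAGES)
_passage_matrix = _vectorizer.transform(GUIDELINE_PASSAGES)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k guideline passages most similar to the question."""
    scores = cosine_similarity(_vectorizer.transform([question]), _passage_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [GUIDELINE_PASSAGES[i] for i in top]

def call_llm(prompt: str) -> str:
    """Placeholder for whichever LLM endpoint a deployment would actually use."""
    return "[model output would appear here]"

def answer_with_context(question: str) -> str:
    """Ground the model's answer in retrieved guideline text via a structured prompt."""
    context = "\n".join(f"- {p}" for p in retrieve(question))
    prompt = (
        "You are assisting a gastroenterologist. Answer using ONLY the guideline "
        "excerpts below; reply 'insufficient information' if they do not cover the "
        f"question.\n\nGuideline excerpts:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

if __name__ == "__main__":
    print(answer_with_context("Which regimen for a treatment-naive patient with cirrhosis?"))
```

Constraining the model to retrieved guideline text, rather than relying on its parametric knowledge alone, is one of the hallucination-mitigation tactics the framework evaluates alongside fine-tuning.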

Keywords: artificial intelligence; framework; generative artificial intelligence; healthcare.

Plain language summary

How large language models could transform gastroenterology: a framework for future research and care

Artificial intelligence (AI) is transforming healthcare by helping doctors make better decisions, analyze research faster, and improve patient care. Large language models (LLMs) are a type of AI that processes and generates human-like text, making them useful in gastroenterology. This paper presents a structured framework for safely using LLMs in clinical practice, using Hepatitis C treatment as an example.

The framework begins by setting clear goals, such as improving Hepatitis C treatment recommendations or making patient education easier to understand. A team of doctors, AI specialists, and data experts is assembled to ensure the model is medically accurate and practical. Next, relevant medical data from electronic health records (EHRs), clinical guidelines, and research studies is gathered and prepared so that the model's recommendations are useful and fair. The right AI model is then chosen and adapted to specialize in gastroenterology, and its performance is checked and adjusted before use to make sure its suggestions are reliable and correct. A user-friendly interface lets doctors access AI-generated recommendations directly in EHRs and decision-support tools, making the system easy to integrate into daily practice.

Before full use, the AI is tested in real-world settings, where gastroenterologists review its recommendations for safety and accuracy. Once in use, ongoing updates based on doctor feedback help improve its performance. Ethical and legal safeguards, such as protecting patient privacy and ensuring fairness, guide its responsible use. Findings are then shared with the medical community, allowing further testing and broader adoption. By following this framework, LLMs can help doctors make better decisions, personalize treatments, and improve efficiency, ultimately leading to better patient outcomes in gastroenterology.


Conflict of interest statement

P.B.: None. R.R.D.: None. S.K.: Research grants from Rebiotix, Inc. (a Ferring company), Seres, Finch, Vedanta, and Pfizer; consulting fees from Takeda, Rise, and ProbioTech Inc., outside the submitted work. An author of this paper is an Associate Editor of Therapeutic Advances in Gastroenterology; therefore, the peer review process was managed by other members of the Editorial Board, and the submitting Editor was not involved in the decision-making process.

Figures

Figure 1.
This figure illustrates a step-by-step framework for developing an LLM-powered clinical decision support system for Hepatitis C treatment. It outlines key stages, including ethical approvals, data collection, model selection, calibration, user interface development, integration with EHRs, and continuous improvement. EHR, electronic health record; LLM, large language model.
Figure 2.
This figure presents a wireframe of the LLM-driven clinical decision support interface for Hepatitis C management. It demonstrates structured data entry fields for patient ID, genotype, fibrosis score, co-morbidities, and HCV RNA levels. The interface pre-populates critical fields, integrates biopsy/imaging uploads, and generates personalized treatment recommendations based on clinical guidelines. A feedback loop allows clinicians to refine model outputs, enhancing reliability and usability. LLM, large language model.
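To make the wireframe more concrete, the sketch below shows one way the structured entry fields from Figure 2 (patient ID, genotype, fibrosis score, co-morbidities, HCV RNA) could be captured and serialized into a guideline-constrained prompt for the decision-support step. The HepatitisCCase class, its field names, and build_prompt() are hypothetical illustrations, not the interface described in the manuscript.

```python
# Illustrative sketch of the structured inputs shown in the Figure 2 wireframe
# and how they might be serialized into a guideline-constrained prompt.
# The class, field names, and prompt text are assumptions for illustration only.
from dataclasses import dataclass, field

@dataclass
class HepatitisCCase:
    patient_id: str
    genotype: str                 # e.g. "1a"
    fibrosis_score: str           # e.g. "F3" (METAVIR)
    hcv_rna_iu_ml: int            # viral load in IU/mL
    comorbidities: list[str] = field(default_factory=list)

def build_prompt(case: HepatitisCCase) -> str:
    """Turn the structured entry fields into a prompt that asks for a
    guideline-based recommendation plus a rationale clinicians can review."""
    comorbid = ", ".join(case.comorbidities) or "none reported"
    return (
        "Using current HCV treatment guidelines, propose a regimen and flag any "
        "contraindications for clinician review.\n"
        f"Patient ID: {case.patient_id}\n"
        f"Genotype: {case.genotype}\n"
        f"Fibrosis score: {case.fibrosis_score}\n"
        f"HCV RNA: {case.hcv_rna_iu_ml} IU/mL\n"
        f"Co-morbidities: {comorbid}\n"
        "Return: recommendation, duration, monitoring plan, and the guideline section relied on."
    )

if __name__ == "__main__":
    example = HepatitisCCase("HCV-001", "1a", "F3", 2_400_000, ["type 2 diabetes"])
    print(build_prompt(example))
```

Keeping the inputs structured in this way supports the feedback loop the caption describes: clinician corrections can be logged against specific fields rather than free text, which simplifies auditing and continuous improvement.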


References

    1. Sheikh H, Prins C, Schrijvers E. Artificial intelligence: definition and background. In: Sheikh H, Prins C, Schrijvers E. (eds) Mission AI: the new system technology. Cham: Springer International Publishing, 2023, pp. 15–41.
    1. Mitchell TM. Machine learning. New York: McGraw-Hill, 1997.
    1. Shai Shalev-Shwartz S, Ben-David S. Understanding machine learning. Cambridge: Cambridge University Press, 2014.
    1. Sarker IH. Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions. SN Comput Sci 2021; 2: 420. - PMC - PubMed
    1. Kim S, Thiessen PA, Bolton EE, et al.. PubChem substance and compound databases. Nucleic Acids Res 2016; 44: D1202–D1213. - PMC - PubMed
