Review
Therap Adv Gastroenterol. 2025 Apr 1;18:17562848251328577. doi: 10.1177/17562848251328577. eCollection 2025.

Utilizing large language models for gastroenterology research: a conceptual framework


Parul Berry et al. Therap Adv Gastroenterol. 2025.

Abstract

Large language models (LLMs) are transforming healthcare by assisting clinicians with decision-making, research, and patient management. In gastroenterology, LLMs have shown potential in clinical decision support, data extraction, and patient education. However, challenges such as bias, hallucinations, integration with clinical workflows, and regulatory compliance must be addressed for safe and effective implementation. This manuscript presents a structured framework for integrating LLMs into gastroenterology, using Hepatitis C treatment as a real-world application. The framework outlines key steps to ensure accuracy, safety, and clinical relevance while mitigating the risks associated with artificial intelligence (AI)-driven healthcare tools: defining clinical goals, assembling a multidisciplinary team, collecting and preparing data, selecting and fine-tuning a model, calibration, hallucination mitigation, user interface development, integration with electronic health records, real-world validation, and continuous improvement. Retrieval-augmented generation and fine-tuning are evaluated as approaches for optimizing model adaptability. Bias detection, reinforcement learning from human feedback, and structured prompt engineering are incorporated to enhance reliability. Ethical and regulatory considerations, including the Health Insurance Portability and Accountability Act, the General Data Protection Regulation, and AI-specific reporting guidelines (DECIDE-AI, SPIRIT-AI, CONSORT-AI), are addressed to ensure responsible AI deployment. LLMs have the potential to enhance decision-making, research efficiency, and patient care in gastroenterology, but responsible deployment requires bias mitigation, transparency, and ongoing validation. Future research should focus on multi-institutional validation and AI-assisted clinical trials to establish LLMs as reliable tools in gastroenterology.
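The retrieval-augmented generation and structured prompting steps mentioned in the abstract can be pictured with a short sketch. The Python example below is illustrative only and is not the authors' implementation: the toy guideline passages, the call_llm() placeholder, and the prompt wording are assumptions, and a real deployment would index actual HCV guidance and call a validated, compliance-reviewed model endpoint.

```python
# Minimal retrieval-augmented generation (RAG) sketch for guideline-grounded
# Hepatitis C questions. Illustrative only: the guideline snippets, the
# call_llm() placeholder, and the prompt wording are assumptions, not the
# framework's actual implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for indexed guideline passages (assumed content).
GUIDELINE_PASSAGES = [
    "Pan-genotypic direct-acting antiviral regimens are recommended for most "
    "treatment-naive adults with chronic HCV infection.",
    "Assess hepatic fibrosis stage before treatment; cirrhosis may change "
    "regimen choice and duration.",
    "Check for drug-drug interactions between antivirals and the patient's "
    "current medications before prescribing.",
]

_vectorizer = TfidfVectorizer().fit(GUIDELINE_PASSAGES)
_passage_matrix = _vectorizer.transform(GUIDELINE_PASSAGES)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k guideline passages most similar to the question."""
    scores = cosine_similarity(_vectorizer.transform([question]), _passage_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [GUIDELINE_PASSAGES[i] for i in top]

def call_llm(prompt: str) -> str:
    """Placeholder for whichever LLM endpoint a deployment would actually use."""
    return "[model output would appear here]"

def answer_with_context(question: str) -> str:
    """Ground the model's answer in retrieved guideline text via a structured prompt."""
    context = "\n".join(f"- {p}" for p in retrieve(question))
    prompt = (
        "You are assisting a gastroenterologist. Answer using ONLY the guideline "
        "excerpts below; reply 'insufficient information' if they do not cover the "
        f"question.\n\nGuideline excerpts:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

if __name__ == "__main__":
    print(answer_with_context("Which regimen for a treatment-naive patient with cirrhosis?"))
```

Constraining the model to retrieved guideline text, rather than relying on its parametric knowledge alone, is one of the hallucination-mitigation tactics the framework evaluates alongside fine-tuning.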

Keywords: artificial intelligence; framework; generative artificial intelligence; healthcare.

Plain language summary

How large language models could transform gastroenterology: a framework for future research and care

Artificial intelligence (AI) is transforming healthcare by helping doctors make better decisions, analyze research faster, and improve patient care. Large language models (LLMs) are a type of AI that processes and generates human-like text, making them useful in gastroenterology. This paper presents a structured framework for safely using LLMs in clinical practice, using Hepatitis C treatment as an example.

The framework begins by setting clear goals, such as improving Hepatitis C treatment recommendations or making patient education easier to understand. A team of doctors, AI specialists, and data experts is assembled to ensure the model is medically accurate and practical. Next, relevant medical data from electronic health records (EHRs), clinical guidelines, and research studies is gathered and prepared so that the model's recommendations are useful and fair. The right AI model is then chosen and adapted to specialize in gastroenterology, and its performance is checked and adjusted before use to make sure its suggestions are reliable and correct. A user-friendly interface lets doctors access AI-generated recommendations directly in EHRs and decision-support tools, making the system easy to integrate into daily practice.

Before full use, the AI is tested in real-world settings, where gastroenterologists review its recommendations for safety and accuracy. Once in use, ongoing updates based on doctor feedback help improve its performance. Ethical and legal safeguards, such as protecting patient privacy and ensuring fairness, guide its responsible use. Findings are then shared with the medical community, allowing further testing and broader adoption. By following this framework, LLMs can help doctors make better decisions, personalize treatments, and improve efficiency, ultimately leading to better patient outcomes in gastroenterology.


Conflict of interest statement

P.B.: None. R.R.D.: None. S.K.: Research grants from Rebiotix, Inc. (a Ferring company), Seres, Finch, Vedanta, and Pfizer; consulting fees from Takeda, Rise, and ProbioTech Inc., outside the submitted work. An author of this paper is an Associate Editor of Therapeutic Advances in Gastroenterology; therefore, the peer review process was managed by other members of the Editorial Board, and the submitting Editor was not involved in the decision-making process.

Figures

Figure 1.
This figure illustrates a step-by-step framework for developing an LLM-powered clinical decision support system for Hepatitis C treatment. It outlines key stages, including ethical approvals, data collection, model selection, calibration, user interface development, integration with EHRs, and continuous improvement. EHR, electronic health record; LLM, large language model.
Figure 2.
This figure presents a wireframe of the LLM-driven clinical decision support interface for Hepatitis C management. It demonstrates structured data entry fields for patient ID, genotype, fibrosis score, co-morbidities, and HCV RNA levels. The interface pre-populates critical fields, integrates biopsy/imaging uploads, and generates personalized treatment recommendations based on clinical guidelines. A feedback loop allows clinicians to refine model outputs, enhancing reliability and usability. LLM, large language model.
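To make the wireframe more concrete, the sketch below shows one way the structured entry fields from Figure 2 (patient ID, genotype, fibrosis score, co-morbidities, HCV RNA) could be captured and serialized into a guideline-constrained prompt for the decision-support step. The HepatitisCCase class, its field names, and build_prompt() are hypothetical illustrations, not the interface described in the manuscript.

```python
# Illustrative sketch of the structured inputs shown in the Figure 2 wireframe
# and how they might be serialized into a guideline-constrained prompt.
# The class, field names, and prompt text are assumptions for illustration only.
from dataclasses import dataclass, field

@dataclass
class HepatitisCCase:
    patient_id: str
    genotype: str                 # e.g. "1a"
    fibrosis_score: str           # e.g. "F3" (METAVIR)
    hcv_rna_iu_ml: int            # viral load in IU/mL
    comorbidities: list[str] = field(default_factory=list)

def build_prompt(case: HepatitisCCase) -> str:
    """Turn the structured entry fields into a prompt that asks for a
    guideline-based recommendation plus a rationale clinicians can review."""
    comorbid = ", ".join(case.comorbidities) or "none reported"
    return (
        "Using current HCV treatment guidelines, propose a regimen and flag any "
        "contraindications for clinician review.\n"
        f"Patient ID: {case.patient_id}\n"
        f"Genotype: {case.genotype}\n"
        f"Fibrosis score: {case.fibrosis_score}\n"
        f"HCV RNA: {case.hcv_rna_iu_ml} IU/mL\n"
        f"Co-morbidities: {comorbid}\n"
        "Return: recommendation, duration, monitoring plan, and the guideline section relied on."
    )

if __name__ == "__main__":
    example = HepatitisCCase("HCV-001", "1a", "F3", 2_400_000, ["type 2 diabetes"])
    print(build_prompt(example))
```

Keeping the inputs structured in this way supports the feedback loop the caption describes: clinician corrections can be logged against specific fields rather than free text, which simplifies auditing and continuous improvement.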


References

    1. Sheikh H, Prins C, Schrijvers E. Artificial intelligence: definition and background. In: Sheikh H, Prins C, Schrijvers E. (eds) Mission AI: the new system technology. Cham: Springer International Publishing, 2023, pp. 15–41.
    1. Mitchell TM. Machine learning. New York: McGraw-Hill, 1997.
    1. Shai Shalev-Shwartz S, Ben-David S. Understanding machine learning. Cambridge: Cambridge University Press, 2014.
    1. Sarker IH. Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions. SN Comput Sci 2021; 2: 420. - PMC - PubMed
    1. Kim S, Thiessen PA, Bolton EE, et al.. PubChem substance and compound databases. Nucleic Acids Res 2016; 44: D1202–D1213. - PMC - PubMed
