Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2024 Nov 1;80(5):1158-1168.
doi: 10.1097/HEP.0000000000000834. Epub 2024 Mar 7.

Development of a liver disease-specific large language model chat interface using retrieval-augmented generation

Affiliations

Development of a liver disease-specific large language model chat interface using retrieval-augmented generation

Jin Ge et al. Hepatology. .

Abstract

Background and aims: Large language models (LLMs) have significant capabilities in clinical information processing tasks. Commercially available LLMs, however, are not optimized for clinical uses and are prone to generating hallucinatory information. Retrieval-augmented generation (RAG) is an enterprise architecture that allows the embedding of customized data into LLMs. This approach "specializes" the LLMs and is thought to reduce hallucinations.

Approach and results: We developed "LiVersa," a liver disease-specific LLM, by using our institution's protected health information-complaint text embedding and LLM platform, "Versa." We conducted RAG on 30 publicly available American Association for the Study of Liver Diseases guidance documents to be incorporated into LiVersa. We evaluated LiVersa's performance by conducting 2 rounds of testing. First, we compared LiVersa's outputs versus those of trainees from a previously published knowledge assessment. LiVersa answered all 10 questions correctly. Second, we asked 15 hepatologists to evaluate the outputs of 10 hepatology topic questions generated by LiVersa, OpenAI's ChatGPT 4, and Meta's Large Language Model Meta AI 2. LiVersa's outputs were more accurate but were rated less comprehensive and safe compared to those of ChatGPT 4.

Results: We evaluated LiVersa's performance by conducting 2 rounds of testing. First, we compared LiVersa's outputs versus those of trainees from a previously published knowledge assessment. LiVersa answered all 10 questions correctly. Second, we asked 15 hepatologists to evaluate the outputs of 10 hepatology topic questions generated by LiVersa, OpenAI's ChatGPT 4, and Meta's Large Language Model Meta AI 2. LiVersa's outputs were more accurate but were rated less comprehensive and safe compared to those of ChatGPT 4.

Conclusions: In this demonstration, we built disease-specific and protected health information-compliant LLMs using RAG. While LiVersa demonstrated higher accuracy in answering questions related to hepatology, there were some deficiencies due to limitations set by the number of documents used for RAG. LiVersa will likely require further refinement before potential live deployment. The LiVersa prototype, however, is a proof of concept for utilizing RAG to customize LLMs for clinical use cases.

PubMed Disclaimer

Update of

References

    1. Ge J, Li M, Delk MB, Lai JC. A comparison of a large language model vs manual chart review for the extraction of data elements from the electronic health record. Gastroenterology. 2023; - PMC - PubMed
    1. Rahman M, Terano HJR, Rahman N, Salamzadeh A, Rahaman S. Chatgpt and academic research: A review and recommendations based on practical examples. J. Educ., Mngt., and Dev. Studies 2023;3:1–12.
    1. Nayak A, Alkaitis MS, Nayak K, Nikolov M, Weinfurt KP, Schulman K. Comparison of history of present illness summaries generated by a chatbot and senior internal medicine residents. JAMA Intern. Med 2023;183:1026–1027. - PMC - PubMed
    1. Han C, Kim DW, Kim S, You SC, Park JY, Bae S, et al. Evaluation Of GPT-4 for 10-Year Cardiovascular Risk Prediction: Insights from the UK Biobank and KoGES Data. 2023; - PMC - PubMed
    1. ChatGPT: Optimizing Language Models for Dialogue [Internet]. [cited 2022 Dec 17];Available from: https://openai.com/blog/chatgpt/

LinkOut - more resources