J Med Internet Res. 2025 Mar 19:27:e67677.
doi: 10.2196/67677.

Improving Dietary Supplement Information Retrieval: Development of a Retrieval-Augmented Generation System With Large Language Models

Yu Hou et al. J Med Internet Res. 2025.

Abstract

Background: Dietary supplements (DSs) are widely used to improve health and nutrition, but challenges related to misinformation, safety, and efficacy persist due to less stringent regulations compared with pharmaceuticals. Accurate and reliable DS information is critical for both consumers and health care providers to make informed decisions.

Objective: This study aimed to enhance DS-related question answering by integrating an advanced retrieval-augmented generation (RAG) system with the integrated Dietary Supplement Knowledgebase 2.0 (iDISK2.0), a dietary supplement knowledge base, to improve accuracy and reliability.

Methods: We developed iDISK2.0 by integrating updated data from authoritative sources, including the Natural Medicines Comprehensive Database, the Memorial Sloan Kettering Cancer Center database, Dietary Supplement Label Database, and Licensed Natural Health Products Database, and applied advanced data cleaning and standardization techniques to reduce noise. The RAG system combined the retrieval power of a biomedical knowledge graph with the generative capabilities of large language models (LLMs) to address limitations of stand-alone LLMs, such as hallucination. The system retrieves contextually relevant subgraphs from iDISK2.0 based on user queries, enabling accurate and evidence-based responses through a user-friendly interface. We evaluated the system using true-or-false and multiple-choice questions derived from the Memorial Sloan Kettering Cancer Center database and compared its performance with stand-alone LLMs.
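The retrieval step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the entity and relation names, and the helper functions `find_entities`, `retrieve_subgraph`, and `build_prompt` are all hypothetical, and entity matching is a naive string match chosen only to keep the example short. The sketch shows the general idea of pulling the triples around entities mentioned in a query and passing them to an LLM as grounding context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


# Toy graph: placeholder entities and relations, NOT actual iDISK2.0 content.
TOY_GRAPH = [
    Triple("supplement_a", "interacts_with", "drug_x"),
    Triple("supplement_a", "is_effective_for", "disease_y"),
    Triple("supplement_b", "interacts_with", "drug_z"),
]


def find_entities(query, graph):
    """Return graph entities whose name appears in the query (naive match)."""
    text = query.lower().replace(" ", "_")
    names = {t.head for t in graph} | {t.tail for t in graph}
    return sorted(name for name in names if name in text)


def retrieve_subgraph(query, graph):
    """Return every triple that touches an entity mentioned in the query."""
    seeds = set(find_entities(query, graph))
    return [t for t in graph if t.head in seeds or t.tail in seeds]


def build_prompt(query, triples):
    """Format the retrieved triples as context for a downstream LLM call."""
    facts = "\n".join(f"- {t.head} {t.relation} {t.tail}" for t in triples)
    return f"Answer using only these facts:\n{facts}\n\nQuestion: {query}"


question = "Does supplement A interact with drug X?"
subgraph = retrieve_subgraph(question, TOY_GRAPH)
print(build_prompt(question, subgraph))
```

The LLM call itself is omitted; in a sketch like this, the prompt string would be sent to whichever model is being evaluated, so that the answer is constrained by the retrieved facts rather than by the model's parametric memory alone.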

Results: iDISK2.0 integrates 174,317 entities across 7 categories, including 8091 dietary supplement ingredients; 163,806 dietary supplement products; 786 diseases; and 625 drugs, along with 6 types of relationships. The RAG system achieved an accuracy of 99% (990/1000) for true-or-false questions on DS effectiveness and 95% (948/1000) for multiple-choice questions on DS-drug interactions, substantially outperforming stand-alone LLMs like GPT-4o (OpenAI), which scored 62% (618/1000) and 52% (517/1000) on these respective tasks. The user interface enabled efficient interaction, supporting free-form text input and providing accurate responses. Integration strategies minimized data noise, ensuring access to up-to-date, DS-related information.
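As a quick arithmetic check, the reported percentages can be recomputed from the raw counts. The sketch below assumes each task used 1000 questions, which is the only denominator consistent with the rounded percentages given; the function name `accuracy_percent` and the dictionary keys are illustrative.

```python
def accuracy_percent(correct, total):
    """Accuracy as a whole-number percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total)


# Correct-answer counts from the Results; 1000 questions per task is assumed.
REPORTED_COUNTS = {
    "RAG, true-or-false": 990,
    "RAG, multiple-choice": 948,
    "GPT-4o, true-or-false": 618,
    "GPT-4o, multiple-choice": 517,
}

for label, correct in REPORTED_COUNTS.items():
    print(f"{label}: {accuracy_percent(correct, 1000)}%")
```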

Conclusions: By integrating a robust knowledge graph with RAG and LLM technologies, iDISK2.0 addresses the critical limitations of stand-alone LLMs in DS information retrieval. This study highlights the importance of combining structured data with advanced artificial intelligence methods to improve accuracy and reduce misinformation in health care applications. Future work includes extending the framework to broader biomedical domains and improving evaluation with real-world, open-ended queries.

Keywords: dietary supplements; knowledge graph; knowledge representation; large language model; retrieval-augmented generation; user interface.

Conflict of interest statement

Conflicts of Interest: HL is an Associate Editor for JAI. None declared by the other authors.

Figures

Figure 1
An illustration of the study pipeline. DSLD: Dietary Supplement Label Database; NHP: Natural Health Products; iDISK2.0: integrated Dietary Supplement Knowledgebase; LLM: large language model; iDISK2.0-RAG: integrated Dietary Supplement Knowledgebase—retrieval-augmented generation.
Figure 2
The overall design of the iDISK2.0-RAG (integrated Dietary Supplement Knowledgebase—retrieval-augmented generation). LLM: large language model.
Figure 3
Model performance in question-and-answer tasks. T/F: true-or-false; MCQ: multiple-choice question; iDISK2.0-RAG: integrated Dietary Supplement Knowledgebase—retrieval-augmented generation.
Figure 4
An example of question-and-answer on the user interface.
