Improving Dietary Supplement Information Retrieval: Development of a Retrieval-Augmented Generation System With Large Language Models
- PMID: 40106799
- PMCID: PMC11966073
- DOI: 10.2196/67677
Abstract
Background: Dietary supplements (DSs) are widely used to improve health and nutrition, but challenges related to misinformation, safety, and efficacy persist due to less stringent regulations compared with pharmaceuticals. Accurate and reliable DS information is critical for both consumers and health care providers to make informed decisions.
Objective: This study aimed to enhance DS-related question answering by integrating an advanced retrieval-augmented generation (RAG) system with the integrated Dietary Supplement Knowledgebase 2.0 (iDISK2.0), a dietary supplement knowledge base, to improve accuracy and reliability.
Methods: We developed iDISK2.0 by integrating updated data from authoritative sources, including the Natural Medicines Comprehensive Database, the Memorial Sloan Kettering Cancer Center database, Dietary Supplement Label Database, and Licensed Natural Health Products Database, and applied advanced data cleaning and standardization techniques to reduce noise. The RAG system combined the retrieval power of a biomedical knowledge graph with the generative capabilities of large language models (LLMs) to address limitations of stand-alone LLMs, such as hallucination. The system retrieves contextually relevant subgraphs from iDISK2.0 based on user queries, enabling accurate and evidence-based responses through a user-friendly interface. We evaluated the system using true-or-false and multiple-choice questions derived from the Memorial Sloan Kettering Cancer Center database and compared its performance with stand-alone LLMs.
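The retrieval step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the toy triples, the relation names (`interacts_with`, `is_effective_for`), the substring-based entity matching, and the function names are all hypothetical, chosen only to show how a query-relevant subgraph might be pulled from a knowledge graph and placed into an LLM prompt.

```python
from collections import defaultdict

# Hypothetical toy triples for illustration only; not taken from iDISK2.0.
TRIPLES = [
    ("Ginkgo", "interacts_with", "Warfarin"),
    ("Ginkgo", "is_effective_for", "Dementia"),
    ("Melatonin", "is_effective_for", "Insomnia"),
]


def build_index(triples):
    """Map each lowercased entity name to the triples it appears in."""
    index = defaultdict(list)
    for head, rel, tail in triples:
        index[head.lower()].append((head, rel, tail))
        index[tail.lower()].append((head, rel, tail))
    return index


def retrieve_subgraph(question, index):
    """Return triples touching any entity mentioned in the question.

    Entity linking here is a naive substring match (an assumption);
    a real system would likely use a more robust matcher.
    """
    q = question.lower()
    subgraph = []
    for entity, triples in index.items():
        if entity in q:
            for triple in triples:
                if triple not in subgraph:
                    subgraph.append(triple)
    return subgraph


def build_prompt(question, subgraph):
    """Format the retrieved triples as context for a generative model."""
    facts = "\n".join(
        f"- {h} {r.replace('_', ' ')} {t}" for h, r, t in subgraph
    )
    return (
        "Answer using only these facts:\n"
        f"{facts}\n\nQuestion: {question}"
    )


index = build_index(TRIPLES)
question = "Does ginkgo interact with warfarin?"
subgraph = retrieve_subgraph(question, index)
prompt = build_prompt(question, subgraph)
```

The resulting `prompt` string would then be passed to an LLM of choice; grounding the model in retrieved triples is what is meant to limit hallucination compared with asking the model directly.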
Results: iDISK2.0 integrates 174,317 entities across 7 categories, including 8091 dietary supplement ingredients; 163,806 dietary supplement products; 786 diseases; and 625 drugs, along with 6 types of relationships. The RAG system achieved an accuracy of 99% (990/1000) for true-or-false questions on DS effectiveness and 95% (948/1000) for multiple-choice questions on DS-drug interactions, substantially outperforming stand-alone LLMs like GPT-4o (OpenAI), which scored 62% (618/1000) and 52% (517/1000) on these respective tasks. The user interface enabled efficient interaction, supporting free-form text input and providing accurate responses. Integration strategies minimized data noise, ensuring access to up-to-date, DS-related information.
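As a quick arithmetic check, the reported percentages can be recomputed from the correct-answer counts. This sketch assumes each of the four evaluations used 1000 questions; the dictionary keys and variable names are illustrative labels, not identifiers from the study.

```python
# Correct-answer counts as reported in the Results; the total of 1000
# questions per evaluation is an assumption stated in the lead-in.
TOTAL_QUESTIONS = 1000
correct = {
    "RAG, true-or-false": 990,
    "RAG, multiple-choice": 948,
    "GPT-4o, true-or-false": 618,
    "GPT-4o, multiple-choice": 517,
}

# Accuracy as a whole-number percentage, matching the abstract's rounding.
accuracy_pct = {
    name: round(100 * n / TOTAL_QUESTIONS) for name, n in correct.items()
}

for name, pct in accuracy_pct.items():
    print(f"{name}: {pct}%")
```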
Conclusions: By integrating a robust knowledge graph with RAG and LLM technologies, iDISK2.0 addresses the critical limitations of stand-alone LLMs in DS information retrieval. This study highlights the importance of combining structured data with advanced artificial intelligence methods to improve accuracy and reduce misinformation in health care applications. Future work includes extending the framework to broader biomedical domains and improving evaluation with real-world, open-ended queries.
Keywords: dietary supplements; knowledge graph; knowledge representation; large language model; retrieval-augmented generation; user interface.
©Yu Hou, Jeffrey R Bishop, Hongfang Liu, Rui Zhang. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 19.03.2025.
Conflict of interest statement
Conflicts of Interest: HL is an Associate Editor for JAI. None declared by the other authors.