Sci Rep. 2024 Dec 28;14(1):30794.
doi: 10.1038/s41598-024-81052-3.

Evaluation of the integration of retrieval-augmented generation in large language model for breast cancer nursing care responses


Ruiyu Xu et al. Sci Rep.

Abstract

Breast cancer is one of the most common malignant tumors in women worldwide. Although large language models (LLMs) can provide breast cancer nursing care consultation, their inherent hallucinations can lead to inaccurate responses. Retrieval-augmented generation (RAG) can improve LLM performance, offering a new approach for clinical applications. In the present study, we evaluated the performance of an LLM in breast cancer nursing care when combined with RAG. In the control group (GPT-4), questions were answered directly by the GPT-4 model, whereas the experimental group (RAG-GPT) used GPT-4 combined with RAG over a curated breast cancer nursing knowledge base comprising textbooks, guidelines, and traditional Chinese therapy materials. Fifteen questions were randomly selected from 200 real-world clinical care questions and answered by both groups. The primary endpoint was overall satisfaction; the secondary endpoints were accuracy and empathy. The RAG-GPT group showed significantly higher overall satisfaction than the GPT-4 group (8.4 ± 0.84 vs. 5.4 ± 1.27, p < 0.01) and improved response accuracy (8.6 ± 0.69 vs. 5.6 ± 0.96, p < 0.01). However, there was no inter-group difference in empathy (8.4 ± 0.85 vs. 7.8 ± 1.22, p > 0.05). Overall, this study revealed that RAG can significantly improve LLM performance, likely by increasing answer accuracy without diminishing empathy. These findings provide a theoretical basis for applying RAG to LLMs in clinical nursing practice and education.
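The retrieve-then-generate pipeline described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the knowledge-base snippets, the bag-of-words retriever (`embed`, `cosine`, `retrieve`), and the prompt template are all assumptions, since the abstract does not specify the embedding model or retrieval method used.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a production system would use a learned
    # embedding model rather than raw term counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank knowledge-base passages by similarity to the query, keep top-k.
    qv = embed(query)
    return sorted(corpus, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]

def build_prompt(query, corpus):
    # Grounding the model's answer in retrieved passages is the core of RAG.
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Illustrative knowledge-base snippets (not from the paper's actual corpus).
kb = [
    "Lymphedema care: elevate the affected arm and avoid blood draws on that side.",
    "Drain care after mastectomy: empty and record output twice daily.",
    "Traditional Chinese therapy such as acupressure may ease treatment fatigue.",
]
prompt = build_prompt("How should I care for my arm to prevent lymphedema?", kb)
```

The resulting prompt would then be sent to GPT-4, so the model answers from the supplied context rather than relying solely on parametric memory, which is the mechanism the study credits for reducing hallucinations.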

Keywords: Breast cancer nursing care; ChatGPT; GPT-4; Large language models; Nurse; Retrieval-augmented generation.


Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Approval for human experiments: As this study did not involve human or animal research and the ChatGPT API is freely accessible online, no ethical committee approval was required.

Figures

Fig. 1. Flowchart showing the RAG-GPT and GPT-4 question-answering processes.

Fig. 2. Frequency distribution of the top 15 keywords in the local breast cancer knowledge base, with “breast cancer treatment” identified as the most frequently mentioned term.

Fig. 3. Evaluation results. (A) The RAG-GPT group showed significantly higher satisfaction than the GPT-4 group. (B) Satisfaction scores for each question. (C) The RAG-GPT group demonstrated significantly higher accuracy than the GPT-4 group. (D) Accuracy scores for each question. (E) Empathy did not differ significantly between the two groups. (F) Empathy scores for the 15 questions.
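The reported group comparisons can be roughly sanity-checked from the summary statistics alone. The sketch below assumes an independent-samples Welch's t-test with n = 15 questions per group; the abstract does not state which test the authors actually used, so this is illustrative only.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    # Welch's t statistic for two independent samples with unequal variances.
    return (mean1 - mean2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Reported means and SDs, assuming n = 15 questions per group.
t_satisfaction = welch_t(8.4, 0.84, 15, 5.4, 1.27, 15)  # large: consistent with p < 0.01
t_accuracy     = welch_t(8.6, 0.69, 15, 5.6, 0.96, 15)  # large: consistent with p < 0.01
t_empathy      = welch_t(8.4, 0.85, 15, 7.8, 1.22, 15)  # small: consistent with p > 0.05
```

The satisfaction and accuracy statistics come out far above conventional critical values, while the empathy statistic does not, matching the pattern reported in panels A, C, and E.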
