Front Med (Lausanne). 2025 May 14;12:1599241.
doi: 10.3389/fmed.2025.1599241. eCollection 2025.

Enhancing treatment decision-making for low back pain: a novel framework integrating large language models with retrieval-augmented generation technology

Rong Chen et al. Front Med (Lausanne). 2025.

Abstract

Introduction: Chronic low back pain (CLBP) is a global health problem that seriously affects patients’ quality of life. The etiology of CLBP is complex, and its symptoms are non-specific and highly heterogeneous, which makes diagnosis challenging. In addition, uncertain treatment responses and the potential influence of psychological and social factors make personalized decision-making in clinical practice even more difficult.

Methods: This study proposed a novel clinical decision support framework that combines large language models (LLMs) with retrieval-augmented generation (RAG) technology. Least-to-most (LtM) prompting was also introduced to simulate the decision-making process of senior experts and thereby improve personalized treatment for CLBP. In addition, a dedicated CLBP-related dataset was constructed to verify the effectiveness of the framework, and the proposed model, CLBP-GPT, was compared with GPT-4.0, ERNIE Bot, and DeepSeek on five key indicators: accuracy, relevance, clarity, benefit, and completeness.
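Least-to-most prompting is generally run in two stages: the model first decomposes a problem into simpler sub-questions, then answers them in order, with each earlier answer added to the context of the next. The abstract does not give the authors' prompts, so the following is only a minimal Python sketch under stated assumptions: `ask_llm` is a hypothetical callable wrapping whatever model is used, the prompt wording is illustrative, and the `context` string stands in for knowledge retrieved by the RAG step.

```python
from typing import Callable, List, Tuple

# Hypothetical interface (assumption): any function that takes a prompt string
# and returns the model's text reply, e.g. a wrapper around a chat API.
LLM = Callable[[str], str]


def decompose(question: str, ask_llm: LLM) -> List[str]:
    """Stage 1: ask the model to split the case into simpler sub-questions,
    ordered from least to most complex."""
    prompt = (
        "Break the following clinical question into simpler sub-questions, "
        "one per line, ordered from least to most complex.\n\n"
        f"Question: {question}"
    )
    reply = ask_llm(prompt)
    # Keep non-empty lines; strip bullet markers ("-", "*") and whitespace.
    return [line.strip(" -*\t") for line in reply.splitlines() if line.strip(" -*\t")]


def solve_sequentially(question: str, subquestions: List[str], context: str,
                       ask_llm: LLM) -> Tuple[List[Tuple[str, str]], str]:
    """Stage 2: answer each sub-question in order, feeding earlier Q/A pairs
    forward, then answer the original question using all of them."""
    solved: List[Tuple[str, str]] = []
    for sub in subquestions:
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in solved)
        prompt = (
            f"Reference knowledge:\n{context}\n\n"
            f"Previously answered:\n{history}\n\n"
            f"Q: {sub}\nA:"
        )
        solved.append((sub, ask_llm(prompt).strip()))
    history = "\n".join(f"Q: {q}\nA: {a}" for q, a in solved)
    final = ask_llm(
        f"Reference knowledge:\n{context}\n\n{history}\n\n"
        "Using the answers above, give a personalized recommendation.\n"
        f"Q: {question}\nA:"
    ).strip()
    return solved, final
```

Keeping the model behind a plain callable makes the two stages testable with a stub and independent of any particular vendor API.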

Results: CLBP-GPT scored significantly higher than the comparison models in all five evaluation dimensions. Its total score was 4.40 (SD = 0.20), substantially higher than GPT-4.0 (4.03, SD = 0.48), ERNIE Bot (3.54, SD = 0.53), and DeepSeek (3.81, SD = 0.47). For accuracy, CLBP-GPT averaged 4.38 (SD = 0.19), whereas all other models scored below 4, indicating that CLBP-GPT provided more accurate clinical decision-making recommendations. CLBP-GPT also scored 4.42 (SD = 0.19) for completeness, indicating that its output was more comprehensive and covered more key CLBP-related information.
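The results are reported as mean (SD). The abstract does not state how the total score is derived, so this minimal Python sketch assumes that each response is rated on a 1 to 5 scale for each of the five dimensions and that a response's total is its average across them. The helper `summarize` is hypothetical, and the ratings in the example are invented for illustration; they are not the study's data.

```python
from statistics import mean, stdev
from typing import Dict, List

DIMENSIONS = ["accuracy", "relevance", "clarity", "benefit", "completeness"]


def summarize(ratings: List[Dict[str, float]]) -> Dict[str, str]:
    """Return 'mean (SD = sd)' strings per dimension and for the total.

    Assumption: a response's total is the average of its five dimension
    scores; SD is the sample standard deviation across responses."""
    summary: Dict[str, str] = {}
    for dim in DIMENSIONS:
        scores = [r[dim] for r in ratings]
        summary[dim] = f"{mean(scores):.2f} (SD = {stdev(scores):.2f})"
    totals = [mean(r[d] for d in DIMENSIONS) for r in ratings]
    summary["total"] = f"{mean(totals):.2f} (SD = {stdev(totals):.2f})"
    return summary


# Invented ratings for three responses from one model (illustration only).
example = [
    {"accuracy": 4, "relevance": 5, "clarity": 4, "benefit": 4, "completeness": 5},
    {"accuracy": 5, "relevance": 4, "clarity": 5, "benefit": 4, "completeness": 4},
    {"accuracy": 4, "relevance": 4, "clarity": 4, "benefit": 5, "completeness": 5},
]
result = summarize(example)
print(result["accuracy"])  # 4.33 (SD = 0.58)
print(result["total"])     # 4.40 (SD = 0.00)
```

A total averaged per response can have a much smaller SD than the individual dimensions, because high and low dimension scores within a response partly cancel.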

Discussion: This study provides new technical support for clinical decision-making in CLBP and gives doctors a powerful tool for formulating personalized, efficient treatment strategies. The framework is expected to improve the diagnosis and treatment of CLBP in the future.

Keywords: GPT-4.0; chronic low back pain; clinical decision-making; large language models; treatment.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
The framework of CLBP-GPT for chronic low back pain, which integrates LLMs and RAG techniques: (1) collect patient questions from the Internet and from hospitals and organize them into a dataset; (2) build a CLBP knowledge base from the latest literature and research papers; (3) use GPT to extract key features from patient complaints; (4) retrieve up-to-date knowledge from the knowledge base using the extracted keywords; (5) design the prompt; (6) generate results using GPT-4.0; (7) provide personalized decisions to patients.
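Steps (4) and (5) of this caption can be illustrated with a small sketch. The caption says only that knowledge is retrieved "based on keywords", so this Python example assumes a simple keyword-overlap ranking and takes a hand-supplied keyword list in place of the GPT extraction in step (3). The functions `retrieve` and `build_prompt`, the prompt wording, and the knowledge-base passages are hypothetical.

```python
from typing import List


def retrieve(keywords: List[str], knowledge_base: List[str], top_k: int = 2) -> List[str]:
    """Step (4), assumed form: rank passages by how many distinct keywords
    they contain (case-insensitive substring match); return the top_k
    passages with a nonzero score."""
    kws = {k.lower() for k in keywords}
    scored = []
    for idx, passage in enumerate(knowledge_base):
        text = passage.lower()
        score = sum(1 for k in kws if k in text)
        if score > 0:
            scored.append((score, idx, passage))
    # Higher score first; ties keep the original knowledge-base order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [p for _, _, p in scored[:top_k]]


def build_prompt(complaint: str, passages: List[str]) -> str:
    """Step (5), assumed form: combine numbered evidence with the complaint."""
    evidence = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "You are assisting with chronic low back pain (CLBP) treatment decisions.\n"
        f"Evidence:\n{evidence}\n\n"
        f"Patient complaint: {complaint}\n"
        "Give a personalized recommendation grounded in the evidence."
    )


# Hypothetical knowledge-base entries (placeholders, not real guidance).
kb = [
    "Exercise therapy guidance for chronic low back pain.",
    "Notes on sciatica and leg pain.",
    "Dental care overview.",
]
hits = retrieve(["back pain", "chronic", "sciatica"], kb)
print(build_prompt("Low back pain for six months", hits))
```

In practice a RAG system would more likely rank passages by embedding similarity; keyword overlap is used here only because it mirrors the caption's wording and needs no external libraries.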
FIGURE 2
Box plots of the scores on the five indicators for each model. From left to right: CLBP-GPT, GPT-4.0, ERNIE Bot, and DeepSeek.
