Comparative Study
2025 Aug 5;23(1):862. doi: 10.1186/s12967-025-06871-y.

eCBT-I dialogue system: a comparative evaluation of large language models and adaptation strategies for insomnia treatment


Xueying Bao et al. J Transl Med.

Abstract

Background: Traditional face-to-face mental health treatments are often limited by time and space. Thanks to the development of advanced large language models (LLMs), digital mental health treatments can provide personalized advice to patients and improve compliance. However, in the field of cognitive behavioral therapy for insomnia (CBT-I), specialized, real-time interactive dialogue platforms have not been fully developed.

Methods: Our research team constructed an eCBT-I intelligent dialogue system based on a retrieval-augmented generation (RAG) architecture, aiming to provide an example of the deep integration of CBT-I knowledge graphs and large language models. To optimize the performance of the system's core language generation module on the insomnia dialogue dataset, we systematically included eight mainstream large language models (ChatGLM2-6b, ChatGLM3-6b, Baichuan-7b, Baichuan-13b, Qwen-7b, Qwen2-7b, Llama-2-7b-chat-hf, and Llama-2-13b-chat-hf) and three adaptation strategies (LoRA, QLoRA, and Freeze). We screened the suitability of the three adaptation strategies for each of the eight language models and thus determined the best adaptation method for each model to maximize its performance gain. The eight best-adapted models were then evaluated along three dimensions to compare their performance on the small-sample sleep dialogue dataset and the C-Eval dataset. All evaluation data were drawn from historical medical records of patients who did not exhibit delirium and had normal language expression abilities.
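Of the three adaptation strategies compared, LoRA and QLoRA train small low-rank adapter matrices while the base weights stay frozen. As a dependency-free illustration (not the authors' code, which presumably uses standard fine-tuning toolkits), the core LoRA idea — merging a low-rank update W' = W + (alpha/r)·B·A into a frozen weight matrix — can be sketched in pure Python:

```python
# Illustrative sketch of the LoRA update: instead of training the full
# weight matrix W (d_out x d_in), train two small matrices B (d_out x r)
# and A (r x d_in), then merge W' = W + (alpha / r) * B @ A.
# Plain lists-of-lists keep the sketch dependency-free.

def matmul(X, Y):
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_merge(W, A, B, alpha, r):
    scale = alpha / r
    delta = matmul(B, A)  # rank-r update; only A and B are trained
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Frozen 2x2 base weight and a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d_out x r
A = [[0.5, 0.5]]     # r x d_in
W_merged = lora_merge(W, A, B, alpha=2.0, r=1)
print(W_merged)  # [[2.0, 1.0], [2.0, 3.0]]
```

The Freeze strategy, by contrast, trains only a chosen subset of the original layers and adds no extra matrices; QLoRA applies the same low-rank update on top of a quantized base model.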

Results: By matching model characteristics to adaptation strategies and evaluating the models side by side, we compared the contribution of the different fine-tuning strategies to the performance of each language model on the small insomnia dialogue dataset, and determined that Qwen2-7b (Freeze) performs best on the insomnia dialogue dataset.
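The figures describe ranking the adapted models by a normalized total score over four weighted evaluation indicators. A minimal sketch of such a ranking — min-max normalizing each indicator across models, then taking a weighted sum — is shown below; the metric values and equal weights are illustrative assumptions, not the paper's reported numbers or weight allocation:

```python
# Hedged sketch: rank models by a weighted sum of min-max-normalized
# indicators. Scores and weights here are made-up placeholders.

def minmax(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def rank_models(scores, weights):
    """scores: {model: [indicator values]}, weights: same length, summing to 1."""
    models = list(scores)
    # Normalize each indicator column across all models.
    cols = list(zip(*(scores[m] for m in models)))
    norm_cols = [minmax(list(c)) for c in cols]
    totals = {m: sum(w * norm_cols[j][i] for j, w in enumerate(weights))
              for i, m in enumerate(models)}
    return max(totals, key=totals.get), totals

scores = {
    "Qwen2-7b":    [0.42, 0.55, 0.31, 0.50],
    "ChatGLM3-6b": [0.38, 0.49, 0.28, 0.47],
    "Llama-2-7b":  [0.30, 0.41, 0.22, 0.40],
}
best, totals = rank_models(scores, weights=[0.25, 0.25, 0.25, 0.25])
print(best)  # Qwen2-7b (highest on every indicator in this toy data)
```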

Conclusions: This study effectively integrates the CBT-I knowledge graph with a large language model through the RAG architecture, improving the professional quality of the eCBT-I intelligent dialogue system. The systematic selection of fine-tuning methods and the confirmation of the optimal model not only improve the adaptability of large language models to the CBT-I task, but also provide a useful paradigm for AI applications in medical subfields with resource constraints and data-collection difficulties, laying a solid foundation for more accurate and efficient digital CBT-I clinical practice in the future.

Keywords: Adaptation strategy; Large language models; Mental health; RAG architecture; eCBT-I.


Conflict of interest statement

The authors declare that the research was conducted without any commercial or financial relationships that could be construed as potential conflicts of interest.

Figures

Fig. 1
Structure of the AI-assisted CBT-I dialogue system and language model. (a) BERT score for the final dataset included in the study. (b) The system processes the user's questions and returns a response. (c) Comparisons of model parameters and inference times for eight common language models on the market. (d) Four evaluation indicators and weight allocation for model performance. (e) Structure of the Qwen2-7b model
Fig. 2
Participant inclusion and exclusion flow diagram
Fig. 3
Selection of the best adaptation method for each language model. (a) Comparison of the three adaptation methods (LoRA, QLoRA, and Freeze) across the eight language models using four evaluation indicators. (b) Performance of the three adaptation methods under each of the four indicators. (c) Normalized total score for each of the three adaptation methods across the eight language models
Fig. 4
Performance of the eight best-adapted language models. (a) Changes in the BLEU-4, ROUGE-1, ROUGE-2, and ROUGE-L indicators over 450 training epochs on the training and test sets. (b) C-Eval scores of the models on the external dataset. (c) Comparisons of model parameters and inference times for the eight adapted language models
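ROUGE-1, one of the indicators tracked in the figure, measures unigram overlap between a generated response and a reference. As a hedged sketch of the core idea (production evaluations use proper tokenization and standard ROUGE implementations), the F1 variant can be computed as:

```python
# Minimal ROUGE-1 F1 sketch: clipped unigram overlap between a candidate
# and a reference, combined into precision/recall/F1.
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # unigram matches, clipped per word
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("try a regular sleep schedule",
                  "keep a regular sleep schedule")
print(score)  # 0.8 — 4 of 5 unigrams match in both directions
```

BLEU-4 works in the opposite direction (precision over 1- to 4-grams with a brevity penalty), which is why the two families of metrics are typically reported together.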


