Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation
- PMID: 38609507
- PMCID: PMC10987499
- DOI: 10.1038/s44184-024-00056-z
Abstract
Large language models (LLMs) such as OpenAI's GPT-4 (which powers ChatGPT) and Google's Gemini, built on artificial intelligence, hold immense potential to support, augment, or even eventually automate psychotherapy. Enthusiasm about such applications is mounting in the field as well as in industry. These developments promise to address insufficient mental healthcare system capacity and scale individual access to personalized treatments. However, clinical psychology is an uncommonly high-stakes application domain for AI systems, as responsible and evidence-based therapy requires nuanced expertise. This paper provides a roadmap for the ambitious yet responsible application of clinical LLMs in psychotherapy. First, a technical overview of clinical LLMs is presented. Second, the stages of integration of LLMs into psychotherapy are discussed while highlighting parallels to the development of autonomous vehicle technology. Third, potential applications of LLMs in clinical care, training, and research are discussed, highlighting areas of risk given the complex nature of psychotherapy. Fourth, recommendations for the responsible development and evaluation of clinical LLMs are provided, which include centering clinical science, involving robust interdisciplinary collaboration, and attending to issues like assessment, risk detection, transparency, and bias. Lastly, a vision is outlined for how LLMs might enable a new generation of studies of evidence-based interventions at scale, and how these studies may challenge assumptions about psychotherapy.
© 2024. The Author(s).
Conflict of interest statement
The authors declare the following competing interests: receiving consultation fees from Jimini Health (E.C.S., L.H.U., H.A.S., and J.C.E.).