Evaluating Generative AI in Mental Health: Systematic Review of Capabilities and Limitations
- PMID: 40373033
- PMCID: PMC12097452
- DOI: 10.2196/70014
Abstract
Background: The global shortage of mental health professionals, exacerbated by rising mental health needs since the COVID-19 pandemic, has stimulated growing interest in leveraging large language models to address these challenges.
Objective: This systematic review aims to evaluate the current capabilities of generative artificial intelligence (GenAI) models in mental health applications.
Methods: A comprehensive search across 5 databases yielded 1046 references, of which 8 studies met the inclusion criteria. Included studies had to be original research with an experimental design (eg, Turing tests, sociocognitive tasks, trials, or qualitative methods), focus on GenAI models, and explicitly measure sociocognitive abilities (eg, empathy and emotional awareness), mental health outcomes, and user experience (eg, perceived trust and empathy).
Results: The studies, published between 2023 and 2024, primarily evaluated models such as ChatGPT-3.5 and 4.0, Bard, and Claude on tasks such as psychoeducation, diagnosis, emotional awareness, and clinical interventions. Most studies used zero-shot prompting and relied on human evaluators to assess the AI responses, using standardized rating scales or qualitative analysis. However, these methods were often insufficient to capture the full complexity of GenAI capabilities. The reliance on single-shot prompting techniques, limited comparisons, and task-based assessments isolated from context may oversimplify GenAI's abilities and overlook the nuances of human-artificial intelligence interaction, especially in clinical applications that require contextual reasoning and cultural sensitivity. The findings suggest that while GenAI models demonstrate strengths in psychoeducation and emotional awareness, their diagnostic accuracy, cultural competence, and ability to engage users emotionally remain limited. Users frequently reported concerns about trustworthiness, accuracy, and a lack of emotional engagement.
Conclusions: Future research could use more sophisticated evaluation methods, such as few-shot and chain-of-thought prompting, to fully uncover GenAI's potential. Longitudinal studies and broader comparisons with human benchmarks are needed to explore the effects of GenAI-integrated mental health care.
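To make the difference between the prompting strategies named in the Results and Conclusions concrete, here is a minimal Python sketch of how a zero-shot, a few-shot, and a zero-shot chain-of-thought prompt might be constructed for the same item. The function names, the emotional-awareness scenario, and the exact prompt wording are hypothetical illustrations, not prompts used in the reviewed studies, and no model API is called; the sketch only builds prompt strings.

```python
"""Hypothetical sketch contrasting three prompting strategies.

Only prompt strings are built here; sending them to a model is out of scope.
Names, wording, and the example scenario are illustrative assumptions.
"""


def zero_shot(task: str) -> str:
    # Zero-shot: the instruction alone, with no worked demonstrations.
    return f"{task}\nAnswer:"


def few_shot(task: str, examples: list[tuple[str, str]]) -> str:
    # Few-shot: a handful of (input, answer) demonstrations precede the new task,
    # so the model can infer the expected format and style from them.
    demos = "\n\n".join(f"{q}\nAnswer: {a}" for q, a in examples)
    return f"{demos}\n\n{task}\nAnswer:"


def chain_of_thought(task: str) -> str:
    # Zero-shot chain-of-thought: a cue asks the model to reason step by step
    # before committing to a final answer.
    return f"{task}\nLet's think step by step, then state a final answer."


if __name__ == "__main__":
    # Hypothetical emotional-awareness style item (illustrative only).
    item = ("Scenario: A close friend cancels your birthday plans an hour beforehand. "
            "How would you feel?")
    demos = [("Scenario: You receive unexpected praise from your manager. "
              "How would you feel?",
              "Surprised and proud, perhaps a little embarrassed.")]
    for build in (zero_shot, chain_of_thought):
        print(build(item), end="\n\n---\n\n")
    print(few_shot(item, demos))
```

In this sketch, few-shot prompting spends extra context on demonstrations to guide format and tone, whereas the chain-of-thought cue asks for intermediate reasoning instead; which helps more on contextual or culturally sensitive clinical tasks is an empirical question of the kind the review calls for.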
Keywords: LLM; clinical skills; digital mental health intervention; evaluation; generative artificial intelligence; large language model; mental health.
© Liying Wang, Tanmay Bhanushali, Zhuoran Huang, Jingyi Yang, Sukriti Badami, Lisa Hightow-Weidman. Originally published in JMIR Mental Health (https://mental.jmir.org).