Using Large Language Models to Detect Depression From User-Generated Diary Text Data as a Novel Approach in Digital Mental Health Screening: Instrument Validation Study
- PMID: 39292502
- PMCID: PMC11447422
- DOI: 10.2196/54617
Erratum in
Correction: Using Large Language Models to Detect Depression From User-Generated Diary Text Data as a Novel Approach in Digital Mental Health Screening: Instrument Validation Study. J Med Internet Res. 2025 Jul 8;27:e79198. doi: 10.2196/79198. PMID: 40627851.
Abstract
Background: Depressive disorders have substantial global implications, leading to various social consequences, including decreased occupational productivity and a high disability burden. Early detection and intervention for clinically significant depression have gained attention; however, the existing depression screening tools, such as the Center for Epidemiologic Studies Depression Scale, have limitations in objectivity and accuracy. Therefore, researchers are identifying objective indicators of depression, including image analysis, blood biomarkers, and ecological momentary assessments (EMAs). Among EMAs, user-generated text data, particularly from diary writing, have emerged as a clinically significant and analyzable source for detecting or diagnosing depression, leveraging advancements in large language models such as ChatGPT.
Objective: We aimed to detect depression based on user-generated diary text through an emotional diary writing app using a large language model (LLM). We aimed to validate the value of the semistructured diary text data as an EMA data source.
Methods: Participants were assessed for depression using the Patient Health Questionnaire and suicide risk was evaluated using the Beck Scale for Suicide Ideation before starting and after completing the 2-week diary writing period. The text data from the daily diaries were also used in the analysis. The performance of leading LLMs, such as ChatGPT with GPT-3.5 and GPT-4, was assessed with and without GPT-3.5 fine-tuning on the training data set. The model performance comparison involved the use of chain-of-thought and zero-shot prompting to analyze the text structure and content.
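The difference between the two prompting styles named in the Methods can be illustrated with a minimal sketch. The abstract does not give the prompts used in the study, so the wording below, the function name `build_prompt`, and the `style` values are all hypothetical; the sketch only builds prompt strings and calls no model.

```python
# Hypothetical prompt templates; the study's actual prompts are not given in the abstract.
TASK = (
    "Decide whether the following diary entry suggests a risk of depression. "
    "Answer with exactly one word: 'yes' or 'no'."
)

# Chain-of-thought prompting asks the model to reason before it answers.
COT_SUFFIX = (
    " First think step by step about the mood, sleep, energy, and self-view "
    "expressed in the text, then give the one-word answer on the last line."
)


def build_prompt(diary_text: str, style: str = "zero_shot") -> str:
    """Return a prompt string for one diary entry (illustrative only)."""
    if style == "zero_shot":
        instruction = TASK  # task description only, no examples, no reasoning request
    elif style == "chain_of_thought":
        instruction = TASK + COT_SUFFIX
    else:
        raise ValueError(f"unknown prompt style: {style!r}")
    return f"{instruction}\n\nDiary entry:\n{diary_text.strip()}"


example = build_prompt("  I could not get out of bed again today.  ", "chain_of_thought")
```

In this sketch the diary text is identical across styles, so any difference in model output can be attributed to the instruction alone.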
Results: We used 428 diaries from 91 participants; GPT-3.5 fine-tuning demonstrated superior performance in depression detection, achieving an accuracy of 0.902 and a specificity of 0.955. However, the balanced accuracy was the highest (0.844) for GPT-3.5 without fine-tuning and prompt techniques; it displayed a recall of 0.929.
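The Results report accuracy, specificity, recall, and balanced accuracy, and the best model differs depending on the metric. The sketch below shows how these quantities are conventionally computed from a binary confusion matrix, assuming the standard definitions (the abstract does not state the formulas used). The counts are made up for illustration and are not the study's data; the names `ConfusionCounts` and `screening_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # depressed, flagged as depressed
    fp: int  # not depressed, flagged as depressed
    tn: int  # not depressed, flagged as not depressed
    fn: int  # depressed, flagged as not depressed


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard binary screening metrics from confusion-matrix counts."""
    recall = c.tp / (c.tp + c.fn)            # sensitivity
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    balanced_accuracy = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }


# Illustrative counts only (NOT taken from the study).
counts = ConfusionCounts(tp=8, fp=2, tn=18, fn=2)
metrics = screening_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy weights each class by its size while balanced accuracy weights recall and specificity equally, a model can lead on one and not the other when the classes are imbalanced.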
Conclusions: Both GPT-3.5 and GPT-4.0 demonstrated relatively reasonable performance in recognizing the risk of depression based on diaries. Our findings highlight the potential clinical usefulness of user-generated text data for detecting depression. In addition to measurable indicators, such as step count and physical activity, future research should increasingly emphasize qualitative digital expression.
Keywords: artificial intelligence; depression; digital health technology; screening; text data.
©Daun Shin, Hyoseung Kim, Seunghwan Lee, Younhee Cho, Whanbo Jung. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.09.2024.
Conflict of interest statement
Conflicts of Interest: DS is the chief medical officer at Doctorpresso. While DS does not receive a salary from the company, she owns 25% of its equity. YC is a project manager at Doctorpresso and receives a salary for her work. WJ is the chief executive officer of Doctorpresso and owns 61% of the company’s equity. HK and SL receive salaries from VOLTWIN but do not own any company equity. The Mind Station app was created by DS in collaboration with WJ and YC with the aim of providing mental health care based on writing emotion diaries. The app is owned by Doctorpresso. The data collected through this app were analyzed by VOLTWIN, a company specializing in data analysis. It is important to note that the 4 coauthors, aside from the first and corresponding author, DS, did not influence the study’s design or analysis. YC contributed to the design of the Mind Station app, and WJ worked on community outreach projects based on the Mind Station app. VOLTWIN was subcontracted to perform artificial intelligence–related analysis, and the analysis plan was carried out in consultation with DS, ensuring that the results were not affected by the company. This work was also supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute funded by the Ministry of Health & Welfare, Republic of Korea (grant HI23C0035). The funding source had no role in the study design, data collection, analysis, or interpretation or in the decision to submit the manuscript for publication.