J Med Internet Res. 2024 Sep 18:26:e54617. doi: 10.2196/54617.

Using Large Language Models to Detect Depression From User-Generated Diary Text Data as a Novel Approach in Digital Mental Health Screening: Instrument Validation Study

Daun Shin¹,², Hyoseung Kim³, Seunghwan Lee³, Younhee Cho²,⁴, Whanbo Jung²

Affiliations

¹ Department of Psychiatry, Anam Hospital, Korea University, Seoul, Republic of Korea.
² Doctorpresso, Seoul, Republic of Korea.
³ VOLTWIN, Seoul, Republic of Korea.
⁴ Department of Design, Seoul National University, Seoul, Republic of Korea.

PMID: 39292502
PMCID: PMC11447422
DOI: 10.2196/54617

Daun Shin et al. J Med Internet Res. 2024.

Erratum in

Correction: Using Large Language Models to Detect Depression From User-Generated Diary Text Data as a Novel Approach in Digital Mental Health Screening: Instrument Validation Study.
Shin D, Kim H, Lee S, Cho Y, Jung W. J Med Internet Res. 2025 Jul 8;27:e79198. doi: 10.2196/79198. PMID: 40627851.

Abstract

Background: Depressive disorders have substantial global implications, leading to various social consequences, including decreased occupational productivity and a high disability burden. Early detection and intervention for clinically significant depression have gained attention; however, the existing depression screening tools, such as the Center for Epidemiologic Studies Depression Scale, have limitations in objectivity and accuracy. Therefore, researchers are identifying objective indicators of depression, including image analysis, blood biomarkers, and ecological momentary assessments (EMAs). Among EMAs, user-generated text data, particularly from diary writing, have emerged as a clinically significant and analyzable source for detecting or diagnosing depression, leveraging advancements in large language models such as ChatGPT.

Objective: We aimed to detect depression based on user-generated diary text through an emotional diary writing app using a large language model (LLM). We aimed to validate the value of the semistructured diary text data as an EMA data source.

Methods: Participants were assessed for depression using the Patient Health Questionnaire and suicide risk was evaluated using the Beck Scale for Suicide Ideation before starting and after completing the 2-week diary writing period. The text data from the daily diaries were also used in the analysis. The performance of leading LLMs, such as ChatGPT with GPT-3.5 and GPT-4, was assessed with and without GPT-3.5 fine-tuning on the training data set. The model performance comparison involved the use of chain-of-thought and zero-shot prompting to analyze the text structure and content.
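The two prompting styles named in the Methods can be contrasted with a minimal sketch. The prompt wording, the label set, and the function names below are hypothetical illustrations; the abstract does not give the prompts or output format the study used, and no model API is called here.

```python
from typing import Optional

# Hypothetical label set; the study's actual output format is not stated in the abstract.
LABELS = ("not depressed", "depressed")


def zero_shot_prompt(diary: str) -> str:
    """Ask for a label directly, with no examples and no request for reasoning."""
    return (
        "Decide whether the author of the diary entry below shows signs of depression.\n"
        f"Answer with exactly one of: {', '.join(LABELS)}.\n\n"
        f"Diary entry:\n{diary}\n\n"
        "Label:"
    )


def chain_of_thought_prompt(diary: str) -> str:
    """Same task, but ask the model to reason step by step before the final label."""
    return (
        "Decide whether the author of the diary entry below shows signs of depression.\n"
        "Think step by step: summarize the mood of the entry, note any depressive "
        "symptoms it describes, and only then give your conclusion.\n"
        f"Finish with a final line of the form 'Label: <one of {', '.join(LABELS)}>'.\n\n"
        f"Diary entry:\n{diary}"
    )


def parse_label(response: str) -> Optional[str]:
    """Read the label from the last non-empty line of a model response.

    Only the last line is inspected so that reasoning text mentioning the word
    'depressed' is not mistaken for the final answer.
    """
    lines = [line.strip().lower() for line in response.splitlines() if line.strip()]
    if not lines:
        return None
    last = lines[-1]
    # Check 'not depressed' first because 'depressed' is a substring of it.
    for label in LABELS:
        if label in last:
            return label
    return None
```

The only difference between the two prompts is the request for intermediate reasoning, which is the contrast a chain-of-thought comparison is meant to isolate.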

Results: We used 428 diaries from 91 participants; GPT-3.5 fine-tuning demonstrated superior performance in depression detection, achieving an accuracy of 0.902 and a specificity of 0.955. However, the balanced accuracy was the highest (0.844) for GPT-3.5 without fine-tuning and prompt techniques; it displayed a recall of 0.929.
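The Results report accuracy, specificity, recall, and balanced accuracy, and the best model differs depending on which metric is used. The sketch below shows how these quantities relate, assuming the standard definitions (balanced accuracy as the mean of recall and specificity). The confusion-matrix counts are hypothetical and are not the study's data.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common screening metrics from confusion-matrix counts.

    tp/fn count depressed cases that were detected/missed;
    tn/fp count non-depressed cases that were cleared/flagged.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    recall = tp / (tp + fn)            # sensitivity: share of depressed cases detected
    specificity = tn / (tn + fp)       # share of non-depressed cases correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (recall + specificity) / 2
    return {
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts on an imbalanced sample (10 depressed, 90 not depressed).
conservative = screening_metrics(tp=5, fp=2, tn=88, fn=5)   # rarely flags anyone
sensitive = screening_metrics(tp=9, fp=30, tn=60, fn=1)     # flags many people

for name, m in (("conservative", conservative), ("sensitive", sensitive)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

With these made-up counts, the conservative classifier has the higher accuracy while the sensitive one has the higher balanced accuracy. When non-depressed cases dominate the sample, accuracy mostly rewards specificity, which is one way a model can lead on accuracy without leading on balanced accuracy.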

Conclusions: Both GPT-3.5 and GPT-4.0 demonstrated relatively reasonable performance in recognizing the risk of depression based on diaries. Our findings highlight the potential clinical usefulness of user-generated text data for detecting depression. In addition to measurable indicators, such as step count and physical activity, future research should increasingly emphasize qualitative digital expression.

Keywords: artificial intelligence; depression; digital health technology; screening; text data.

©Daun Shin, Hyoseung Kim, Seunghwan Lee, Younhee Cho, Whanbo Jung. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.09.2024.

Conflict of interest statement

Conflicts of Interest: DS is the chief medical officer at Doctorpresso. While DS does not receive a salary from the company, she owns 25% of its equity. YC is a project manager at Doctorpresso and receives a salary for her work. WJ is the chief executive officer of Doctorpresso and owns 61% of the company’s equity. HK and SL receive salaries from VOLTWIN but do not own any company equity. The Mind Station app was created by DS in collaboration with WJ and YC with the aim of providing mental health care based on writing emotion diaries. The app is owned by Doctorpresso. The data collected through this app were analyzed by VOLTWIN, a company specializing in data analysis. It is important to note that the 4 coauthors, aside from the first and corresponding author, DS, did not influence the study’s design or analysis. YC contributed to the design of the Mind Station app, and WJ worked on community outreach projects based on the Mind Station app. VOLTWIN was subcontracted to perform artificial intelligence–related analysis, and the analysis plan was carried out in consultation with DS, ensuring that the results were not affected by the company. This work was also supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute funded by the Ministry of Health & Welfare, Republic of Korea (grant HI23C0035). The funding source had no role in the study design, data collection, analysis, or interpretation or in the decision to submit the manuscript for publication.

Figures

**Figure 1**
Application process. AI: artificial intelligence.

**Figure 2**
Chain-of-thought (CoT) prompting. Differences with standard prompting are shown in blue.

**Figure 3**
Confusion matrix for GPT-3.5 models. CoT: chain of thought; LLM: large language model.

**Figure 4**
Confusion matrix for other artificial intelligence models. CoT: chain of thought.

Cited by

The Applications of Large Language Models in Mental Health: Scoping Review.
Jin Y, Liu J, Li P, Wang B, Yan Y, Zhang H, Ni C, Wang J, Li Y, Bu Y, Wang Y. J Med Internet Res. 2025 May 5;27:e69284. doi: 10.2196/69284. PMID: 40324177.

Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis.
Su H, Sun Y, Li R, Zhang A, Yang Y, Xiao F, Duan Z, Chen J, Hu Q, Yang T, Xu B, Zhang Q, Zhao J, Li Y, Li H. J Med Internet Res. 2025 Jun 9;27:e72062. doi: 10.2196/72062. PMID: 40489764.

The Application and Ethical Implication of Generative AI in Mental Health: Systematic Review.
Wang X, Zhou Y, Zhou G. JMIR Ment Health. 2025 Jun 27;12:e70610. doi: 10.2196/70610. PMID: 40577783. Review.

Extracting Pulmonary Embolism Diagnoses From Radiology Impressions Using GPT-4o: Large Language Model Evaluation Study.
Mahyoub M, Dougherty K, Shukla A. JMIR Med Inform. 2025 Apr 9;13:e67706. doi: 10.2196/67706. PMID: 40203306.

Research progress and implications of the application of large language model in shared decision-making in China's healthcare field.
Li X, Chen S, Meng M, Wang Z, Jiang H, Hao Y. Front Public Health. 2025 Jul 10;13:1605212. doi: 10.3389/fpubh.2025.1605212. eCollection 2025. PMID: 40709042. Review.
