J Med Internet Res. 2024 Sep 18:26:e54617.
doi: 10.2196/54617.

Using Large Language Models to Detect Depression From User-Generated Diary Text Data as a Novel Approach in Digital Mental Health Screening: Instrument Validation Study

Daun Shin et al. J Med Internet Res.

Erratum in

Abstract

Background: Depressive disorders have substantial global implications, leading to various social consequences, including decreased occupational productivity and a high disability burden. Early detection and intervention for clinically significant depression have gained attention; however, the existing depression screening tools, such as the Center for Epidemiologic Studies Depression Scale, have limitations in objectivity and accuracy. Therefore, researchers are identifying objective indicators of depression, including image analysis, blood biomarkers, and ecological momentary assessments (EMAs). Among EMAs, user-generated text data, particularly from diary writing, have emerged as a clinically significant and analyzable source for detecting or diagnosing depression, leveraging advancements in large language models such as ChatGPT.

Objective: We aimed to detect depression based on user-generated diary text through an emotional diary writing app using a large language model (LLM). We aimed to validate the value of the semistructured diary text data as an EMA data source.

Methods: Participants were assessed for depression using the Patient Health Questionnaire and suicide risk was evaluated using the Beck Scale for Suicide Ideation before starting and after completing the 2-week diary writing period. The text data from the daily diaries were also used in the analysis. The performance of leading LLMs, such as ChatGPT with GPT-3.5 and GPT-4, was assessed with and without GPT-3.5 fine-tuning on the training data set. The model performance comparison involved the use of chain-of-thought and zero-shot prompting to analyze the text structure and content.
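
The difference between zero-shot and chain-of-thought prompting can be sketched in code. The snippet below is a minimal illustration only: the prompt wording, the label strings, and the function names are hypothetical and are not taken from the study, and no model call is made.

```python
import re
from typing import Optional

# Hypothetical label strings; the study's actual output format is not given in the abstract.
LABELS = ("NOT_DEPRESSED", "DEPRESSED")


def build_zero_shot_prompt(diary: str) -> str:
    """Zero-shot: ask for the label directly, with no examples and no reasoning."""
    return (
        "Read the diary entry below and answer with exactly one label, "
        f"{LABELS[1]} or {LABELS[0]}.\n\n"
        f"Diary entry:\n{diary}\n\nAnswer:"
    )


def build_cot_prompt(diary: str) -> str:
    """Chain of thought: ask the model to reason step by step before the label."""
    return (
        "Read the diary entry below. Think step by step about the mood, "
        "content, and structure of the text, then give a final line of the "
        f"form 'Answer: {LABELS[1]}' or 'Answer: {LABELS[0]}'.\n\n"
        f"Diary entry:\n{diary}\n\nReasoning:"
    )


def parse_label(response: str) -> Optional[str]:
    """Return the last label mentioned in a model response, or None if absent."""
    found = re.findall(r"\b(NOT_DEPRESSED|DEPRESSED)\b", response.upper())
    return found[-1] if found else None
```

Taking the last label in the response is a design choice for the chain-of-thought case, where the reasoning text may mention a label before the final answer does.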

Results: We used 428 diaries from 91 participants. The fine-tuned GPT-3.5 model demonstrated superior performance in depression detection, achieving an accuracy of 0.902 and a specificity of 0.955. However, balanced accuracy was highest (0.844) for GPT-3.5 without fine-tuning or prompting techniques, which showed a recall of 0.929.
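
The metrics reported above are related through the confusion matrix. The sketch below assumes the standard definitions (recall as the true positive rate, specificity as the true negative rate, balanced accuracy as their mean); the function names are hypothetical, and any counts passed in are illustrative rather than the study's data.

```python
def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute screening metrics from binary confusion-matrix counts."""
    recall = tp / (tp + fn)            # sensitivity: depressed cases detected
    specificity = tn / (tn + fp)       # non-depressed cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced_accuracy = (recall + specificity) / 2
    return {
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


def implied_specificity(balanced_accuracy: float, recall: float) -> float:
    """Solve balanced_accuracy = (recall + specificity) / 2 for specificity."""
    return 2 * balanced_accuracy - recall
```

Under these definitions, a balanced accuracy of 0.844 with a recall of 0.929 would imply a specificity of about 0.759 for GPT-3.5 without fine-tuning. That figure is derived here and is not reported in the abstract. It illustrates why accuracy alone can favor a model with high specificity when the classes are imbalanced, while balanced accuracy weights both classes equally.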

Conclusions: Both GPT-3.5 and GPT-4 demonstrated reasonable performance in recognizing the risk of depression based on diaries. Our findings highlight the potential clinical usefulness of user-generated text data for detecting depression. In addition to measurable indicators, such as step count and physical activity, future research should place greater emphasis on qualitative digital expression.

Keywords: artificial intelligence; depression; digital health technology; screening; text data.

Conflict of interest statement

Conflicts of Interest: DS is the chief medical officer at Doctorpresso. While DS does not receive a salary from the company, she owns 25% of its equity. YC is a project manager at Doctorpresso and receives a salary for her work. WJ is the chief executive officer of Doctorpresso and owns 61% of the company’s equity. HK and SL receive salaries from VOLTWIN but do not own any company equity. The Mind Station app was created by DS in collaboration with WJ and YC with the aim of providing mental health care based on writing emotion diaries. The app is owned by Doctorpresso. The data collected through this app were analyzed by VOLTWIN, a company specializing in data analysis. It is important to note that the 4 coauthors, aside from the first and corresponding author, DS, did not influence the study’s design or analysis. YC contributed to the design of the Mind Station app, and WJ worked on community outreach projects based on the Mind Station app. VOLTWIN was subcontracted to perform artificial intelligence–related analysis, and the analysis plan was carried out in consultation with DS, ensuring that the results were not affected by the company. This work was also supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute funded by the Ministry of Health & Welfare, Republic of Korea (grant HI23C0035). The funding source had no role in the study design, data collection, analysis, or interpretation or in the decision to submit the manuscript for publication.

Figures

Figure 1. Application process. AI: artificial intelligence.
Figure 2. Chain-of-thought (CoT) prompting. Differences with standard prompting are shown in blue.
Figure 3. Confusion matrix for GPT-3.5 models. CoT: chain of thought; LLM: large language model.
Figure 4. Confusion matrix for other artificial intelligence models. CoT: chain of thought.
