Natural Language Processing for Depression Prediction on Sina Weibo: Method Study and Analysis
- PMID: 39233477
- PMCID: PMC11391090
- DOI: 10.2196/58259
Abstract
Background: Depression represents a pressing global public health concern, impacting the physical and mental well-being of hundreds of millions worldwide. Notwithstanding advances in clinical practice, an alarming number of individuals at risk for depression continue to face significant barriers to timely diagnosis and effective treatment, thereby exacerbating a burgeoning social health crisis.
Objective: This study seeks to develop a novel online depression risk detection method using natural language processing technology to identify individuals at risk of depression on the Chinese social media platform Sina Weibo.
Methods: First, we collected approximately 527,333 posts publicly shared over 1 year from 1600 individuals with depression and 1600 individuals without depression on the Sina Weibo platform. We then developed a hierarchical transformer network for learning user-level semantic representations, which consists of 3 primary components: a word-level encoder, a post-level encoder, and a semantic aggregation encoder. The word-level encoder learns semantic embeddings from individual posts, while the post-level encoder explores features in user post sequences. The semantic aggregation encoder aggregates post sequence semantics to generate a user-level semantic representation that can be classified as depressed or nondepressed. Next, a classifier is employed to predict the risk of depression. Finally, we conducted statistical and linguistic analyses of the post content from individuals with and without depression using the Chinese Linguistic Inquiry and Word Count.
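The three-stage hierarchy described above (words to posts, posts in sequence, posts to user) can be sketched structurally as follows. This is a minimal illustration and not the authors' network: the transformer layers are replaced by simple stand-ins (mean pooling, context mixing, and dot-product attention pooling), and all function names, the `query` vector, and the logistic classifier weights are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def word_level_encode(word_vectors: List[Vector]) -> Vector:
    """Stand-in for the word-level encoder: one embedding per post."""
    return mean_pool(word_vectors)


def post_level_encode(post_vectors: List[Vector]) -> List[Vector]:
    """Stand-in for the post-level encoder: mix each post with the
    average of the user's whole post sequence as crude context."""
    context = mean_pool(post_vectors)
    return [[0.5 * p + 0.5 * c for p, c in zip(post, context)]
            for post in post_vectors]


def aggregate(post_vectors: List[Vector], query: Vector) -> Vector:
    """Stand-in for the semantic aggregation encoder: softmax attention
    over posts, scored by dot product with a (hypothetical) query."""
    scores = [sum(p * q for p, q in zip(post, query)) for post in post_vectors]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum((w / total) * post[d] for w, post in zip(weights, post_vectors))
            for d in range(len(query))]


def encode_user(posts: List[List[Vector]], query: Vector) -> Vector:
    """posts[i][j] is the embedding of word j in post i."""
    post_vectors = [word_level_encode(words) for words in posts]
    return aggregate(post_level_encode(post_vectors), query)


def predict_risk(user_vector: Vector, weights: Vector, bias: float) -> float:
    """Logistic classifier on the user-level representation."""
    z = sum(u * w for u, w in zip(user_vector, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

In a real model each stand-in would be a learned encoder; the sketch only shows how word-level, post-level, and user-level representations feed into one another before classification.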
Results: We divided the original data set into training, validation, and test sets. The training set consisted of 1000 individuals with depression and 1000 individuals without depression. The validation and test sets each comprised 600 users, with 300 individuals from each cohort (depression and nondepression). Without sampling techniques, our method achieved an accuracy of 84.62%, precision of 84.43%, recall of 84.50%, and F1-score of 84.32% on the test set. With our proposed retrieval-based sampling strategy, performance improved substantially, to an accuracy of 95.46%, precision of 95.30%, recall of 95.70%, and F1-score of 95.43%. These results support the effectiveness of the proposed depression risk detection model and retrieval-based sampling technique and offer new insights for large-scale depression detection through social media. Through language behavior analysis, we found that individuals with depression are more likely to use negation words (the value of "swear" is 0.001253). This may indicate the presence of negative emotions, rejection, doubt, disagreement, or aversion in individuals with depression. Additionally, individuals with depression tend to use negative emotional vocabulary in their expressions ("NegEmo": 0.022306; "Anx": 0.003829; "Anger": 0.004327; "Sad": 0.005740), which may reflect their internal negative emotions and psychological state. This frequent use of negative vocabulary could be a way for individuals with depression to express negative feelings toward life, themselves, or their surrounding environment.
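For readers who want to reproduce this kind of evaluation, the four reported metrics can be computed from predicted and true labels as sketched below. The abstract does not state the averaging scheme. The sketch assumes macro averaging over the two classes, which is consistent with a reported F1-score slightly below both precision and recall, but this is an inference rather than something the source confirms. The function names and the label encoding (1 for depression, 0 for nondepression) are illustrative.

```python
from typing import Dict, Sequence, Tuple


def per_class_prf(y_true: Sequence[int], y_pred: Sequence[int],
                  label: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def macro_report(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    rows = [per_class_prf(y_true, y_pred, lab) for lab in labels]
    n = len(labels)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(r[0] for r in rows) / n,
        "recall": sum(r[1] for r in rows) / n,
        "f1": sum(r[2] for r in rows) / n,
    }


if __name__ == "__main__":
    # Toy labels only, not data from the study.
    print(macro_report([1, 1, 0, 0], [1, 0, 0, 0]))
```

On the toy labels in the usage example, the macro F1 falls below both macro precision and macro recall, which shows that such a pattern is possible under class averaging.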
Conclusions: The research results indicate the feasibility and effectiveness of using deep learning methods to detect the risk of depression. These findings provide insights into the potential for large-scale, automated, and noninvasive prediction of depression among online social media users.
Keywords: Sina Weibo; deep learning; depression; linguistic analysis; mental health; mood analysis; natural language processing; risk prediction; social media; statistical analysis.
© Zhenwen Zhang, Jianghong Zhu, Zhihua Guo, Yu Zhang, Zepeng Li, Bin Hu. Originally published in JMIR Mental Health (https://mental.jmir.org).