J Med Internet Res. 2023 May 8;25:e42734.
doi: 10.2196/42734.

Methodologies for Monitoring Mental Health on Twitter: Systematic Review

Nina H Di Cara et al. J Med Internet Res.

Abstract

Background: The use of social media data to predict mental health outcomes has the potential to allow for the continuous monitoring of mental health and well-being and provide timely information that can supplement traditional clinical assessments. However, it is crucial that the methodologies used to create models for this purpose are of high quality from both a mental health and machine learning perspective. Twitter has been a popular choice of social media because of the accessibility of its data, but access to big data sets is not a guarantee of robust results.

Objective: This study aims to review the current methodologies used in the literature for predicting mental health outcomes from Twitter data, with a focus on the quality of the underlying mental health data and the machine learning methods used.

Methods: A systematic search was performed across 6 databases, using keywords related to mental health disorders, algorithms, and social media. In total, 2759 records were screened, of which 164 (5.94%) papers were analyzed. Information about methodologies for data acquisition, preprocessing, model creation, and validation was collected, as well as information about replicability and ethical considerations.

Results: The 164 studies reviewed used 119 primary data sets. There were an additional 8 data sets identified that were not described in enough detail to include, and 6.1% (10/164) of the papers did not describe their data sets at all. Of these 119 data sets, only 16 (13.4%) had access to ground truth data (ie, known characteristics) about the mental health disorders of social media users. The other 86.6% (103/119) of data sets collected data by searching keywords or phrases, which may not be representative of patterns of Twitter use for those with mental health disorders. The annotation of mental health disorders for classification labels was variable, and 57.1% (68/119) of the data sets had no ground truth or clinical input on this annotation. Despite being a common mental health disorder, anxiety received little attention.
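The proportions reported in the Methods and Results can be recomputed from the stated counts. The following is a minimal sketch for checking that arithmetic, not part of the review itself; the helper `pct` and all variable names are illustrative assumptions, and the counts are copied from the abstract.

```python
# Counts as reported in the abstract. Variable names are illustrative only.
RECORDS_SCREENED = 2759
PAPERS_ANALYZED = 164
PRIMARY_DATASETS = 119


def pct(part, whole, digits=1):
    """Return part/whole as a percentage rounded to `digits` decimal places."""
    return round(100 * part / whole, digits)


# Each entry pairs a description with the recomputed percentage.
recomputed = {
    "papers analyzed out of records screened": pct(PAPERS_ANALYZED, RECORDS_SCREENED, 2),
    "papers with no data set description": pct(10, PAPERS_ANALYZED),
    "data sets with ground truth": pct(16, PRIMARY_DATASETS),
    "data sets collected by keyword search": pct(103, PRIMARY_DATASETS),
    "data sets with no ground truth or clinical input": pct(68, PRIMARY_DATASETS),
}

if __name__ == "__main__":
    for label, value in recomputed.items():
        print(f"{label}: {value}%")
```

The ground truth and keyword-search groups (16 and 103) sum to the 119 primary data sets, so the two percentages describe a complete partition of the data sets.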

Conclusions: The sharing of high-quality ground truth data sets is crucial for the development of trustworthy algorithms that have clinical and research utility. Further collaboration across disciplines and contexts is encouraged to better understand what types of predictions will be useful in supporting the management and identification of mental health disorders. A series of recommendations for researchers in this field and for the wider research community are made, with the aim of enhancing the quality and utility of future outputs.

Keywords: machine learning; mental health; mental illness; social media.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram of inclusion and exclusion figures for the literature search.
Figure 2. Network diagram showing which mental health disorder (pink) each study (blue) attempted to infer. Depression and suicidality were the most popular, with most studies attempting to predict a single outcome. ADHD: attention-deficit/hyperactivity disorder; BPD: borderline personality disorder; OCD: obsessive-compulsive disorder; PTSD: posttraumatic stress disorder; SAD: seasonal affective disorder.
Figure 3. The number of studies considering each mental health disorder by year of publication (for disorders included in >2 studies). BPD: borderline personality disorder; OCD: obsessive-compulsive disorder; PTSD: posttraumatic stress disorder.
Figure 4. The proportion of studies that reported each of the stages of modeling that we considered, split into those published before 2020 (n=89) and those published in 2020 or later (n=75).
