Annu Rev Biomed Data Sci. 2020 Jul;3:433-458.
doi: 10.1146/annurev-biodatasci-030320-040844. Epub 2020 May 4.

Mining Social Media Data for Biomedical Signals and Health-Related Behavior

Rion Brattig Correia et al. Annu Rev Biomed Data Sci. 2020 Jul.

Abstract

Social media data have been increasingly used to study biomedical and health-related phenomena. From cohort-level discussions of a condition to population-level analyses of sentiment, social media have provided scientists with unprecedented amounts of data to study human behavior associated with a variety of health conditions and medical treatments. Here we review recent work in mining social media for biomedical, epidemiological, and social phenomena information relevant to the multilevel complexity of human health. We pay particular attention to topics where social media data analysis has shown the most progress, including pharmacovigilance and sentiment analysis, especially for mental health. We also discuss a variety of innovative uses of social media data for health-related applications as well as important limitations of social media data access and use.

Keywords: biomedicine; healthcare; pharmacovigilance; sentiment analysis; social media.

Figures

Figure 1
Selected sample of social media posts depicting known drug and symptom mentions. (a) Instagram photos depicting a variety of drugs. (b,c) Captions of Instagram posts. The two captions in panel b were posted two days apart by the same user; the second post shows a possible side effect of a drug administration mentioned in the first. (d) Twitter posts containing drugs known to be abused. (e) Epilepsy Foundation forum post and comments from users asking questions and sharing experiences about drug dosage (Keppra). For all examples, usernames, number of likes, and dates were omitted for privacy, and some content was modified for clarity and to maintain user anonymity. Terms of pharmacovigilance interest, including drug names, natural products, and symptoms, are highlighted in yellow using dictionaries developed for this problem (3, 65).
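Figure 1 highlights terms of interest by matching post text against curated dictionaries. A minimal sketch of that kind of dictionary lookup follows, assuming lowercase word tokenization and single-word terms only. The `DRUGS` and `SYMPTOMS` sets and the `tag_terms` helper are hypothetical illustrations, not the dictionaries or code from (3, 65).

```python
import re

# Hypothetical miniature dictionaries (assumption); the curated
# pharmacovigilance dictionaries in (3, 65) are far larger.
DRUGS = {"keppra", "levetiracetam", "ibuprofen"}
SYMPTOMS = {"headache", "dizziness", "nausea"}


def tag_terms(text, dictionaries):
    """Return (term, category) pairs for dictionary terms found in text.

    Matching is case-insensitive on single word tokens; multi-word terms
    and spelling variants are not handled in this sketch.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    hits = []
    for token in tokens:
        # Check each token against every category's dictionary.
        for category, terms in dictionaries.items():
            if token in terms:
                hits.append((token, category))
    return hits


if __name__ == "__main__":
    post = "Started Keppra last week, now constant dizziness and a headache."
    print(tag_terms(post, {"drug": DRUGS, "symptom": SYMPTOMS}))
```

For the example post this prints `[('keppra', 'drug'), ('dizziness', 'symptom'), ('headache', 'symptom')]`. A set lookup per token keeps matching fast even for large dictionaries; a production system would also need multi-word matching and normalization of misspellings and slang.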
Figure 2
(a) An example tweet with its average ANEW (115) scores for the arousal, dominance, and valence dimensions. Only words found in the ANEW dictionary were matched to their scores. (b) A mood histogram time series showing the per-day distribution of ANEW valence scores for a cohort of Twitter users who self-reported being diagnosed with depression (116). (c) A mean-centered time series of ANEW valence scores for a cohort of Twitter users who stated on Twitter that they were having a strong emotion. Scores are shown in 1-min increments, smoothed by a 10-min rolling average; this series was used (14) to study the effects of affect labeling on Twitter, i.e., the act of putting one’s feelings into words, in this case by tweeting “I feel” followed by a set of words denoting a strong emotion. Time t = 0 h (red dashed line) is the time at which each person in the cohort posted the affect labeling tweet. (d) Average LIWC (117) functional word count of the Facebook posts of one subject from a cohort of patients who died of SUDEP and whose Facebook behavior was studied after their death. This young patient, like several others in the cohort, showed an increase in functional words before SUDEP. Functional words are pronouns, prepositions, articles, conjunctions, auxiliary verbs, and a few other categories understood to indicate emotional states and other individual differences. Abbreviations: 50p, 50th percentile; ANEW, Affective Norms for English Words; CI, confidence interval; LIWC, Linguistic Inquiry and Word Count; SUDEP, sudden unexpected death in epilepsy.
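Panels (a) and (c) rest on two simple computations: averaging lexicon scores over only those words of a post that appear in the lexicon, and mean-centering a per-minute score series before smoothing it with a 10-min rolling average. The sketch below illustrates both under stated assumptions. `LEXICON` holds made-up placeholder values rather than the published ANEW norms, the helpers `tweet_scores` and `mean_centered_rolling` are hypothetical, and the trailing (rather than centered) window is a guess, since the caption does not specify how the window is aligned or which mean is subtracted.

```python
import re
import statistics

# Hypothetical ANEW-style lexicon: word -> (valence, arousal, dominance).
# Values are illustrative placeholders, NOT the published ANEW norms.
LEXICON = {
    "happy": (8.0, 6.0, 6.5),
    "tired": (3.5, 2.5, 4.0),
    "alone": (2.5, 4.0, 3.0),
}


def tweet_scores(text, lexicon=LEXICON):
    """Average valence, arousal, and dominance over words found in the lexicon.

    Words absent from the lexicon are ignored; returns None if nothing matches.
    """
    words = re.findall(r"[a-z']+", text.lower())
    matched = [lexicon[w] for w in words if w in lexicon]
    if not matched:
        return None
    # zip(*matched) regroups the score tuples into one sequence per dimension.
    valence, arousal, dominance = (statistics.fmean(dim) for dim in zip(*matched))
    return {"valence": valence, "arousal": arousal, "dominance": dominance}


def mean_centered_rolling(series, window=10):
    """Mean-center a per-minute score series, then smooth it.

    Assumes one value per 1-min bin and a trailing window of `window` bins;
    the first window-1 outputs average over the bins available so far.
    """
    mu = statistics.fmean(series)
    centered = [x - mu for x in series]
    smoothed = []
    for i in range(len(centered)):
        chunk = centered[max(0, i - window + 1): i + 1]
        smoothed.append(statistics.fmean(chunk))
    return smoothed


if __name__ == "__main__":
    print(tweet_scores("I feel so tired and alone."))
    print(mean_centered_rolling([1, 2, 3, 4], window=2))
```

With these placeholder values, the example prints `{'valence': 3.0, 'arousal': 3.25, 'dominance': 3.5}` and `[-1.5, -1.0, 0.0, 1.0]`. In a real analysis each 1-min bin would presumably first be filled with the mean valence of the tweets posted in that minute, a step not shown here.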

References

    1. Correia RB, de Araújo Kohler LP, Mattos MM, Rocha LM. 2019. City-wide electronic health records reveal gender and age biases in administration of known drug–drug interactions. NPJ Digit. Med. 2:74.
    2. Christakis NA, Fowler JH. 2010. Social network sensors for early detection of contagious outbreaks. PLOS ONE 5:e12948.
    3. Correia RB, Li L, Rocha LM. 2016. Monitoring potential drug interactions and reactions via network analysis of Instagram user timelines. Pac. Symp. Biocomput. 21:492–503.
    4. Choudhury MD, Counts S, Horvitz E. 2013. Social media as a measurement tool of depression in populations. In Proceedings of the 5th Annual ACM Web Science Conference, pp. 47–56. New York: Assoc. Comput. Mach.
    5. Lazer D, Pentland A, Adamic L, Aral S, Barabási AL, et al. 2009. Computational social science. Science 323:721–23.