J Med Internet Res. 2019 Jun 13;21(6):e12876.
doi: 10.2196/12876.

Mining of Textual Health Information from Reddit: Analysis of Chronic Diseases With Extracted Entities and Their Relations


Vasiliki Foufi et al. J Med Internet Res.

Abstract

Background: Social media platforms constitute a rich data source for natural language processing tasks such as named entity recognition, relation extraction, and sentiment analysis. In particular, health-related social media platforms offer insight into patients' experiences with diseases and treatments that differs from what is found in the scientific literature.

Objective: This paper aimed to report a study of entities related to chronic diseases, and the relations between them, in user-generated text posts. Our research focuses mainly on the biomedical entities found on health social media platforms, their relations, and the way people with chronic diseases express themselves.

Methods: We collected a corpus of 17,624 text posts from disease-specific subreddits of the social news and discussion website Reddit. For entity and relation extraction from this corpus, we employed the PKDE4J tool developed by Song et al (2015). PKDE4J is a text mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework.
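To make the two extraction styles concrete, the sketch below shows a toy dictionary-based entity extractor followed by two simple rule-based relation extractors. It is not PKDE4J's API or its actual rule set: the dictionary entries, the predicate verb list, and the functions `tokenize`, `extract_entities`, `extract_type_pairs`, and `extract_spo` are all hypothetical, chosen only to illustrate the general technique under the assumption that extraction runs one sentence at a time.

```python
import re
from itertools import combinations

# Hypothetical toy dictionary (term -> entity type). PKDE4J relies on its own
# curated dictionaries; these entries are illustrative only.
DICTIONARY = {
    "cancer": "Disease",
    "asthma": "Disease",
    "lymph": "Anatomy",
    "lung": "Anatomy",
    "inhaler": "Drug",
}

# Hypothetical predicate verbs for the toy subject-predicate-object rule.
PREDICATES = {"affects", "causes", "treats"}


def tokenize(sentence):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z]+", sentence.lower())


def extract_entities(tokens):
    """Dictionary-based extraction: (token_index, term, type) for every known token."""
    return [(i, tok, DICTIONARY[tok]) for i, tok in enumerate(tokens) if tok in DICTIONARY]


def extract_type_pairs(entities):
    """Toy rule: any two differently typed entities in the same sentence form a
    relation pair, keyed by the alphabetically sorted type names."""
    pairs = []
    for (_, term1, type1), (_, term2, type2) in combinations(entities, 2):
        if type1 == type2:
            continue
        (type_a, term_a), (type_b, term_b) = sorted([(type1, term1), (type2, term2)])
        pairs.append(((type_a, type_b), (term_a, term_b)))
    return pairs


def extract_spo(tokens, entities):
    """Toy SPO rule: for each pair of adjacent entities, emit (subject, verb, object)
    if a known predicate verb appears between them."""
    triples = []
    for (i, subj, _), (j, obj, _) in zip(entities, entities[1:]):
        for k in range(i + 1, j):
            if tokens[k] in PREDICATES:
                triples.append((subj, tokens[k], obj))
                break
    return triples


tokens = tokenize("My cancer affects the lymph nodes, and my asthma needs an inhaler.")
entities = extract_entities(tokens)
print(extract_spo(tokens, entities))  # [('cancer', 'affects', 'lymph')]
print(extract_type_pairs(entities)[0])  # (('Anatomy', 'Disease'), ('lymph', 'cancer'))
```

Sentence-level co-occurrence, as used in `extract_type_pairs`, over-generates: here it also links "lymph" with "asthma". A production system would constrain pairs with richer linguistic rules.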

Results: Using PKDE4J, we extracted 2 types of entities and relations: (1) biomedical entities and their relations and (2) subject-predicate-object entity relations. In total, 82,138 entities and 30,341 relation pairs were extracted from the Reddit dataset. The most frequently mentioned entities were those related to oncological disease (2884 occurrences of cancer) and asthma (2180 occurrences). The anatomy-disease relation pair was the most frequent (5550 occurrences); the most frequent entities in this pair were cancer and lymph. Manual validation of the extracted entities showed very good system performance on the entity extraction task (3682/5151, 71.48%, of extracted entities were correctly labeled).
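The frequency and validation figures above reduce to simple counting. The sketch below assumes relation pairs are stored as `(type_pair, entity_pair)` tuples; the function names and the sample pairs are hypothetical and are not data from the study. Only the 3682/5151 validation figure is taken from the abstract, and the sketch shows that it corresponds to the reported 71.48%.

```python
from collections import Counter


def most_frequent_pair_types(relation_pairs, n=1):
    """relation_pairs: iterable of (type_pair, entity_pair) tuples, for example
    (("Anatomy", "Disease"), ("lymph", "cancer"))."""
    return Counter(type_pair for type_pair, _ in relation_pairs).most_common(n)


def top_entities_in(relation_pairs, type_pair, n=2):
    """Most frequent individual entities among pairs of the given type."""
    counts = Counter()
    for tp, (a, b) in relation_pairs:
        if tp == type_pair:
            counts.update([a, b])
    return counts.most_common(n)


def precision(correct, validated):
    """Share of manually validated extracted entities that were correctly labeled."""
    return correct / validated


# Made-up sample pairs for illustration; not data from the study.
sample = [
    (("Anatomy", "Disease"), ("lymph", "cancer")),
    (("Anatomy", "Disease"), ("lymph", "cancer")),
    (("Anatomy", "Disease"), ("lung", "asthma")),
    (("Disease", "Drug"), ("asthma", "inhaler")),
]
print(most_frequent_pair_types(sample))  # [(('Anatomy', 'Disease'), 3)]
print(top_entities_in(sample, ("Anatomy", "Disease")))

# Validation figures reported in the abstract: 3682 of 5151 entities correct.
print(f"{precision(3682, 5151):.2%}")  # 71.48%
```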

Conclusions: This study showed that people are eager to share their personal experiences with chronic diseases on social media platforms despite possible privacy and security issues. The results reported in this paper are promising and demonstrate the need for more in-depth studies on the way patients with chronic diseases express themselves on social media platforms.

Keywords: chronic disease; data mining; social media.


Conflict of interest statement

Conflicts of Interest: CL is editor-in-chief for JMIR Medical Informatics.

Figures

Figure 1. The workflow of the PKDE4J text mining system.
Figure 2. Biomedical entity network.
Figure 3. Subject and object entity network.
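Figures 2 and 3 depict entity networks, but the page does not say how they were constructed. One plausible reading, assumed here, is a weighted undirected graph whose edge weights count how often two entities were extracted as a pair. The sketch below builds such a graph with plain dictionaries; `build_network`, `weighted_degree`, and the sample pairs are hypothetical and are not data from the study.

```python
from collections import defaultdict


def build_network(entity_pairs):
    """Undirected weighted graph as nested dicts: graph[a][b] is the number of
    extracted pairs linking entities a and b."""
    graph = defaultdict(lambda: defaultdict(int))
    for a, b in entity_pairs:
        if a == b:
            continue  # ignore self-loops
        graph[a][b] += 1
        graph[b][a] += 1
    return graph


def weighted_degree(graph):
    """Sum of edge weights per node, a simple measure of how connected an entity is."""
    return {node: sum(neighbours.values()) for node, neighbours in graph.items()}


# Made-up entity pairs for illustration; not data from the study.
pairs = [("lymph", "cancer"), ("lymph", "cancer"), ("lung", "asthma"), ("asthma", "inhaler")]
graph = build_network(pairs)
print(graph["cancer"]["lymph"])  # 2
print(weighted_degree(graph)["asthma"])  # 2
```

Because `graph` is a `defaultdict`, looking up an unknown entity silently adds an empty node; use `in` checks when querying if that matters.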


