2017 Jun 19;3(2):e38.
doi: 10.2196/publichealth.7157.

What Are People Tweeting About Zika? An Exploratory Study Concerning Its Symptoms, Treatment, Transmission, and Prevention

Michele Miller et al. JMIR Public Health Surveill.

Abstract

Background: In order to harness what people are tweeting about Zika, there needs to be a computational framework that leverages machine learning techniques to recognize relevant Zika tweets and, further, categorize these into disease-specific categories to address specific societal concerns related to the prevention, transmission, symptoms, and treatment of Zika virus.

Objective: The purpose of this study was to determine the relevancy of the tweets and what people were tweeting about the 4 disease characteristics of Zika: symptoms, transmission, prevention, and treatment.

Methods: A combination of natural language processing and machine learning techniques was used to determine what people were tweeting about Zika. Specifically, a two-stage classifier system was built to find relevant tweets about Zika, and then the tweets were categorized into 4 disease categories. Tweets in each disease category were then examined using latent Dirichlet allocation (LDA) to determine the 5 main tweet topics for each disease characteristic.
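The two-stage design described in the Methods can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it assumes a multinomial Naive Bayes model (the model named in the Figure 3 caption) with add-one smoothing at both stages, and the training tweets, the class `TinyMultinomialNB`, and the function `classify_tweet` are all hypothetical, made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class TinyMultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            # log prior + sum of smoothed log likelihoods of known words
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for w in tokenize(doc):
                if w in self.vocab:
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


# Made-up toy tweets, for illustration only (not data from the study).
CATEGORY_DATA = [
    ("fever and rash are symptoms of zika", "symptoms"),
    ("zika causes joint pain and red eyes", "symptoms"),
    ("zika virus spreads through mosquito bites", "transmission"),
    ("zika can be sexually transmitted", "transmission"),
    ("use repellent and nets to prevent zika", "prevention"),
    ("wear long sleeves to avoid mosquito bites", "prevention"),
    ("no vaccine or treatment for zika yet", "treatment"),
    ("rest and fluids are the only treatment for zika", "treatment"),
]
RELEVANCE_DATA = [
    ("zika virus spreads through mosquito bites", "relevant"),
    ("fever and rash are symptoms of zika", "relevant"),
    ("no vaccine or treatment for zika yet", "relevant"),
    ("use repellent to prevent zika", "relevant"),
    ("zika is my favorite dj tonight", "irrelevant"),
    ("great pizza party tonight", "irrelevant"),
    ("watching the game with friends", "irrelevant"),
]

# Stage 1 filters for relevance; stage 2 assigns one of the 4 categories.
stage1 = TinyMultinomialNB().fit(*zip(*RELEVANCE_DATA))
stage2 = TinyMultinomialNB().fit(*zip(*CATEGORY_DATA))


def classify_tweet(text):
    """Return a disease category, or None if stage 1 rejects the tweet."""
    if stage1.predict(text) != "relevant":
        return None
    return stage2.predict(text)


print(classify_tweet("zika rash and fever"))
print(classify_tweet("pizza party with friends"))
```

Running the relevance filter first means the category model is only ever trained on, and applied to, tweets that are actually about Zika. The later LDA topic step is not shown here.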

Results: Over 4 months, 1,234,605 tweets were collected. The number of tweets by males and females was similar (28.47% [351,453/1,234,605] and 23.02% [284,207/1,234,605], respectively). The classifier performed well on the training and test data for relevancy (F1 score=0.87 and 0.99, respectively) and disease characteristics (F1 score=0.79 and 0.90, respectively). Five topics for each category were found and discussed, with a focus on the symptoms category.
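The reported percentages can be reproduced from the counts above, and the F1 score is the harmonic mean of precision and recall. The sketch below assumes nothing beyond those definitions; the helper names and the confusion counts passed to `f1_score` are hypothetical and are not taken from the study.

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to 2 decimals."""
    return round(100 * part / whole, 2)


def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), with P = tp/(tp+fp) and R = tp/(tp+fn)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


TOTAL_TWEETS = 1_234_605                      # from the Results
male_pct = percent(351_453, TOTAL_TWEETS)     # 28.47
female_pct = percent(284_207, TOTAL_TWEETS)   # 23.02

# Hypothetical confusion counts, chosen only to show the calculation.
example_f1 = f1_score(tp=8, fp=2, fn=2)

print(male_pct, female_pct, round(example_f1, 2))
```

The two shares sum to 51.49%, so roughly half of the collected tweets are not attributed to either group in the abstract.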

Conclusions: We demonstrate how categories of discussion on Twitter about an epidemic can be discovered so that public health officials can understand specific societal concerns within the disease-specific categories. Our two-stage classifier was able to identify relevant tweets to enable more specific analysis, including the specific aspects of Zika that were being discussed as well as misinformation being expressed. Future studies can capture sentiments and opinions on epidemic outbreaks like Zika virus in real time, which will likely inform efforts to educate the public at large.

Keywords: epidemiology; machine learning; social media; viruses.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. Block diagram of the pragmatic function-oriented content retrieval using a hierarchical supervised classification technique, followed by deeper analysis for characteristics of disease content.
Figure 2. Polarity and proportion of tweets divided in the gender categories.
Figure 3. Number of tweets in each disease category after classifying all tweets (1.2 million tweets) using the best classification model, multinomial Naive Bayes (discussed in the Classification and Performance Using 10-fold Cross-Validation section).
Figure 4. Number of tweets from the labeled dataset for each of the 4 categories of disease characteristics.
Figure 5. Prevention, symptoms, transmission, and treatment perplexity measure plots.
Figure 6. A 2-dimensional principal components plot of topics discussed pertaining to Zika symptoms.
