Open Forum Infect Dis. 2020 Jun 30;7(7):ofaa258.
doi: 10.1093/ofid/ofaa258. eCollection 2020 Jul.

An "Infodemic": Leveraging High-Volume Twitter Data to Understand Early Public Sentiment for the Coronavirus Disease 2019 Outbreak

Richard J Medford et al. Open Forum Infect Dis.

Abstract

Background: Twitter has been used to track trends and disseminate health information during viral epidemics. On January 21, 2020, the Centers for Disease Control and Prevention activated its Emergency Operations Center and the World Health Organization released its first situation report about coronavirus disease 2019 (COVID-19), sparking significant media attention. How Twitter content and sentiment evolved in the early stages of the COVID-19 pandemic has not been described.

Methods: We extracted tweets matching hashtags related to COVID-19 from January 14 to 28, 2020 using Twitter's application programming interface. We measured themes and frequency of keywords related to infection prevention practices. We performed a sentiment analysis to identify the sentiment polarity and predominant emotions in tweets and conducted topic modeling to identify and explore discussion topics over time. We compared sentiment, emotion, and topics among the most popular tweets, defined by the number of retweets.
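The keyword-frequency step described above can be illustrated with a minimal sketch. It assumes tweets are already available as (date, text) pairs and counts, per day, how many tweets mention each infection prevention subgroup. The keyword lists, the function name `daily_subgroup_counts`, and the toy tweets are all hypothetical; the abstract does not state the authors' actual search terms or tooling.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical keyword lists (an assumption, not the authors' actual terms).
SUBGROUPS = {
    "isolation/quarantine": ("quarantine", "isolation", "isolate"),
    "masks": ("mask", "n95"),
    "hand hygiene": ("handwashing", "hand washing", "wash your hands", "sanitizer"),
}


def daily_subgroup_counts(tweets):
    """Count, per day, the tweets that mention each subgroup at least once.

    `tweets` is an iterable of (date, text) pairs. Matching is a plain
    case-insensitive substring test, so "mask" also matches "masks"
    (and, as a known limitation, words such as "unmask").
    """
    counts = defaultdict(Counter)
    for day, text in tweets:
        lowered = text.lower()
        for subgroup, keywords in SUBGROUPS.items():
            if any(keyword in lowered for keyword in keywords):
                counts[day][subgroup] += 1
    return counts


# Invented toy tweets, for illustration only (not data from the study).
toy_tweets = [
    (date(2020, 1, 21), "City under quarantine, wear a mask"),
    (date(2020, 1, 21), "Remember to wash your hands!"),
    (date(2020, 1, 22), "N95 masks sold out everywhere"),
]

counts = daily_subgroup_counts(toy_tweets)
for day in sorted(counts):
    print(day.isoformat(), dict(counts[day]))
```

Each tweet is counted at most once per subgroup per day, which gives a daily series per subgroup. A real analysis would likely need tokenization and a vetted term list rather than substring matching.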

Results: We evaluated 126 049 tweets from 53 196 unique users. The hourly number of COVID-19-related tweets starkly increased from January 21, 2020 onward. Approximately half (49.5%) of all tweets expressed fear and approximately 30% expressed surprise. In the full cohort, the economic and political impact of COVID-19 was the most commonly discussed topic. When focusing on the most retweeted tweets, the incidence of fear decreased and topics focused on quarantine efforts, the outbreak and its transmission, as well as prevention.
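The comparison between the full cohort and the most retweeted tweets can be sketched as follows. The sketch assumes each tweet already carries a single predominant emotion label and a retweet count; the field names `emotion` and `retweets`, the helper names, and the toy records are hypothetical and do not reproduce the study's data or pipeline.

```python
from collections import Counter


def emotion_shares(tweets):
    """Return the fraction of tweets carrying each emotion label."""
    total = len(tweets)
    if total == 0:
        return {}
    tally = Counter(tweet["emotion"] for tweet in tweets)
    return {emotion: n / total for emotion, n in tally.items()}


def most_retweeted(tweets, k):
    """Return the k tweets with the highest retweet counts."""
    return sorted(tweets, key=lambda tweet: tweet["retweets"], reverse=True)[:k]


# Invented toy records, for illustration only (not data from the study).
toy = [
    {"emotion": "fear", "retweets": 2},
    {"emotion": "fear", "retweets": 0},
    {"emotion": "surprise", "retweets": 50},
    {"emotion": "joy", "retweets": 40},
]

all_shares = emotion_shares(toy)
top_shares = emotion_shares(most_retweeted(toy, 2))
print("all tweets:", all_shares)
print("most retweeted:", top_shares)
```

Computing the shares on both sets with the same function keeps the two figures directly comparable. The cutoff `k` for "most retweeted" is a free parameter here, since the abstract does not give the threshold the authors used.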

Conclusions: Twitter is a rich medium that can be leveraged to understand public sentiment in real-time and potentially target individualized public health messages based on user interest and emotion.

Keywords: COVID-19; SARS-CoV-2; pandemic; sentiment analysis; topic modeling.

Figures

Figure 1. Number of coronavirus disease 2019 (COVID-19)-related tweets (left y-axis) and number of newly confirmed coronavirus cases (right y-axis) over time. CDC, Centers for Disease Control and Prevention; WHO, World Health Organization.

Figure 2. Word cloud showing the top 300 words used in tweets related to coronavirus disease 2019 (COVID-19).

Figure 3. Daily number of tweets related to infection prevention and its subgroups of isolation/quarantine, masks, and hand hygiene.

Figure 4. Analysis of (A) tweet emotions (anger, disgust, fear, joy, sadness and surprise) and (B) sentiment polarity over time.

Figure 5. (A) The 15 terms (in order of weighting) that contributed to each abstract topic with their potential theme labels. The topics are ordered by frequency. Colors for each topic correspond to those in B. Topic labels were assigned by the authors. (B) A t-distributed Stochastic Neighbor Embedding (t-SNE) graph (17) (which embeds high-dimensional data into a 2-dimensional space where similar tweets are grouped together) that visualizes the topics in A as labeled by color and how they change over time. The full interactive visualization is available at https://ssaleh2.github.io/Early_2019nCoV_Twitter_Analysis/; please note the visualization is slow to load. Each node represents an individual tweet, and only tweets posted through the day highlighted on the slider are shown in the foreground, whereas all tweets in the study period are shown in the background. Hovering over a node will show the tweet text and the day it was posted. Depicted here are 3 screenshots for January 14, 2020 (day 0), January 20, 2020 (day 6), and January 27, 2020 (day 13).

References

    1. Scanfeld D, Scanfeld V, Larson EL. Dissemination of health information through social networks: Twitter and antibiotics. Am J Infect Control 2010; 38:182–8.
    2. Chorianopoulos K, Talvis K. Flutrack.org: open-source and linked data for epidemiology. Health Informatics J 2016; 22:962–74.
    3. Signorini A, Segre AM, Polgreen PM. The use of Twitter to track levels of disease activity and public concern in the U.S. during the influenza A H1N1 pandemic. PLoS One 2011; 6:e19467.
    4. Househ M. Communicating Ebola through social media and electronic news media outlets: a cross-sectional study. Health Informatics J 2016; 22:470–8.
    5. Odlum M, Yoon S. What can we learn about the Ebola outbreak from tweets? Am J Infect Control 2015; 43:563–71.