Appl Intell (Dordr). 2021;51(5):2790-2804.
doi: 10.1007/s10489-020-02029-z. Epub 2020 Nov 6.

Design and analysis of a large-scale COVID-19 tweets dataset

Rabindra Lamsal. Appl Intell (Dordr). 2021.

Abstract

As of July 17, 2020, more than thirteen million people have been diagnosed with the Novel Coronavirus (COVID-19), and half a million people have already lost their lives to this infectious disease. The World Health Organization declared the COVID-19 outbreak a pandemic on March 11, 2020. Since then, social media platforms have experienced an exponential rise in content related to the pandemic. In the past, Twitter data have been observed to be indispensable for extracting situational awareness information during crises. This paper presents the COV19Tweets Dataset (Lamsal 2020a), a large-scale Twitter dataset with more than 310 million COVID-19-specific English-language tweets and their sentiment scores. The dataset's geo version, the GeoCOV19Tweets Dataset (Lamsal 2020b), is also presented. The paper discusses the datasets' design in detail and analyzes the tweets in both datasets. The datasets are released publicly in the expectation that they will contribute to a better understanding of the spatial and temporal dimensions of the public discourse related to the ongoing pandemic. According to the usage statistics, the datasets (Lamsal 2020a, 2020b) have collectively been accessed over 74.5k times.

Keywords: Crisis computing; Network analysis; Sentiment analysis; Social computing; Twitter data.

Conflict of interest statement

The author declares that there is no conflict of interest.

Figures

Fig. 1. Daily distribution of tweets in the COV19Tweets Dataset
Fig. 2. Resource utilization graphs for the VM (24 hours)
Fig. 3. Daily distribution of tweets in the GeoCOV19Tweets Dataset
Fig. 4. COVID-19 sentiment trend from April 24, 2020 to July 17, 2020
Fig. 5. Network analysis: overview of the GeoCOV19Tweets Dataset
Fig. 6. Country-specific outlier hashtags detected using network analysis
Fig. 7. Network diagram in Fig. 5 expanded by a scale factor
Fig. 8. World view of COVID-19 sentiment
Fig. 9. Region-specific view of COVID-19 sentiment (the color scale is the same as in Fig. 8)

References

    1. Abd-Alrazaq A, Alhuwail D, Househ M, Hamdi M, Shah Z (2020) Top concerns of tweeters during the covid-19 pandemic: infoveillance study. J Med Internet Res 22(4):e19016
    2. Ahmed W, Vidal-Alaball J, Downing J, Seguí FL (2020) Covid-19 and the 5g conspiracy theory: social network analysis of twitter data. J Med Internet Res 22(5):e19458
    3. Alqurashi S, Alhindi A, Alanazi E (2020) Large arabic twitter dataset on covid-19. arXiv:2004.04315
    4. Banda JM, Tekumalla R, Wang G, Yu J, Liu T, Ding Y, Chowell G (2020) A large-scale covid-19 twitter chatter dataset for open scientific research: an international collaboration. arXiv:2004.03688
    5. Bennett NC, Millard DE, Martin D (2018) Assessing twitter geocoding resolution. In: Proceedings of the 10th ACM Conference on Web Science, pp 239–243
