Data Brief. 2025 Jan 2:58:111252.
doi: 10.1016/j.dib.2024.111252. eCollection 2025 Feb.

CoWIN twitter dataset: A comprehensive collection of public discourse on India's COVID-19 vaccination platform


Shubham Mittal et al. Data Brief.

Abstract

The CoWIN Twitter Dataset offers a wide-ranging collection of public opinions on CoWIN, India's COVID-19 vaccination platform. The raw dataset contains 635,000 tweets that mention "cowin," collected from January to December 2021 using the Twitter Academic API. In addition to the raw data, it includes a cleaned and processed set of 419,409 English tweets and a labelled subset with sentiment analysis. The raw data file holds tweet details such as ID, text, timestamp, user ID, and language. The processed dataset is stripped of URLs, hashtags, and other noise, and adds month and category groupings. Finally, the labelled dataset assigns a positive or negative sentiment classification to the relevant tweets. This dataset enables researchers to analyse themes and sentiments related to India's vaccination administration. It can help policymakers gain insights into issues surrounding large-scale health initiatives and digital health systems. The mix of languages in the data also makes it useful for language processing research.

Keywords: COVID-19; CoWIN; Digital health; Health informatics; India; Sentiment analysis; Social media analytics; Twitter data.
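
The cleaning step described in the abstract (keeping English tweets, removing URLs and hashtags, and adding a month grouping) could look roughly like the sketch below. This is a minimal illustration and not the authors' pipeline: the field names `text`, `created_at`, and `lang`, the timestamp format, and the helper names `clean_tweet`, `month_of`, and `process` are all assumptions made for the example.

```python
import re
from datetime import datetime

# Patterns for the kinds of noise the abstract mentions (URLs and hashtags).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove URLs and hashtags, then collapse leftover whitespace."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def month_of(timestamp: str) -> str:
    """Return a 'YYYY-MM' month key.

    Assumes ISO 8601 timestamps such as '2021-05-01T10:15:30.000Z';
    the actual column format in the dataset may differ.
    """
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.strftime("%Y-%m")


def process(rows: list[dict]) -> list[dict]:
    """Keep English tweets only and attach cleaned text plus a month key."""
    processed = []
    for row in rows:
        if row.get("lang") != "en":  # hypothetical language field
            continue
        processed.append(
            {
                **row,
                "clean_text": clean_tweet(row["text"]),
                "month": month_of(row["created_at"]),
            }
        )
    return processed


# Hypothetical sample rows, invented purely for illustration.
sample = [
    {
        "id": "1",
        "text": "Booked my slot on cowin! https://t.co/abc #CoWIN #vaccine",
        "created_at": "2021-05-01T10:15:30.000Z",
        "lang": "en",
    },
    {
        "id": "2",
        "text": "cowin slot #vaccine",
        "created_at": "2021-06-11T08:00:00.000Z",
        "lang": "hi",
    },
]
result = process(sample)
```

The category grouping and sentiment labelling mentioned in the abstract are not sketched here, since the abstract does not say how they were produced.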


Figures

Fig. 1: Language distribution of CoWIN tweets.
Fig. 2: Distribution of tweets by month.
Fig. 3: Count of positive and negative sentiments by month.
Fig. 4: Workflow for the data curation and processing.


