Int J Environ Res Public Health. 2021 Jan 1;18(1):282.
doi: 10.3390/ijerph18010282.

COVID-19: Detecting Government Pandemic Measures and Public Concerns from Twitter Arabic Data Using Distributed Machine Learning

Ebtesam Alomari et al. Int J Environ Res Public Health.

Abstract

Today's societies are connected to a level that has never been seen before. The COVID-19 pandemic has exposed the vulnerabilities of such an unprecedentedly connected world. As of 19 November 2020, over 56 million people have been infected, with nearly 1.35 million deaths, and the numbers are growing. State-of-the-art social media analytics for understanding COVID-19-related phenomena remain limited, and many more studies are required. This paper proposes a software tool comprising a collection of unsupervised Latent Dirichlet Allocation (LDA) machine learning and other methods for the analysis of Twitter data in Arabic, with the aim of detecting government pandemic measures and public concerns during the COVID-19 pandemic. The tool is described in detail, including its architecture, five software components, and algorithms. Using the tool, we collect a dataset comprising 14 million tweets from the Kingdom of Saudi Arabia (KSA) for the period 1 February 2020 to 1 June 2020. We detect 15 government pandemic measures and public concerns and six macro-concerns (economic sustainability, social sustainability, etc.), and formulate their information-structural, temporal, and spatio-temporal relationships. For example, we are able to detect the timewise progression of events from the public discussions on COVID-19 cases in mid-March to the first curfew on 22 March, financial loan incentives on 22 March, the increased quarantine discussions during March-April, the discussions on the reduced mobility levels from 24 March onwards, the blood donation shortfall from late March onwards, the government's 9 billion SAR (Saudi Riyal) salary incentives on 3 April, lifting the ban on five daily prayers in mosques on 26 May, and finally the return to normal government measures on 29 May 2020. These findings show the effectiveness of Twitter in detecting important events, government measures, public concerns, and other information in both time and space with no prior knowledge about them.
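To make the LDA step mentioned in the abstract more concrete, the following is a minimal sketch of topic detection on tokenized text. It is not the authors' tool: the abstract and keywords indicate a distributed Apache Spark setup, whereas this sketch swaps in a single-machine collapsed Gibbs sampler written in plain Python. The function name `lda_gibbs`, the hyperparameter values, and the toy English tokens are all assumptions made for illustration; none of them come from the paper or its dataset.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of token lists.

    Hypothetical helper for illustration only; hyperparameters are assumed.
    Returns (theta, top_words): per-document topic proportions and the
    three most frequent words assigned to each topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each document (each row sums to 1).
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in row]
        for row, doc in zip(doc_topic, docs)
    ]
    top_words = [
        [w for w, c in topic_word[k].most_common(3) if c > 0]
        for k in range(num_topics)
    ]
    return theta, top_words


# Toy, made-up token lists standing in for preprocessed tweets.
docs = [
    ["curfew", "quarantine", "curfew", "mobility"],
    ["quarantine", "curfew", "mobility", "mobility"],
    ["loan", "salary", "incentive", "loan"],
    ["salary", "incentive", "loan", "salary"],
]
theta, top_words = lda_gibbs(docs, num_topics=2, iterations=100, seed=0)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

In a workflow like the one the abstract describes, the per-document proportions in `theta` could be aggregated by day or by location to study temporal and spatial behavior. The number of topics would typically be chosen by comparing a quality score across candidate values, as the caption of Figure 3 (perplexity versus the number of concerns) suggests.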

Keywords: Arabic language; COVID-19; Triple Bottom Line (TBL); Twitter; apache spark; big data; coronavirus; distributed computing; machine learning; smart cities; smart governance; smart healthcare; social media.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. The timeline of some of the detected government pandemic measures and public concerns.
Figure 2. The tool architecture.
Figure 3. Perplexity score versus the number of concerns.
Figure 4. Tweet intensity versus probability of concerns.
Figure 5. The correlation matrix of keywords.
Figure 6. Daily Twitter activity of government measures and public concerns (all).
Figure 7. Daily Twitter activity for a macro-concern (Contain the Virus).
Figure 8. Daily Twitter activity for a macro-concern (virus infection).
Figure 9. Daily Twitter activity for a public macro-concern (back to normal).
Figure 10. Daily Twitter activity for a public macro-concern (impact on daily life).
Figure 11. Daily Twitter activity for a public macro-concern (social sustainability).
Figure 12. Daily Twitter activity for a public macro-concern (economic sustainability).
Figure 13. Spatio-temporal behavior of public concern (curfew: 2 April 2020).
Figure 14. Spatio-temporal behavior of public concern (COVID-19 cases: 22 March 2020).
Figure 15. Spatio-temporal behavior of public concern (COVID-19 cases: 30 March 2020).
Figure 16. Execution time vs. number of cores for varying number of LDA iterations (no limit on the number of features).
Figure 17. Execution time vs. number of cores for various numbers of LDA iterations (a limited number of features: 10,000 keywords).
