Appl Intell (Dordr). 2021;51(3):1296-1325.
doi: 10.1007/s10489-020-01862-6. Epub 2020 Sep 21.

COVID-19 open source data sets: a comprehensive survey

Junaid Shuja et al. Appl Intell (Dordr). 2021.

Abstract

In December 2019, a novel coronavirus disease named COVID-19 emerged in the city of Wuhan, China. In early 2020, the virus spread to every continent except Antarctica, causing widespread infections and deaths because of its high contagiousness and the absence of a medically proven treatment. The COVID-19 pandemic has been termed the most consequential global crisis since the World Wars. The first line of defense against the spread of COVID-19 is non-pharmaceutical measures such as social distancing and personal hygiene. The pandemic, which has affected billions of lives economically and socially, has motivated the scientific community to develop solutions based on computer-aided digital technologies for the diagnosis, prevention, and estimation of COVID-19. Some of these efforts focus on statistical and Artificial Intelligence-based analysis of the available COVID-19 data. All of these efforts require that the data used for analysis be open source, to promote extension, validation, and collaboration in the fight against the global pandemic. Our survey is motivated by open source efforts that fall into four main categories: (a) COVID-19 diagnosis from CT scans, X-ray images, and cough sounds, (b) COVID-19 case reporting, transmission estimation, and prognosis from epidemiological, demographic, and mobility data, (c) COVID-19 emotional and sentiment analysis from social media, and (d) knowledge-based discovery and semantic analysis from collections of scholarly articles covering COVID-19. We survey and compare research works in these directions that are accompanied by open source data and code. Future research directions for data-driven COVID-19 research are also discussed. We hope that the article will give the scientific community a starting point for open source, extensible, and transparent research in the collective fight against the COVID-19 pandemic.

Keywords: Artificial intelligence; COVID-19; Coronavirus; Data sets; Machine learning; Open source; Pandemic.

Figures

Fig. 1: Taxonomy of COVID-19 open source data sets
Fig. 2: A generic workflow of AI/ML-based COVID-19 diagnosis
Fig. 3: A generic workflow of social media-based ML and NLP applications [83]
Fig. 4: A workflow of speech-based COVID-19 diagnosis
