Sensors (Basel). 2020 May 29;20(11):3089.
doi: 10.3390/s20113089.

Statistical Explorations and Univariate Timeseries Analysis on COVID-19 Datasets to Understand the Trend of Disease Spreading and Death

Ayan Chatterjee et al. Sensors (Basel).

Abstract

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the novel coronavirus, is responsible for the ongoing worldwide pandemic. The World Health Organization (WHO) assigned an International Classification of Diseases (ICD) code, "COVID-19", as the name of the new disease. Coronaviruses are generally carried by people and by many diverse species of animals, including birds and mammals such as cattle, camels, cats, and bats. Infrequently, a coronavirus can be transferred from animals to humans and then propagate among people, as happened with Middle East Respiratory Syndrome (MERS-CoV), Severe Acute Respiratory Syndrome (SARS-CoV), and now with this new virus, SARS-CoV-2, or human coronavirus. Its rapid spread has sent billions of people into lockdown as health services struggle to cope. The COVID-19 outbreak comes with exponential growth in new infections, as well as a growing death count. A major goal in limiting further exponential spread is to slow down the transmission rate, which is denoted by a "spread factor (f)", and we propose an algorithm in this study for analyzing it. This paper addresses the potential of data science to assess the risk factors correlated with COVID-19 by analyzing existing datasets available at ourworldindata.org (Oxford University database) together with newly simulated datasets, followed by an analysis of different univariate Long Short Term Memory (LSTM) models for forecasting new cases and resulting deaths. The results show that vanilla, stacked, and bidirectional LSTM models outperformed multilayer LSTM models. We also discuss the findings of the statistical analysis on the simulated datasets. For the correlation analysis, we included features such as external temperature, rainfall, sunshine, infected cases, deaths, country, population, area, and population density for January, February, and March 2020. For univariate timeseries forecasting using LSTM, we used datasets from 1 January 2020 to 22 April 2020.
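
As an illustration of what univariate forecasting involves, a single series of counts is usually reframed as supervised pairs: a window of past values and the value that follows it. The sketch below shows one common way to do this. It is not the authors' code: the helper name `split_sequence`, the window length `n_steps`, and the toy numbers are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def split_sequence(
    series: Sequence[float], n_steps: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a univariate series into (window, next value) training pairs.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    windows: List[List[float]] = []
    targets: List[float] = []
    # Each window covers n_steps consecutive values; the target is the
    # value immediately after the window.
    for start in range(len(series) - n_steps):
        end = start + n_steps
        windows.append(list(series[start:end]))
        targets.append(series[end])
    return windows, targets


# Made-up cumulative counts, for illustration only (not from the datasets).
toy_cases = [10, 20, 40, 80, 160, 320]
X, y = split_sequence(toy_cases, n_steps=3)
for window, target in zip(X, y):
    print(window, "->", target)
```

Before such windows are passed to a Keras LSTM layer, they are typically reshaped to three dimensions (samples, time steps, features), with a feature dimension of 1 for a univariate series.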

Keywords: COVID-19; ICD; LSTM; RNN; algorithm; artificial intelligence; community disease; correlation; deep learning; hypothesis test; keras; machine learning; measurable sensor data; population; public health; python; regression; spread factor; statistics; transmission rate.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. (a) Daily total new COVID-19 cases; (b) daily total COVID-19 deaths.
Figure 2. Consolidated case growth of the “world”.
Figure 3. Top 17 countries by total cases reported up to 22 April 2020.
Figure 4. The death ratio of the top 17 countries by total cases reported up to 22 April 2020.
Figure 5. (a) A vanilla LSTM cell; (b) equations of a vanilla LSTM cell.
Figure 6. Correlation heatmap of simulated data (“simulated_data_1”) to check feature correlation.
Figure 7. Exponential regression plot showing that deaths increase with the number of cases.
Figure 8. Flattening the distribution graphs of active cases over days by reducing human coronavirus spreading with different “f” values: (a) f = 0.25; (b) f = 0.50; (c) f = 0.75; (d) f = 1.00; (e) f = 2.00; (f) f = 3.00; (g) f = 4.00; and (h) f = 5.00.
Figure 9. Trend analysis of total reported cases in four Asian countries.
Figure 10. Comparing the calibration of the LSTM models to forecast total cases of the “World”.
Figure 11. Comparing the calibration of the LSTM models to forecast total deaths of the “World”.
