Int J Environ Res Public Health. 2020 Jul 15;17(14):5115.
doi: 10.3390/ijerph17145115.

Forecasting Covid-19 Dynamics in Brazil: A Data Driven Approach

Igor Gadelha Pereira et al. Int J Environ Res Public Health. 2020.

Abstract

The contribution of this paper is twofold. First, a new data-driven approach for predicting the Covid-19 pandemic dynamics is introduced. Second, the results obtained with this approach for the Brazilian states are reported and discussed, with predictions starting on 4 May 2020. As a preliminary study, we first used a Long Short Term Memory for Data Training-SAE (LSTM-SAE) network model. Although this first approach led to somewhat disappointing results, it served as a good baseline for testing other artificial neural network (ANN) types. Subsequently, in order to identify relevant countries and regions to be used for training ANN models, we clustered the world's regions where the pandemic is at an advanced stage. This clustering is based on manually engineered features representing a country's response to the early spread of the pandemic, and the resulting clusters are used to select the relevant countries for training the models. The final models retained are Modified Auto-Encoder (MAE) networks, which are trained on these clusters and learn to predict future data for Brazilian states. These predictions are used to estimate important statistics about the disease, such as peaks and the number of confirmed cases. Finally, curve fitting is carried out to find the distribution that best fits the outputs of the MAE and to refine the estimates of the peaks of the pandemic. The predictions total more than one million infected Brazilians, distributed among the different states, with São Paulo leading at about 150 thousand predicted confirmed cases. The results indicate that the pandemic is still growing in Brazil, with the peak of infection for most states estimated in the second half of May 2020. The estimated end of the pandemic (97% of cases reaching an outcome) falls between June and the end of August 2020, depending on the state.

Keywords: Covid-19 pandemic; data-driven; modified auto-encoder; time series prediction.
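
The curve-fitting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a logistic cumulative curve, which is only one of the candidate shapes the paper compares, and it fits synthetic data rather than MAE outputs. The function names, the grid search over the final size `K`, and the use of 97% of the fitted final cumulative count as a stand-in for "97% of cases reaching an outcome" are all assumptions made for illustration, not the authors' implementation.

```python
import math


def logistic_cumulative(t, K, r, t0):
    """Cumulative cases on day t for a logistic curve with final size K."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def day_fraction_reached(fraction, r, t0):
    """Day on which the cumulative curve reaches `fraction` of K."""
    return t0 + math.log(fraction / (1.0 - fraction)) / r


def fit_logistic(days, cumulative, K_grid):
    """Least-squares fit over a grid of candidate final sizes K.

    For a fixed K, ln(C / (K - C)) = r * t - r * t0 is linear in t,
    so r and t0 follow from ordinary linear regression.
    """
    n = len(days)
    mean_t = sum(days) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    best = None
    for K in K_grid:
        if K <= max(cumulative) or min(cumulative) <= 0:
            continue  # the logit transform needs 0 < C < K
        y = [math.log(c / (K - c)) for c in cumulative]
        mean_y = sum(y) / n
        sxy = sum((t - mean_t) * (v - mean_y) for t, v in zip(days, y))
        r = sxy / sxx
        if r <= 0:
            continue  # a growing epidemic curve needs a positive rate
        t0 = mean_t - mean_y / r
        # Score each candidate by squared error on the original scale.
        sse = sum(
            (logistic_cumulative(t, K, r, t0) - c) ** 2
            for t, c in zip(days, cumulative)
        )
        if best is None or sse < best[0]:
            best = (sse, K, r, t0)
    if best is None:
        raise ValueError("no candidate K is compatible with the data")
    return best[1], best[2], best[3]


# Synthetic example (hypothetical parameters, not data from the paper).
days = list(range(41))
observed = [logistic_cumulative(t, 1000.0, 0.2, 30.0) for t in days]

K, r, t0 = fit_logistic(days, observed, K_grid=[900.0, 1000.0, 1100.0, 1200.0])
peak = t0  # daily cases of a logistic curve peak at its inflection point
end_day = day_fraction_reached(0.97, r, t0)
print(f"K={K:.0f}, r={r:.3f}, peak day={peak:.1f}, 97% day={end_day:.1f}")
```

For a logistic curve the daily-case peak coincides with the inflection point `t0`, so the peak estimate comes directly from the fitted parameters. Other candidate distributions would need their own peak and quantile formulas.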

Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results. Nonetheless, the information presented in this paper consists of estimated predictions that are not entirely reliable, as this is experimental research and the computations are based on current parameters that can change if the behavior of the population changes. These are the results of a scientific study and are expected to improve as new data are produced daily. The result data are exclusively for scientific communication and are available at http://www.natalnet.br/covid.

Figures

Figure 1
LSTM, DLSTM, and LSTM-SAE blocks.
Figure 2
Comparison of results for LSTM, DLSTM, and LSTM-SAE on Covid-19 cumulative (a) and daily (b) numbers of cases, using data from Hubei province, China.
Figure 3
Number of deaths per million inhabitants in the different Brazilian states on 1 May 2020.
Figure 4
Values of the three features used for characterizing the early response to Covid-19 for the Brazilian states.
Figure 5
Modified Auto-Encoder architecture.
Figure 6
Projections for Rio Grande do Norte state (in the northeast of Brazil) [34]. Figure printed from the web application running at http://astro.dfte.ufrn.br/html/Cliente/COVID19.php. Accessed on 4 May.
Figure 7
Projections for Brazil with an adapted SEIR model [34], extracted from http://astro.dfte.ufrn.br/html/Cliente/COVID19.php. Accessed on 4 May.
Figure 8
Predictions and forecasts for Italy of Covid-19 cumulative (a) and daily (b) cases.
Figure 9
Predictions and forecasts for Brazil of Covid-19 cumulative (a) and daily (b) cases.
Figure 10
Predictions and forecasts for RN of Covid-19 cumulative (a) and daily (b) cases.
Figure 11
2D UMAP embedding of the different countries and states studied. The colors represent the different clusters generated using Affinity Propagation.
Figure 12
Cluster assignment of the different Brazilian states and world countries.
Figure 13
Violin plots representing the values taken by the different features for each group obtained after UMAP + Affinity Propagation clustering.
Figure 14
Daily and cumulative cases for Sergipe state, from Cluster 0.
Figure 15
Daily and cumulative cases for São Paulo state, from Cluster 1.
Figure 16
Daily and cumulative cases for Rio Grande do Norte state, from Cluster 2.
Figure 17
Daily and cumulative cases for Santa Catarina state, from Cluster 3.
Figure 18
Curve fitting for Rio de Janeiro state (the logNormal model was the best fit), with the peak indicated on 31 May 2020.
Figure 19
Curve fitting for São Paulo state (the logistic model was the best fit), with the peak indicated on 26 May 2020.
Figure 20
Curve fitting for Rio Grande do Norte state (the Burr model was the best fit), with the peak indicated on 21 May 2020.

References

    1. Byass P. Eco-epidemiological assessment of the COVID-19 epidemic in China, January-February 2020. Glob. Health Action. 2020;13:1760490. doi: 10.1080/16549716.2020.1760490.
    2. Hamzah F., Binti A., Lau C., Nazri H., Ligot D.V., Lee G., Tan C.L. CoronaTracker: Worldwide COVID-19 Outbreak Data Analysis and Prediction. Bull. World Health Organ. 2020;1:32. doi: 10.2471/BLT.20.255695.
    3. Fanelli D., Piazza F. Analysis and forecast of COVID-19 spreading in China, Italy and France. Chaos Solitons Fractals. 2020;134:109761. doi: 10.1016/j.chaos.2020.109761.
    4. Webb G.F., Magal P., Liu Z., Seydi O. A model to predict COVID-19 epidemics with applications to South Korea, Italy, and Spain. medRxiv. 2020. doi: 10.1101/2020.04.07.20056945.
    5. Grant A. Dynamics of COVID-19 epidemics: SEIR models underestimate peak infection rates and overestimate epidemic duration. medRxiv. 2020. doi: 10.1101/2020.04.02.20050674.