Forecasting Covid-19 Dynamics in Brazil: A Data Driven Approach
- PMID: 32679861
- PMCID: PMC7400194
- DOI: 10.3390/ijerph17145115
Abstract
The contribution of this paper is twofold. First, a new data-driven approach for predicting the dynamics of the Covid-19 pandemic is introduced. Second, the results obtained with this approach for the Brazilian states are reported and discussed, with predictions starting on 4 May 2020. As a preliminary study, we first used a Long Short Term Memory for Data Training-SAE (LSTM-SAE) network model. Although this first approach led to somewhat disappointing results, it served as a good baseline for testing other ANN types. Subsequently, to identify relevant countries and regions for training the ANN models, we clustered the world's regions where the pandemic is at an advanced stage. This clustering is based on manually engineered features representing a country's response to the early spread of the pandemic, and the resulting clusters are used to select the relevant countries for training the models. The final models retained are Modified Auto-Encoder (MAE) networks, which are trained on these clusters and learn to predict future data for the Brazilian states. These predictions are used to estimate important statistics about the disease, such as the peaks and the number of confirmed cases. Finally, curve fitting is carried out to find the distribution that best fits the outputs of the MAE and to refine the estimates of the peaks of the pandemic. The predicted numbers reach a total of more than one million infected Brazilians, distributed among the different states, with São Paulo leading at about 150 thousand predicted confirmed cases. The results indicate that the pandemic is still growing in Brazil, with the peak of infection in most states estimated to fall in the second half of May 2020. The estimated end of the pandemic (97% of cases reaching an outcome) falls between June and the end of August 2020, depending on the state.
Keywords: Covid-19 pandemic; data-driven; modified auto-encoder; time series prediction.
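The curve-fitting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which distribution was selected or how the fit was performed, so everything here is an assumption for illustration: a Gaussian-shaped daily-case curve, a quadratic least-squares fit in log space, the hypothetical function name `fit_gaussian_peak`, and the synthetic data. It is not the authors' implementation.

```python
import numpy as np


def fit_gaussian_peak(days, daily_cases):
    """Fit y = A * exp(-(x - mu)^2 / (2 * sigma^2)) to daily counts.

    Assumption for illustration: the epidemic curve is Gaussian-shaped.
    Taking logs turns the Gaussian into a parabola, so a degree-2
    polynomial fit recovers the peak day (mu), width (sigma) and
    peak height (A).
    """
    days = np.asarray(days, dtype=float)
    cases = np.asarray(daily_cases, dtype=float)
    mask = cases > 0  # log is undefined for zero counts

    # log y = a*x^2 + b*x + c, with a = -1 / (2 * sigma^2)
    a, b, c = np.polyfit(days[mask], np.log(cases[mask]), 2)
    if a >= 0:
        raise ValueError("log-profile is not concave; no peak can be estimated")

    mu = -b / (2.0 * a)                      # day of the peak
    sigma = np.sqrt(-1.0 / (2.0 * a))        # spread of the curve in days
    amplitude = np.exp(c - b**2 / (4.0 * a))  # daily cases at the peak
    return mu, sigma, amplitude


# Synthetic, noise-free example data (hypothetical, not from the paper).
true_mu, true_sigma, true_amp = 30.0, 8.0, 1000.0
days = np.arange(60)
daily = true_amp * np.exp(-((days - true_mu) ** 2) / (2 * true_sigma**2))

mu_hat, sigma_hat, amp_hat = fit_gaussian_peak(days, daily)
print(f"estimated peak day: {mu_hat:.2f}, width: {sigma_hat:.2f}, height: {amp_hat:.1f}")
```

On real model outputs the counts would be noisy, and a nonlinear least-squares fit over several candidate distributions would likely be more appropriate than this log-space shortcut, which gives disproportionate weight to small counts in the tails.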
Conflict of interest statement
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results. Nonetheless, the information presented in this paper consists of estimated predictions and is not entirely reliable, as this is experimental research and the computations are based on current parameters that can change if the behavior of the population changes. These are the results of a scientific study and are still expected to improve as new data are produced daily. The results data are exclusively for scientific communication and are available at
