Sci Rep. 2024 Mar 13;14(1):6142.
doi: 10.1038/s41598-024-56267-z.

Social and economic variables explain COVID-19 diffusion in European regions

Christian Cancedda et al. Sci Rep. 2024.

Erratum in

Abstract

At the beginning of 2020, Italy was the country with the highest number of COVID-19 cases, not only in Europe but also in the rest of the world, and Lombardy was the most heavily hit Italian region. The objective of this research is to understand which variables determined the prevalence of cases in Lombardy and in other highly affected European regions. We consider the first and second waves of the COVID-19 pandemic, using a set of 22 variables related to the economy, population, healthcare and education. Regions with a high prevalence of cases are identified by means of binary classifiers; the variables most relevant to the classification are then determined, and the robustness of the analysis is assessed. Our results show that the most meaningful features for identifying high-prevalence regions include a high number of hours spent in work environments, a high life expectancy, and a low number of people who left education early and are neither employed nor in education or training.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Analysis of missing values in the adopted dataset. (a) shows the distribution of the fraction of missing features for each region. (b) shows the distribution of the fraction of missing regions for each feature.
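The two panels of Figure 1 summarize the same missingness matrix along its two axes. The following is a minimal sketch of how both fractions could be computed, assuming the data are held as a mapping from region to a mapping of feature values, with `None` marking a missing entry. The layout, the region labels `R1` to `R3`, the feature names and all values are hypothetical and are not taken from the authors' dataset.

```python
def missing_fractions(data):
    """Return (per-region, per-feature) fractions of missing entries.

    `data` maps region -> {feature: value}; a value of None, or a feature
    absent from a region's mapping, counts as missing.
    """
    features = sorted({f for row in data.values() for f in row})
    # Fraction of features missing for each region (panel (a)-style view).
    per_region = {
        region: sum(row.get(f) is None for f in features) / len(features)
        for region, row in data.items()
    }
    # Fraction of regions missing each feature (panel (b)-style view).
    per_feature = {
        f: sum(row.get(f) is None for row in data.values()) / len(data)
        for f in features
    }
    return per_region, per_feature


# Made-up values for three hypothetical regions.
data = {
    "R1": {"work_hours": 40.1, "life_expectancy": None},
    "R2": {"work_hours": None, "life_expectancy": 83.0},
    "R3": {"work_hours": 38.5, "life_expectancy": 81.2},
}
per_region, per_feature = missing_fractions(data)
print(per_region)
print(per_feature)
```

Histograms of the values of `per_region` and `per_feature` would give distributions analogous to panels (a) and (b).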
Figure 2
ROC curves of the different predictive models for the first (left) and second (right) waves. Curves closer to the (0, 1) corner indicate better performance. (a) shows the ROC curves for the first wave: the random forest generally outperforms the other models. (b) shows the ROC curves for the second wave: here, all models obtain similar results. All results were obtained using leave-one-out validation. The positive class is the high-risk one. The blue dotted line represents random guessing.
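With leave-one-out validation, each region is scored by a model trained on all the other regions, and the held-out scores are then pooled into a single ROC curve. The sketch below illustrates that procedure under stated assumptions: it uses a simple nearest-centroid classifier as a stand-in (it is not one of the paper's models, such as the random forest), toy made-up data, and assumes every training fold contains both classes.

```python
import math


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def loo_scores(X, y):
    """Leave-one-out scores from a nearest-centroid classifier (stand-in model).

    For each region i, centroids are fitted on all other regions; the score is
    positive when region i lies closer to the high-risk centroid (label 1).
    """
    scores = []
    for i, x in enumerate(X):
        pos = [X[j] for j in range(len(X)) if j != i and y[j] == 1]
        neg = [X[j] for j in range(len(X)) if j != i and y[j] == 0]
        scores.append(math.dist(x, centroid(neg)) - math.dist(x, centroid(pos)))
    return scores


def roc_points(y, scores):
    """(FPR, TPR) points obtained by sweeping a threshold over the pooled scores."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    pairs = sorted(zip(scores, y), key=lambda p: p[0], reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all samples tied at this threshold before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1] == 1
            fp += pairs[i][1] == 0
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy, made-up data: one standardized feature, 1 = high risk, 0 = reduced risk.
X = [[2.0], [1.8], [2.2], [-2.0], [-1.8], [-2.2]]
y = [1, 1, 1, 0, 0, 0]
points = roc_points(y, loo_scores(X, y))
print(points, auc(points))
```

A random-guess classifier would yield points scattered around the diagonal, which corresponds to the dotted reference line in the figure.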
Figure 3
Coefficients learned by the linear SVM model for the first and second waves. Positive coefficients indicate that the respective feature pushes the prediction toward the “high risk” class, whereas negative coefficients characterize features more related to the “reduced risk” class. The larger the magnitude of a coefficient, the greater the impact of the respective feature on the prediction. The coefficients are sorted by decreasing absolute value. (a) shows the coefficients learned by the linear SVM for the first wave, (b) shows the coefficients for the second wave.
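To make the reading of Figure 3 concrete, the sketch below trains a linear SVM (hinge loss with an L2 penalty) and ranks its coefficients by absolute value, mapping the sign to the favoured class. The training procedure is a swapped-in choice: a Pegasos-style stochastic sub-gradient solver without a bias term, assuming standardized features and labels in {-1, +1}. It is not necessarily the solver the authors used, and the feature names and data are hypothetical.

```python
import random


def pegasos_svm(X, y, lam=0.01, iters=1000, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style
    stochastic sub-gradient steps; no bias term, labels y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        # Regularization step: shrink the current weights.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Hinge-loss step: only when the sampled point is inside the margin.
        if margin < 1.0:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def rank_coefficients(names, w):
    """Sort features by |coefficient| and map the sign to the favoured class."""
    ranked = sorted(zip(names, w), key=lambda p: abs(p[1]), reverse=True)
    return [(name, c, "high risk" if c > 0 else "reduced risk") for name, c in ranked]


# Toy, made-up standardized data: 'work_hours' separates the classes, 'noise' does not.
names = ["work_hours", "noise"]
X = [[2.0, 0.1], [1.5, -0.2], [-2.0, 0.05], [-1.5, -0.1]]
y = [1, 1, -1, -1]  # +1 = high risk, -1 = reduced risk
ranked = rank_coefficients(names, pegasos_svm(X, y))
for name, coef, cls in ranked:
    print(f"{name:>12} {coef:+.3f} -> {cls}")
```

Comparing coefficient magnitudes in this way is only meaningful when the features share a common scale, which is one reason to standardize them first.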
Figure 4
Feature distributions, ordered from top to bottom by increasing p-value of an F-test applied to the data of the first (left) and second (right) waves. A lower p-value indicates a more significant difference between the mean values of the high-risk and reduced-risk classes. A larger divergence may be leveraged by the classification models to distinguish the two classes. All features have been standardized. (a) shows the distributions for the first wave, (b) shows the distributions for the second wave.
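For two classes, the F-test compares between-class and within-class variability of each feature. Turning the statistic into a p-value requires the F-distribution (for example `scipy.stats.f.sf`, or scikit-learn's `f_classif`, which returns both). The dependency-free sketch below ranks features by decreasing F instead; this gives the same order as increasing p-value only under the assumption that every feature has the same group sizes (no missing values), because for fixed degrees of freedom the p-value decreases monotonically in F. Feature names and data are hypothetical, and zero within-class variance is not handled.

```python
from statistics import mean


def f_statistic(high, reduced):
    """One-way ANOVA F statistic for two groups (high vs reduced risk)."""
    groups = [high, reduced]
    n = len(high) + len(reduced)
    grand = mean(high + reduced)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)     # between-group
    ssw = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)  # within-group
    return (ssb / (len(groups) - 1)) / (ssw / (n - len(groups)))


def rank_features(table, labels):
    """Order features by decreasing F; `table` maps feature -> values aligned
    with `labels` (1 = high risk, 0 = reduced risk)."""
    scores = {}
    for name, values in table.items():
        high = [v for v, lab in zip(values, labels) if lab == 1]
        reduced = [v for v, lab in zip(values, labels) if lab == 0]
        scores[name] = f_statistic(high, reduced)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up data: feature "a" separates the classes, feature "b" barely does.
labels = [1, 1, 1, 0, 0, 0]
table = {"a": [1, 2, 3, 4, 5, 6], "b": [1, 6, 2, 5, 3, 4]}
print(rank_features(table, labels))
```

With two groups the F statistic equals the square of the pooled two-sample t statistic, so this ranking matches a two-sided t-test under the same assumptions.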
Figure 5
Side-by-side comparison of the continuous density of reported COVID-19 cases per 100,000 inhabitants (left) and of the resulting discretization into binary severity classes (right), separately for the first (top) and second (bottom) waves. Data obtained from the COVID-19 cases. Borders have been defined using the NUTS2 GeoJSON definitions. Python 3.8, GeoPandas 0.8 and Matplotlib 3.7 were used to produce the images.
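The left and right maps of Figure 5 correspond to two steps: normalizing case counts by population, then thresholding the normalized rate into two classes. The caption does not state the threshold, so the sketch below assumes a hypothetical rule (a median split across regions, overridable by an explicit threshold); the region labels and counts are also made up.

```python
from statistics import median


def rate_per_100k(cases, population):
    """Reported cases normalized per 100,000 inhabitants."""
    return cases * 100_000 / population


def discretize(rates, threshold=None):
    """Map each region's rate to a binary severity class.

    Hypothetical rule: regions strictly above the threshold are 'high risk';
    if no threshold is given, the median rate across regions is used.
    """
    if threshold is None:
        threshold = median(rates.values())
    return {region: "high risk" if r > threshold else "reduced risk" for region, r in rates.items()}


# Made-up counts for three hypothetical regions: (cases, population).
counts = {"R1": (500, 1_000_000), "R2": (120, 1_200_000), "R3": (900, 3_000_000)}
rates = {region: rate_per_100k(c, p) for region, (c, p) in counts.items()}
print(rates)
print(discretize(rates))
```

Multiplying before dividing keeps round values exact here; joining the resulting classes to NUTS2 geometries would then allow the two maps to be drawn side by side.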
