Nat Hum Behav. 2023 Jul;7(7):1069-1083.
doi: 10.1038/s41562-023-01591-z. Epub 2023 Apr 20.

Nationwide health, socio-economic and genetic predictors of COVID-19 vaccination status in Finland

Tuomo Hartonen et al. Nat Hum Behav. 2023 Jul.

Abstract

Understanding factors associated with COVID-19 vaccination can highlight issues in public health systems. Using machine learning, we considered the effects of 2,890 health, socio-economic and demographic factors in the entire Finnish population aged 30-80 and genome-wide information from 273,765 individuals. The strongest predictors of vaccination status were labour income and medication purchase history. Mental health conditions and having unvaccinated first-degree relatives were associated with reduced vaccination. A prediction model combining all predictors achieved good discrimination (area under the receiver operating characteristic curve, 0.801; 95% confidence interval, 0.799-0.803). The 1% of individuals with the highest predicted risk of not vaccinating had an observed vaccination rate of 18.8%, compared with 90.3% in the study population. We identified eight genetic loci associated with vaccination uptake and derived a polygenic score, which was a weak predictor in an independent subset. Our results suggest that individuals at higher risk of suffering the worst consequences of COVID-19 are also less likely to vaccinate.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Schematic outline of the study.
a, COVID-19 vaccination uptake (at least one dose) at the end of October 2021 was extracted from the Finnish Vaccination Register for each individual aged 30–80 years and living in Finland. A comprehensive collection of potential predictors was extracted (at the end of 2019, except for vaccination status of relatives, for which data up to the end of October 2021 were used) from nationwide registries, totalling 2,890 potential predictors across 12 manually defined predictor categories. The genetics of COVID-19 vaccination uptake was studied in a subsample of individuals of the total study population (FinnGen participants) and replicated in Estonia Biobank. Machine learning was then used to identify the predictors and predictor categories that best predict vaccination uptake in the test set. b, Total number of vaccinated (blue, at least one vaccination dose) and unvaccinated (purple) females and males in the study population at the end of October 2021. c, Cumulative fraction of different age groups in the study population (blue indicates 30- to 40-year-olds, orange indicates 41- to 50-year-olds, green indicates 51- to 60-year-olds, red indicates 61- to 70-year-olds and violet indicates 71- to 80-year-olds) who had received the first dose of a COVID-19 vaccine as a function of time during the follow-up period. Panel a created with BioRender.com.
Fig. 2. Predictors of COVID-19 vaccination uptake.
a, AUC for XGBoost classifiers trained using predictors from different categories (each model also includes the baseline predictors age and sex). The error bars show 95% CIs computed using bootstrapping; the centres of the error bars correspond to the point estimates. The number of predictors in each category is indicated on the corresponding bar. The black dashed vertical line indicates the performance of the full XGBoost model using all predictors. The black dotted vertical line corresponds to an XGBoost model allowed to use only age and sex as predictors. Most of the predictor categories perform better than this baseline model, with income and medication purchases being the most predictive categories. b, AUC from Lasso classifiers trained separately for each of the individual predictors (the models also include the baseline predictors age and sex), grouped by the categories. Some of the highly predictive predictors have been highlighted (for a fully annotated list of AUCs of individual predictors, see Supplementary Table 2). c, Association between labour income in 2019 and COVID-19 vaccination uptake. The ORs are from a logistic regression model using income percentile bins as predictors and adjusting for age and sex. The 40–50% percentile bin was used as a reference category. The dots represent point estimates of ORs. The error bars indicate 95% CIs for the ORs computed using bootstrapping. d, Associations between previous disease diagnoses and COVID-19 vaccination status. The ORs are from a logistic regression model using a binary disease indicator as the predictor and adjusting for age and sex. Some of the interesting predictors are highlighted. MBD, mental and behavioural disorders. Predictors with multiple hypothesis testing-adjusted P > 0.01 (Benjamini–Hochberg method) and prevalence among vaccinated <1,000 are not shown. P values are two-sided and were calculated by dividing the coefficient values by their standard errors and observing the probability mass corresponding to equal or more extreme values from both tails of the standard normal distribution (as in the R package glm). The dots represent the point estimates of ORs. The error bars indicate 95% CIs for the ORs computed using bootstrapping. For a fully annotated list of the ORs of individual predictors, see Supplementary Table 3.
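The caption reports 95% CIs for AUCs "computed using bootstrapping" without giving details. The following is a minimal sketch of one common variant, the percentile bootstrap, and is not necessarily the authors' procedure. The function names, the number of resamples and the rank-based AUC formula are all assumptions made for illustration.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample individuals with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(labels, scores), lo, hi
```

The pairwise AUC loop is quadratic in sample size, so this form only suits small illustrative data; a rank-based implementation would be needed at population scale.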
Fig. 3. A prediction model for COVID-19 vaccination uptake.
a, Fractions of unvaccinated individuals in the test set as a function of centile bins of predicted probabilities to not vaccinate from the full XGBoost model. The 99th centile bin comprises 6,385 individuals who have only an 18.8% (95% CI, 17.9–19.8%) chance of vaccinating. The error bars indicate 95% CIs computed using bootstrapping. The black dashed line indicates the average fraction of unvaccinated individuals in the study population. b, Mean absolute SHAP values computed for all individual predictors used in the full XGBoost model. A higher value indicates a higher average impact of the predictor. The top 20 most important predictors are shown for clarity. The error bars indicate 95% CIs computed using bootstrapping (some uncertainty estimates are very small). HMG CoA, β-hydroxy β-methylglutaryl coenzyme A; Coxibs, cox-2 inhibitors.
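Panel a bins individuals by centile of predicted probability and reports the observed unvaccinated fraction in each bin. A minimal sketch of that kind of binning follows, assuming equal-sized bins after sorting by predicted probability. The function and argument names are hypothetical, and the authors' exact binning rule is not stated in the caption.

```python
def unvaccinated_fraction_by_centile(pred_not_vaccinating, unvaccinated, n_bins=100):
    """Observed unvaccinated fraction per bin of predicted probability.

    pred_not_vaccinating: predicted probability of not vaccinating, one per person.
    unvaccinated: 1 if the person was observed unvaccinated, else 0.
    Returns a list of length n_bins, ordered from lowest to highest predicted risk.
    """
    n = len(pred_not_vaccinating)
    # Indices of individuals sorted by increasing predicted probability.
    order = sorted(range(n), key=lambda i: pred_not_vaccinating[i])
    fractions = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if members:
            fractions.append(sum(unvaccinated[i] for i in members) / len(members))
        else:
            fractions.append(float("nan"))  # bin is empty when n < n_bins
    return fractions
```

Under this reading, the 18.8% chance of vaccinating reported for the top bin corresponds to an unvaccinated fraction of 81.2% in the last element of the returned list.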
Fig. 4. Shared information across different predictor categories.
a, Drop in AUC (y axis) when removing a single category at a time from the full Lasso classifier (including all predictors). Removing all predictors from a category removes all information unique to the predictors of that category, meaning that the drop in AUC quantifies the loss in predictive power due to information unique to the removed category. The lower the AUC here, the higher is the amount of unique information contained in the category that is useful for predicting COVID-19 vaccination uptake. The black dashed line indicates the AUC of the full Lasso model using all predictor categories. The error bars and the error band correspond to 95% CIs computed using bootstrapping; the centres of the error bars correspond to the point estimates. b, Drop in AUC (y axis) when removing different combinations of predictor categories from the full Lasso model (the full model corresponds to ‘Number of included categories = 12’). All combinations of removed categories were tested by training separate Lasso classifiers on the data including only the specific combination of predictor categories, and the corresponding AUCs are shown as individual dots. The violin plots show the distribution of AUCs for each number of removed categories. Individual models discussed in the text are highlighted and named. The model with zero included categories corresponds to a model trained using the baseline predictors age and sex only. All models include age and sex as predictors. Panel a shows a detailed view of ‘Number of included categories = 11’. c, Pairwise partial Pearson correlation, adjusting for age and sex, between predicted probabilities of COVID-19 vaccination uptake for each test set sample, obtained from each category separately (XGBoost classifiers; the AUCs for these models are shown in Fig. 2a and Supplementary Table 1). The colour indicates the strength of correlation, and the correlation coefficient is shown on each heat-map cell. Hierarchical clustering dendrograms of the partial correlation matrix of model predictions are shown beside the matrix and were used in ordering the rows and columns.
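Panel c relies on partial Pearson correlations adjusted for age and sex. As an illustration only, the sketch below computes a partial correlation with the standard recursive formula, which removes one covariate at a time. The function names are hypothetical, and the authors may have used a different but equivalent route, such as correlating regression residuals.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, covariates):
    """Correlation of x and y after adjusting for a list of covariates.

    Uses r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)),
    applied recursively, so each extra covariate conditions on the rest.
    """
    if not covariates:
        return pearson(x, y)
    *rest, z = covariates
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

For two sets of model predictions `p1` and `p2`, the adjusted value would be `partial_corr(p1, p2, [age, sex])`, with `age` and `sex` given as numeric sequences of the same length.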
Fig. 5. Genetic correlates of COVID-19 vaccination uptake.
a, Manhattan plot of COVID-19 vaccination uptake from a meta-analysis of FinnGen and the Estonian Biobank. Genetic variants must have been tested in both datasets and passed quality control in both (INFO ≥ 0.8 and MAF ≥ 0.1%), and significant variants must not have indicated significant heterogeneity (heterogeneity P < 0.0056; the P values were Bonferroni corrected for multiple testing with nine significant variants). The red horizontal line indicates genome-wide significance. b, Genetic correlations between COVID-19 vaccination uptake and selected health and behavioural phenotypes. The point estimates represent correlations, and the error bars reflect standard errors. Orange error bars and point estimates represent Bonferroni-significant genetic correlations (P < 0.002, Bonferroni corrected for multiple testing with 23 tests). The black dashed line indicates zero genetic correlation. For both panels, the P values are two-sided and were calculated by dividing the coefficient or correlation values by their standard errors and observing the probability mass corresponding to equal or more extreme values from both tails of the standard normal distribution (as in the R package glm). A positive correlation means a correlation with reduced COVID-19 vaccination uptake.
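The P value computation described in the caption (estimate divided by its standard error, then both tails of the standard normal) and the Bonferroni thresholds can be sketched as follows. The function names are hypothetical, and the family-wise level of 0.05 is an assumption: the caption does not state it, but it is consistent with the quoted thresholds for 9 and 23 tests.

```python
import math

def two_sided_wald_p(estimate, standard_error):
    """Two-sided P value for z = estimate / standard_error under N(0, 1)."""
    z = estimate / standard_error
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

if __name__ == "__main__":
    # Assumed family-wise alpha of 0.05 (not stated in the caption).
    print(round(bonferroni_threshold(0.05, 9), 4))   # heterogeneity tests, 9 variants
    print(round(bonferroni_threshold(0.05, 23), 3))  # genetic correlations, 23 tests
```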
Extended Data Fig. 1. COVID-19 1st dose vaccination coverage in the study population in each Finnish municipality.
Residents of Askola (highlighted with red and annotated) were excluded from the study as the vaccination coverage in Askola (2,948 residents in the study population) seemed artificially low compared to all other municipalities and is likely due to misreporting.
Extended Data Fig. 2. The effects of downsampling of controls and use of balanced class weights on the XGBoost model predictions.
a) Downsampling controls does not negatively affect the machine learning model predictions. AUCs for models trained with all cases and five randomly sampled controls per case (orange) and for models trained with the full training data without downsampling (black). Predictors used by the models are indicated on the x-axis. All AUCs correspond to XGBoost models, except for the Full model (indicated with blue colour), where the AUCs were computed for the Lasso model, as the full XGBoost model could not be trained without downsampling the controls due to memory issues. b) Class weighting has a negligible effect on the XGBoost model predictions. AUCs for XGBoost models trained with balanced class weighting (orange) versus with no class weights (blue). In both cases, five controls per case were sampled randomly for the training data. Predictors used by the models are indicated on the x-axis. In both panels, the error bars indicate 95% confidence intervals computed using bootstrapping, and the centre of the error bars corresponds to the point estimate. All models include the baseline predictors age and sex.
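The two training choices compared in this figure can be sketched roughly as below. This is an illustration under assumptions, not the authors' code. Label 1 is taken to mark a case and label 0 a control, and "balanced" is taken to mean the common heuristic weight n_samples / (n_classes × class count). Both function names are hypothetical.

```python
import random
from collections import Counter

def downsample_controls(labels, controls_per_case=5, seed=0):
    """Indices keeping every case and a random sample of controls.

    Assumes label 1 marks a case and label 0 a control.
    """
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more controls than exist.
    k = min(len(controls), controls_per_case * len(cases))
    return sorted(cases + rng.sample(controls, k))

def balanced_class_weights(labels):
    """Per-class weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}
```

With five controls per case, the weights after downsampling come out at 3.0 for cases and 0.6 for controls. The resulting class imbalance is modest, which may help explain why weighting changes the AUCs so little.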
Extended Data Fig. 3. Effect size distributions across the predictor categories.
Violin plots describing the distributions of adjusted odds ratios (OR) (adjusted for age and sex, see Methods) for not taking up COVID-19 vaccination, shown separately for each of the predictor categories. See Supplementary Table 3 for a full list of ORs for the individual predictors. Inside the violins, the box shows the quartiles of the distribution, the white dot is the median and the whiskers correspond to 1.5 times the interquartile range.
Extended Data Fig. 4. The effect of unvaccinated relatives on the risk of not vaccinating.
Adjusted (for age and sex, see Methods) odds ratios (OR) describing the risk of not taking up COVID-19 vaccination when either a) the mother, b) the father, or c) any sibling of the individual is unvaccinated (for the entire follow-up period of 1 January 2021 to 31 October 2021).
Extended Data Fig. 5. Sensitivity analysis removing all individuals with no data entries in the year 2019 from the study population.
In total, 129,089 individuals had no data entries in the year 2019 (see Methods for details). The dots are coloured by the predictor category. Error bars correspond to 95% confidence intervals computed using bootstrapping and dots correspond to point estimates. a) Area under the receiver operating characteristic curve (AUC) using the full study population (x-axis) plotted against the AUC using the study population with individuals with no data in the year 2019 removed (y-axis) from Lasso classifier models trained separately for each individual predictor (including also age and sex as predictors in each model). Models were trained separately using training data with and without individuals with no data entries in the year 2019. AUCs were computed on a separate unseen test set. No significant changes in AUC were observed for any predictor. b) Odds ratios (OR) using the full study population (x-axis) plotted against the ORs using the study population with individuals with no data in the year 2019 removed (y-axis) from logistic regression models trained separately for each individual predictor, adjusting for age and sex. Significant drops in OR when removing individuals with no data in the year 2019 occur mostly for relatively rare mother tongues (some highlighted with labels).
Extended Data Fig. 6. Calibration of the prediction model of COVID-19 vaccination uptake.
Calibration curves for the full XGBoost (all predictors) model predicting COVID-19 vaccination status before (blue) and after (orange) recalibration (see Methods).
Extended Data Fig. 7. Genetic correlations with and without COVID-19 cases included in the phenotype definition.
The analysis was performed within the FinnGen study. Point estimates represent correlations with error bars reflecting standard errors. Black error bars and point estimates represent the vaccination phenotype which includes COVID-19 cases.
