Int J Epidemiol. 2023 Feb 8;52(1):44-57.
doi: 10.1093/ije/dyac221.

Exploring the impact of selection bias in observational studies of COVID-19: a simulation study



Louise A C Millard et al. Int J Epidemiol.

Abstract

Background: Non-random selection of analytic subsamples could introduce selection bias in observational studies. We explored the potential presence and impact of selection in studies of SARS-CoV-2 infection and COVID-19 prognosis.
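To make the mechanism concrete, the sketch below computes, from expected cell proportions rather than simulated data, the odds ratio (OR) between two independent binary characteristics among selected participants only. The prevalences and selection probabilities are hypothetical values chosen for illustration, not estimates from ALSPAC or UKB. When selection depends additively on both characteristics, an association appears among the selected even though the true OR is 1; when selection is multiplicative (no interaction on the log scale), the OR among the selected stays at 1.

```python
def expected_or_among_selected(p_x, p_y, p_select):
    """Expected odds ratio between independent binary x and y among selected people.

    p_x, p_y: prevalences of x and y in the target population (assumed independent).
    p_select: function (x, y) -> probability of being selected into the analysis.
    """
    cells = {}
    for x in (0, 1):
        for y in (0, 1):
            # Population proportion in this cell, times its probability of selection.
            p_cell = (p_x if x else 1 - p_x) * (p_y if y else 1 - p_y)
            cells[(x, y)] = p_cell * p_select(x, y)
    return (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])


# Hypothetical selection mechanisms: both x and y raise the chance of selection.
def additive(x, y):
    return 0.1 + 0.4 * x + 0.4 * y


def multiplicative(x, y):
    return 0.1 * 2 ** x * 3 ** y


if __name__ == "__main__":
    print(expected_or_among_selected(0.5, 0.1, additive))        # spurious OR below 1
    print(expected_or_among_selected(0.5, 0.1, multiplicative))  # OR stays at 1
```

With these inputs the additive mechanism gives an expected OR of 0.36 among the selected, a strong spurious inverse association created purely by selection.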

Methods: We tested the association of a broad range of characteristics with selection into COVID-19 analytic subsamples in the Avon Longitudinal Study of Parents and Children (ALSPAC) and UK Biobank (UKB). We then conducted empirical analyses and simulations to explore the potential presence, direction and magnitude of bias due to this selection (relative to our defined UK-based adult target populations) when estimating the association of body mass index (BMI) with SARS-CoV-2 infection and death-with-COVID-19.
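A minimal Monte Carlo sketch of this kind of simulation is shown below. It is a simplification, not the authors' code: the exposure is a hypothetical binary "high BMI" indicator rather than continuous BMI, the OR comes from a 2×2 table rather than an adjusted logistic regression, and the true OR, baseline risk and both selection mechanisms are assumed values. Bias is measured as the estimated log OR minus the true log OR.

```python
import math
import random


def log_odds_ratio(pairs):
    """Log odds ratio of binary outcome y for binary exposure x, from (x, y) pairs."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for x, y in pairs:
        counts[(x, y)] += 1
    a, b = counts[(1, 1)], counts[(1, 0)]  # exposed: cases, non-cases
    c, d = counts[(0, 1)], counts[(0, 0)]  # unexposed: cases, non-cases
    return math.log((a * d) / (b * c))


def simulate_population(n, true_or, p0, rng):
    """Binary exposure x with prevalence 0.5; outcome risk p0 if unexposed."""
    odds1 = p0 / (1 - p0) * true_or
    p1 = odds1 / (1 + odds1)  # outcome risk in the exposed, implied by true_or
    pop = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = 1 if rng.random() < (p1 if x else p0) else 0
        pop.append((x, y))
    return pop


# Hypothetical selection mechanisms: P(selected | x, y).
SCENARIOS = {
    "exposure only": lambda x, y: 0.1 + 0.4 * x,
    "exposure and outcome (additive)": lambda x, y: 0.1 + 0.4 * x + 0.4 * y,
}


def run(n=200_000, true_or=2.0, p0=0.05, seed=2023):
    """Return the bias in the log OR for the full sample and each selected subsample."""
    rng = random.Random(seed)
    pop = simulate_population(n, true_or, p0, rng)
    results = {"full sample": log_odds_ratio(pop) - math.log(true_or)}
    for name, p_select in SCENARIOS.items():
        selected = [(x, y) for x, y in pop if rng.random() < p_select(x, y)]
        results[name] = log_odds_ratio(selected) - math.log(true_or)
    return results


if __name__ == "__main__":
    for name, bias in run().items():
        print(f"{name}: bias in log OR = {bias:+.3f}")
```

Selection that depends on the exposure alone leaves the OR approximately unbiased, whereas selection that also depends on the outcome produces a large negative bias; under these particular parameters the expected OR in the selected subsample is about 0.72, reversing the direction of the true effect.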

Results: In both cohorts, a broad range of characteristics was related to selection, sometimes in opposite directions (e.g. more-educated people were more likely to have data on SARS-CoV-2 infection in ALSPAC, but less likely in UKB). Higher BMI was associated with higher odds of SARS-CoV-2 infection and death-with-COVID-19. We found non-negligible bias in many simulated scenarios.

Conclusions: Analyses using COVID-19 self-reported or national registry data may be biased due to selection. The magnitude and direction of this bias depend on the outcome definition, the true effect of the risk factor and the assumed selection mechanism; these are likely to differ between studies with different target populations. Bias due to sample selection is a key concern in COVID-19 research based on national registry data, especially as countries end free mass testing. The framework we have used can be applied by other researchers assessing the extent to which their results may be biased for their research question of interest.

Keywords: ALSPAC; COVID-19; SARS-CoV-2 infection; Selection bias; UK Biobank; misclassification bias.


Figures

Figure 1
Directed acyclic graphs depicting assumed causal models for empirical and simulation scenarios. (a) SARS-CoV-2 infection. Dashed lines indicate the causal effect we are estimating. Simulations based on Avon Longitudinal Study of Parents and Children (ALSPAC) and UK Biobank (UKB) data. Participants were assessed (and hence selected) if they reported whether they had had a SARS-CoV-2 infection in ALSPAC or had a SARS-CoV-2 polymerase chain reaction (PCR) test result in UKB. (b) Death-with-COVID-19. Dashed lines indicate the causal effect we are estimating. Simulations based on UKB data only. Participants were selected if they were assessed [as in (a)] and tested positive or if they died with COVID-19. An arrow from Node A to Node B in a directed acyclic graph (DAG) indicates that A is a direct cause of B (i.e. A affects B not only through other nodes in the DAG). DAGs do not describe ‘how’ this effect occurs, i.e. the specific model describing this relationship, including whether nodes interact in their effects. For example, in DAG (b), infection is a direct cause of death-with-COVID-19 because a person can only die with COVID-19 if they are infected. Thus, infection interacts with all other direct causes of death-with-COVID-19. For example, smoking directly affects the risk of dying with COVID-19 only among those with a SARS-CoV-2 infection (i.e. the effect of smoking on death-with-COVID-19 depends on SARS-CoV-2 infection status). BMI, body mass index; SEP, socio-economic position.
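The interaction described in this caption can be sketched as a toy risk model. The baseline risk and the smoking risk ratio below are hypothetical parameters, not values from the study; the point is only that, because death-with-COVID-19 requires infection, the effect of smoking is zero among the uninfected and non-zero among the infected.

```python
def death_risk(infected, smoker, base_risk=0.01, smoking_rr=2.0):
    """Hypothetical P(death-with-COVID-19 | infection status, smoking).

    Death-with-COVID-19 is only possible after infection, so the risk is zero
    for uninfected people regardless of smoking.
    """
    if not infected:
        return 0.0
    return base_risk * (smoking_rr if smoker else 1.0)


def smoking_risk_difference(infected):
    """Effect of smoking on death-with-COVID-19 within one infection stratum."""
    return death_risk(infected, smoker=True) - death_risk(infected, smoker=False)


if __name__ == "__main__":
    print("uninfected:", smoking_risk_difference(False))  # smoking has no effect
    print("infected:  ", smoking_risk_difference(True))   # smoking raises the risk
```

A DAG would draw the same arrows whether or not this interaction exists, which is why the caption stresses that DAGs do not encode the functional form.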
Figure 2
Forest plots of the association between the candidate predictors of selection and outcomes related to SARS-CoV-2 infection. ORs and their 95% CIs are shown for (a) categorical variables and (b) continuous variables. Estimates for continuous candidate predictors are per 1 SD increase, except for the Deprivation Index, which is given per 1 higher quantile. ALSPAC, Avon Longitudinal Study of Parents and Children; UKB, UK Biobank; OR, odds ratio; BMI, body mass index; SBP, systolic blood pressure; DBP, diastolic blood pressure; GCSE, General Certificate of Secondary Education.
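Reporting an OR "per 1 SD" amounts to rescaling a logistic-regression coefficient. The sketch below assumes a per-unit log OR and its standard error are already available and uses a Wald interval; the input values in the example are hypothetical and are not results from this study.

```python
import math
from statistics import NormalDist


def or_per_sd(beta_per_unit, se_per_unit, sd, level=0.95):
    """Convert a per-unit log-odds coefficient into an OR and Wald CI per 1 SD.

    Multiplying the coefficient and its standard error by the predictor's SD is
    equivalent to refitting the model with the predictor divided by its SD.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    beta, se = beta_per_unit * sd, se_per_unit * sd
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Hypothetical inputs: log OR 0.05 per kg/m^2 (SE 0.01), BMI SD of 4.8 kg/m^2.
    odds_ratio, lower, upper = or_per_sd(0.05, 0.01, 4.8)
    print(f"OR per 1 SD: {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Standardizing makes ORs for predictors measured in different units (BMI, blood pressure, age) comparable on one forest plot.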
Figure 3
Forest plots of the association between BMI and COVID-19-related outcomes. In the ALSPAC cohort of young adults: SARS-CoV-2(+) vs SARS-CoV-2(−), N=1915; SARS-CoV-2(+) vs ‘everyone else’, N=2983. In UKB: SARS-CoV-2(+) vs SARS-CoV-2(−), N=4662; SARS-CoV-2(+) vs ‘everyone else’, N=409 487; death-with-COVID-19 vs SARS-CoV-2(+) not resulting in death-with-COVID-19, N=1375; death-with-COVID-19 vs ‘everyone else’, N=409 487. Models were adjusted for age, sex, smoking, education and proxies of socio-economic position. The ‘everyone else’ control group includes those who tested SARS-CoV-2(−) and those not tested. BMI, body mass index; ALSPAC, Avon Longitudinal Study of Parents and Children; UKB, UK Biobank; OR, odds ratio.
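The two control-group definitions can be expressed as simple case/control codings. The `Participant` record below is a hypothetical minimal structure, not the cohorts' data format. The first coding restricts the analysis to people who were tested (a selection step), whereas the second keeps everyone but counts untested people, some of whom may have been infected, as controls (a source of misclassification).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Participant:
    tested: bool
    positive: Optional[bool] = None  # None when the participant was never tested


def positive_vs_negative(people: List[Participant]) -> List[int]:
    """Cases = tested positive, controls = tested negative; untested people are dropped."""
    return [1 if p.positive else 0 for p in people if p.tested]


def positive_vs_everyone_else(people: List[Participant]) -> List[int]:
    """Cases = tested positive, controls = everyone else (tested negative or untested)."""
    return [1 if (p.tested and p.positive) else 0 for p in people]


if __name__ == "__main__":
    cohort = [Participant(True, True), Participant(True, False), Participant(False)]
    print(positive_vs_negative(cohort))       # 2 people analysed
    print(positive_vs_everyone_else(cohort))  # all 3 people analysed
```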
Figure 4
A general framework for investigating the impact of selection bias. Prior to Step 1, researchers should have developed a directed acyclic graph (DAG) for their defined research question. After Step 3, researchers may choose to use approaches that account for missing data, e.g. inverse probability weighting or imputation. Researchers may also want to integrate our framework with the Treatment And Reporting of Missing Data in Observational Studies framework (specifically Step 2: Examine the data).
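Inverse probability weighting, mentioned in this caption as one follow-up option, reweights selected participants by the reciprocal of their estimated probability of selection. The sketch below uses the simplest variant, stratum-specific selection rates for a single discrete covariate; in practice a selection model such as a logistic regression on several covariates is usually fitted instead. The weights are only valid if selection depends on nothing beyond the modelled covariate and every stratum has at least one selected member; the strata and outcome values in the example are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def stratum_ipw(strata: Sequence[Hashable], selected: Sequence[bool]) -> List[float]:
    """Inverse-probability-of-selection weights from stratum-specific selection rates.

    P(selected | stratum) is estimated as (number selected) / (number in stratum),
    so each selected person is weighted by its reciprocal; unselected people get 0.
    Assumes selection depends only on the stratum variable.
    """
    n_total = Counter(strata)
    n_selected = Counter(s for s, sel in zip(strata, selected) if sel)
    return [n_total[s] / n_selected[s] if sel else 0.0
            for s, sel in zip(strata, selected)]


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean, e.g. an outcome prevalence reweighted to the full sample."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    strata = ["low SEP"] * 4 + ["high SEP"] * 2  # hypothetical covariate
    selected = [True, False, False, False, True, True]
    outcome = [1, 0, 0, 0, 0, 1]                  # only used where selected
    w = stratum_ipw(strata, selected)
    print(w)  # the one selected low-SEP person stands in for 4 people
    print(weighted_mean(outcome, w))
```

Because each stratum's weights sum to its full size, the weighted selected sample reproduces the covariate distribution of the full sample.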

