Comparative Study

BMC Med Res Methodol. 2024 Nov 13;24(1):278.

doi: 10.1186/s12874-024-02382-4.

Accounting for bias due to outcome data missing not at random: comparison and illustration of two approaches to probabilistic bias analysis: a simulation study

Emily Kawabata^#^{1,2}, Daniel Major-Smith^#^{1,2}, Gemma L Clayton^#^{1,2}, Chin Yang Shapland^{1,2}, Tim P Morris³, Alice R Carter^{1,2}, Alba Fernández-Sanlés⁴, Maria Carolina Borges^{1,2}, Kate Tilling^{1,2}, Gareth J Griffith^{1,2}, Louise A C Millard^{1,2}, George Davey Smith^{1,2}, Deborah A Lawlor^{1,2}, Rachael A Hughes^{5,6}

Affiliations

¹ MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK.
² Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK.
³ MRC Clinical Trials Unit at UCL, London, UK.
⁴ MRC Unit for Lifelong Health and Ageing at University College London, London, UK.
⁵ MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK. rachael.hughes@bristol.ac.uk.
⁶ Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK. rachael.hughes@bristol.ac.uk.

^# Contributed equally.

PMID: 39538117
PMCID: PMC11558901
DOI: 10.1186/s12874-024-02382-4


Emily Kawabata et al. BMC Med Res Methodol. 2024.

Abstract

Background: Bias from data missing not at random (MNAR) is a persistent concern in health-related research. A bias analysis quantitatively assesses how conclusions change under different assumptions about missingness using bias parameters that govern the magnitude and direction of the bias. Probabilistic bias analysis specifies a prior distribution for these parameters, explicitly incorporating available information and uncertainty about their true values. A Bayesian bias analysis combines the prior distribution with the data's likelihood function whilst a Monte Carlo bias analysis samples the bias parameters directly from the prior distribution. No study has compared a Monte Carlo bias analysis to a Bayesian bias analysis in the context of MNAR missingness.
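As a rough sketch of the distinction drawn above, write $D$ for the observed data, $\theta$ for the parameters of the substantive model, and $\delta$ for the bias parameter. This notation, and the assumption of independent priors for $\theta$ and $\delta$, are ours for illustration only and are not taken from the paper. A Bayesian bias analysis updates the prior for $\delta$ through the likelihood, whereas a Monte Carlo bias analysis leaves the distribution of $\delta$ at its prior and propagates each draw through the analysis:

$$
\text{Bayesian:}\quad p(\theta, \delta \mid D) \;\propto\; L(D \mid \theta, \delta)\, p(\theta)\, p(\delta)
$$

$$
\text{Monte Carlo:}\quad \delta^{(k)} \sim p(\delta), \qquad \hat{\theta}^{(k)} = \hat{\theta}\big(D, \delta^{(k)}\big), \qquad k = 1, \dots, K
$$

Under this reading, the Monte Carlo results can be no more informative about $\delta$ than the prior supplied, since the data are never used to update it.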

Methods: We illustrate an accessible probabilistic bias analysis using the Monte Carlo bias analysis approach and a well-known imputation method. We designed a simulation study based on a motivating example from the UK Biobank study, where a large proportion of the outcome was missing and missingness was suspected to be MNAR. We compared the performance of our Monte Carlo bias analysis to a principled Bayesian bias analysis, complete case analysis (CCA) and multiple imputation (MI) assuming missing at random.
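The general shape of a Monte Carlo bias analysis built on imputation can be sketched as follows. This is a simplified illustration under our own assumptions, not the authors' implementation: it uses one continuous exposure `x`, a binary outcome `y`, a Normal prior for a single bias parameter `delta` (the shift in the log-odds of the outcome among those with missing outcomes), and one delta-adjusted imputation per prior draw. The function names `fit_logistic` and `monte_carlo_bias_analysis` are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns (coefficients, covariance)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, cov

def monte_carlo_bias_analysis(x, y, observed, prior_mean, prior_sd,
                              n_draws=200, seed=0):
    """Median and 95% percentile interval for the exposure log-odds ratio."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    # Imputation model fitted to the complete cases only.
    beta_obs, cov_obs = fit_logistic(X[observed], y[observed])
    draws = []
    for _ in range(n_draws):
        # 1. Sample the bias parameter directly from its prior (assumed Normal).
        delta = rng.normal(prior_mean, prior_sd)
        # 2. Draw imputation-model coefficients to reflect their uncertainty.
        beta_star = rng.multivariate_normal(beta_obs, cov_obs)
        # 3. Impute missing outcomes with the log-odds shifted by delta.
        lin = X[~observed] @ beta_star + delta
        y_imp = y.copy()
        y_imp[~observed] = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin)))
        # 4. Fit the substantive model to the completed data.
        b, c = fit_logistic(X, y_imp)
        # 5. Add random sampling error around the point estimate.
        draws.append(rng.normal(b[1], np.sqrt(c[1, 1])))
    draws = np.array(draws)
    return np.median(draws), np.percentile(draws, [2.5, 97.5])
```

Calling `monte_carlo_bias_analysis(x, y, observed, prior_mean=1.0, prior_sd=0.2)` returns a point estimate and interval that reflect both sampling variability and the stated uncertainty about `delta`. A fuller analysis would typically use several imputations per draw and include confounders and auxiliary variables in both models.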

Results: As expected, given the simulation study design, CCA and MI estimates were substantially biased, with 95% confidence interval coverages of 7-48%. Including auxiliary variables (i.e., variables not included in the substantive analysis that are predictive of missingness and the missing data) in MI's imputation model amplified the bias due to assuming missing at random. With reasonably accurate and precise information about the bias parameter, the Monte Carlo bias analysis performed as well as the Bayesian bias analysis. However, when very limited information was provided about the bias parameter, only the Bayesian bias analysis was able to eliminate most of the bias due to MNAR whilst the Monte Carlo bias analysis performed no better than the CCA and MI.
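For readers unfamiliar with the performance measures quoted here, the following minimal sketch shows how bias and confidence interval coverage are commonly computed across simulated datasets, together with their Monte Carlo standard errors. The function name `performance` and its argument names are hypothetical, and the formulas are the standard ones for simulation studies rather than code from the paper.

```python
import numpy as np

def performance(estimates, lower, upper, truth):
    """Bias and interval coverage across simulated datasets, with Monte Carlo SEs."""
    estimates = np.asarray(estimates, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n_sim = estimates.size
    # Bias: mean estimate minus the true value used to generate the data.
    bias = estimates.mean() - truth
    bias_mcse = estimates.std(ddof=1) / np.sqrt(n_sim)
    # Coverage: proportion of intervals that contain the true value.
    covered = (lower <= truth) & (truth <= upper)
    coverage = covered.mean()
    coverage_mcse = np.sqrt(coverage * (1.0 - coverage) / n_sim)
    return {
        "bias": bias,
        "bias_mcse": bias_mcse,
        "coverage": coverage,
        "coverage_mcse": coverage_mcse,
    }
```

An approximate 95% Monte Carlo interval for either measure is the estimate plus or minus 1.96 times its Monte Carlo standard error.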

Conclusion: The Monte Carlo bias analysis we describe is easy to implement in standard software and, in the setting we explored, is a viable alternative to a Bayesian bias analysis. We advise careful consideration of the choice of auxiliary variables when applying imputation where data may be MNAR.

Keywords: Bayesian bias analysis; Inverse probability weighting; Missing not at random; Monte Carlo bias analysis; Multiple imputation; Probabilistic bias analysis; Sensitivity analysis; UK Biobank.


Conflict of interest statement

Declarations

Ethics approval and consent to participate: For the simulation study, data were completely simulated, which did not require approval from an ethics committee or consent from participants. UKB received ethical approval from the UK National Health Service’s National Research Ethics Service (ref. 11/NW/0382). All participants provided written and informed consent for data collection, analysis, and record linkage. This research was conducted under UKB application number 16729.

Consent for publication: Not applicable.

Competing interests: TPM has received consultancy fees from Bayer Healthcare Pharmaceuticals, Alliance Pharmaceuticals, Gilead Sciences, and Kite Pharmaceuticals. Since January 2023, ARC has been an employee of Novo Nordisk Research Centre Oxford, which is not related to the current work and had no involvement in the decision to publish. The remaining authors declare that they have no competing interests.

Figures

**Fig. 1**
Missingness directed acyclic graphs (m-DAGs) of the scenario investigated by the simulation study when the exposure effect, $\beta_X$, is (a) not null and (b) null. Black edges depict the relationships in the fully observed data, and the blue and red edges depict the missingness mechanisms of the outcome and baseline variables (exposure, confounders, and auxiliary variables), respectively.

**Fig. 2**
Missingness directed acyclic graph for the UK Biobank example. Black edges depict the assumed relationships in the fully observed data between the outcome (SARS-CoV-2 infection), exposure (body mass index (BMI)), confounders (age, sex, degree, and smoker), and auxiliary variables (asthma, diabetes, and hypertension). Tested, $M^{\text{BMI}}$, and $M^{\text{degree,smoker}}$ denote missingness indicators for the outcome, exposure, and confounders, respectively. Blue and red edges depict the missingness mechanisms of the outcome and covariates (exposure and confounders), respectively. Note that we have not included all edges between the variables.

**Fig. 3**
Bias and 95% confidence interval coverage of the exposure effect, $\beta_X$, for the not-null ($\beta_X = \ln(3)$) and null ($\beta_X = 0$) scenarios, for data generated using the SM data-generating model. Error bars denote 95% Monte Carlo intervals, and the vertical dashed line denotes zero bias (top) and nominal coverage (bottom). Results for Bayesian SM were based on 926–928 simulated datasets; results for the remaining methods were based on 1,000 simulated datasets.

**Fig. 4**
Forest plot of the results for the exposure odds ratio, $\exp\{\beta_X\}$, estimated by complete case analysis (CCA), multiple imputation assuming missing at random (MI), the population-based comparison group approach (Missing not infected), and the probabilistic bias analyses, Monte Carlo NARFCS and Bayesian SM. The dashed line denotes the null effect.


