Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Comparative Study
. 2024 Nov 13;24(1):278.
doi: 10.1186/s12874-024-02382-4.

Accounting for bias due to outcome data missing not at random: comparison and illustration of two approaches to probabilistic bias analysis: a simulation study

Affiliations
Comparative Study

Accounting for bias due to outcome data missing not at random: comparison and illustration of two approaches to probabilistic bias analysis: a simulation study

Emily Kawabata et al. BMC Med Res Methodol. .

Abstract

Background: Bias from data missing not at random (MNAR) is a persistent concern in health-related research. A bias analysis quantitatively assesses how conclusions change under different assumptions about missingness using bias parameters that govern the magnitude and direction of the bias. Probabilistic bias analysis specifies a prior distribution for these parameters, explicitly incorporating available information and uncertainty about their true values. A Bayesian bias analysis combines the prior distribution with the data's likelihood function whilst a Monte Carlo bias analysis samples the bias parameters directly from the prior distribution. No study has compared a Monte Carlo bias analysis to a Bayesian bias analysis in the context of MNAR missingness.

Methods: We illustrate an accessible probabilistic bias analysis using the Monte Carlo bias analysis approach and a well-known imputation method. We designed a simulation study based on a motivating example from the UK Biobank study, where a large proportion of the outcome was missing and missingness was suspected to be MNAR. We compared the performance of our Monte Carlo bias analysis to a principled Bayesian bias analysis, complete case analysis (CCA) and multiple imputation (MI) assuming missing at random.

Results: As expected, given the simulation study design, CCA and MI estimates were substantially biased, with 95% confidence interval coverages of 7-48%. Including auxiliary variables (i.e., variables not included in the substantive analysis that are predictive of missingness and the missing data) in MI's imputation model amplified the bias due to assuming missing at random. With reasonably accurate and precise information about the bias parameter, the Monte Carlo bias analysis performed as well as the Bayesian bias analysis. However, when very limited information was provided about the bias parameter, only the Bayesian bias analysis was able to eliminate most of the bias due to MNAR whilst the Monte Carlo bias analysis performed no better than the CCA and MI.

Conclusion: The Monte Carlo bias analysis we describe is easy to implement in standard software and, in the setting we explored, is a viable alternative to a Bayesian bias analysis. We caution careful consideration of choice of auxiliary variables when applying imputation where data may be MNAR.

Keywords: Bayesian bias analysis; Inverse probability weighting; Missing not at random; Monte Carlo bias analysis; Multiple imputation; Probabilistic bias analysis; Sensitivity analysis; UK Biobank.

PubMed Disclaimer

Conflict of interest statement

Declarations Ethics approval and consent to participate For the simulation study, data were completely simulated, which did not require approval from an ethics committee or consent from participants. UKB received ethical approval from the UK National Health Service’s National Research Ethics Service (ref. 11/NW/0382). All participants provided written and informed consent for data collection, analysis, and record linkage. This research was conducted under UKB application number 16729. Consent for publication Not applicable. Competing interests TPM has received consultancy fees from: Bayer Healthcare Pharmaceuticals, Alliance Pharmaceuticals, Gilead Sciences, and Kite Pharmaceuticals. Since January 2023, ARC has been an employee of Novo Nordisk Research Centre Oxford, which is not related to the current work and had no involvement in the decision to publish. The remaining authors declare that they have no competing interests.

Figures

Fig. 1
Fig. 1
Missingness directed acyclic graphs (m-DAGs) of the scenario investigated by the simulation study when the exposure effect, βX, is (a) not-null and (b) null. Black edges depict the relationships in the fully observed data, and the blue and red edges depict the missingness mechanisms of the outcome and baseline variables (exposure, confounders, and auxiliary variables), respectively
Fig. 2
Fig. 2
Missingness directed acyclic graph for the UK Biobank example. Black edges depict the assumed relationships in the fully observed data between the outcome (SARS-CoV-2 infection), exposure (body mass index (BMI)), confounders (age, sex, degree, and smoker), and auxiliary variables (asthma, diabetes, and hypertension). Tested, MBMI, and Mdegree,smoker denote missingness indicators for the outcome, exposure, and confounders, respectively. Blue and red edges depict the missingness mechanisms of the outcome and covariates (exposure and confounders), respectively. Note, we have not included all edges between the variables
Fig. 3
Fig. 3
Bias and 95% confidence interval coverage of exposure effect, βX, according to the not null (βX=ln(3)) and null (βX=0) scenarios for data generated using SM data generating model. Error bars denote 95% Monte Carlo intervals, and the vertical dashed line denotes zero bias (top) and nominal coverage (bottom). Results for Bayesian SM were based on 926–928 simulated datasets; the remaining methods were based on 1,000 simulated datasets
Fig. 4
Fig. 4
Forest plot of the results for exposure odds ratio, expβX, estimated by complete case analysis (CCA), multiple imputation assuming missing at random (MI), population-based comparison group approach (Missing not infected), and the probabilistic bias analyses, Monte Carlo NARFCS and Bayesian SM. Dashed line denotes the null effect

References

    1. Rubin D. Inference and missing data. Biometrika. 1976;63:581–92.
    1. Li Y, Miao W, Shpitser I, Tchetgen Tchetgen EJ. A self-censoring model for multivariate nonignorable nonmonotone missing data. Biometrics. 2023;: 1–12. - PubMed
    1. Giusti C, Little RJ. An analysis of nonignorable nonresponse to income in a survey with a rotating panel design. J Official Statistics. 2011;27(2):211–29.
    1. White IR, Carpenter J, Horton NJ. A mean score method for sensitivity analysis to depatures from the missing at random assumption in randomised trials. Stat Sin. 2018;28(4):1985–2003. - PMC - PubMed
    1. Tompsett DM, Leacy F, Moreno-Betancu M, Heron J, White IR. On the use of the not-at-random fully conditional specification (NARFCS) procedure in practice. Stat Med. 2018;37:2338–53. - PMC - PubMed

Publication types

LinkOut - more resources