Biometrics. 2022 Mar;78(1):100-114.
doi: 10.1111/biom.13417. Epub 2020 Dec 31.

Causal inference in high dimensions: A marriage between Bayesian modeling and good frequentist properties

Joseph Antonelli et al. Biometrics. 2022 Mar.

Abstract

We introduce a framework for estimating causal effects of binary and continuous treatments in high dimensions. We show how posterior distributions of treatment and outcome models can be used together with doubly robust estimators. We propose an approach to uncertainty quantification for the doubly robust estimator, which utilizes posterior distributions of model parameters and (1) results in good frequentist properties in small samples, (2) is based on a single run of a Markov chain Monte Carlo (MCMC) algorithm, and (3) improves over frequentist measures of uncertainty which rely on asymptotic properties. We consider a flexible framework for modeling the treatment and outcome processes within the Bayesian paradigm that reduces model dependence, accommodates nonlinearity, and achieves dimension reduction of the covariate space. We illustrate the ability of the proposed approach to flexibly estimate causal effects in high dimensions and appropriately quantify uncertainty. We show that our proposed variance estimation strategy is consistent when both models are correctly specified, and we see empirically that it performs well in finite samples and under model misspecification. Finally, we estimate the effect of continuous environmental exposures on cholesterol and triglyceride levels.

Keywords: Bayesian modeling; causal inference; doubly robust estimation; environmental exposures; high-dimensional data; model selection; variable selection.
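
As a rough illustration of the doubly robust idea named in the abstract, the sketch below computes the standard augmented inverse-probability-weighted (AIPW) estimate of an average treatment effect for a binary treatment, then averages it over posterior draws of the treatment and outcome models. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `aipw_ate` and `posterior_mean_aipw`, the array layout (one row per posterior draw), and the choice to summarize by a simple average over draws are all assumptions made here for illustration. The uncertainty quantification proposed in the paper is not reproduced.

```python
import numpy as np


def aipw_ate(y, t, propensity, mu1, mu0):
    """Standard AIPW (doubly robust) estimate of the average treatment effect.

    y          : observed outcomes, shape (n,)
    t          : binary treatment indicators (0/1), shape (n,)
    propensity : fitted P(T = 1 | X), shape (n,), strictly between 0 and 1
    mu1, mu0   : fitted outcome means under treatment and control, shape (n,)
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    propensity = np.asarray(propensity, dtype=float)
    mu1 = np.asarray(mu1, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)

    # Outcome-model prediction plus an inverse-probability-weighted residual.
    treated_term = mu1 + t * (y - mu1) / propensity
    control_term = mu0 + (1.0 - t) * (y - mu0) / (1.0 - propensity)
    return float(np.mean(treated_term - control_term))


def posterior_mean_aipw(y, t, propensity_draws, mu1_draws, mu0_draws):
    """Average the AIPW estimate over posterior draws (hypothetical helper).

    Each *_draws argument has shape (n_draws, n): one row per posterior
    sample of the corresponding model's fitted values.
    """
    estimates = np.array([
        aipw_ate(y, t, e, m1, m0)
        for e, m1, m0 in zip(propensity_draws, mu1_draws, mu0_draws)
    ])
    return float(estimates.mean()), estimates
```

In practice the rows of `propensity_draws`, `mu1_draws`, and `mu0_draws` would come from the saved iterations of a single MCMC run of the treatment and outcome models, so no model refitting is needed to evaluate the estimator at each draw.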

Figures

Figure 1.
Values of Δ(D, Ψ) for different combinations of resampled data sets and posterior samples
Figure 2.
Results from simulations with binary treatments. The top panel shows results for the linear scenario, while the bottom panel shows results for the nonlinear scenario. The first column shows absolute bias, the second column shows the variance, the third column shows 95% interval coverages, while the fourth column is the ratio of estimated to Monte Carlo standard errors.
Figure 3.
Simulation results for continuous treatments. The top left panel presents the mean squared error, the top right panel shows the 95% credible interval coverage, the bottom left panel shows the ratio of estimated to Monte Carlo standard errors, and the bottom right panel shows the estimates of the exposure-response curve across the 1000 simulations for the doubly robust estimator.
Figure 4.
Ratio of our variance estimator’s average value to the estimator’s Monte Carlo variance. Our variance estimator is separated into the contribution stemming only from the data (naïve variance – dark grey) and the contribution stemming from parameter estimation (correction term – light grey). Values near 1 indicate that our variance estimation strategy accurately reflects the estimator’s true uncertainty. The horizontal axis represents various data generating mechanisms, including the ones presented in Section 5 and in the Supporting Information. In the order shown, the simulations are the linear binary and nonlinear binary simulations of Section 5.1, the continuous treatment simulation of Section 5.2, an additional continuous treatment simulation from Supporting Information F, three additional binary treatment simulations with different data generating mechanisms and sparsity levels from Supporting Information F, and four low-dimensional simulations with different types of model misspecification from Supporting Information H.
Figure 5.
The top panel presents the ratio of WAIC values to the minimum values for each of the three models considered. The top left panel shows the treatment model WAIC values, while the top right panel shows the WAIC for the outcome models. The bottom panel shows the percentage of covariates included in the chosen treatment and outcome model. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.
Figure 6.
Estimated exposure response curves from the doubly robust estimator (black line) as well as the naïve curve (red line), which does not adjust for any covariates. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.
