Biostatistics. 2017 Jul 1;18(3):553-568.

doi: 10.1093/biostatistics/kxx003.

Guided Bayesian imputation to adjust for confounding when combining heterogeneous data sources in comparative effectiveness research

Joseph Antonelli¹, Corwin Zigler¹, Francesca Dominici¹

PMID: 28334230
PMCID: PMC5862356
DOI: 10.1093/biostatistics/kxx003

Joseph Antonelli et al. Biostatistics. 2017.

Affiliation

¹ Department of Biostatistics, Harvard TH Chan School of Public Health, 655 Huntington Avenue, Boston, MA, 02115,USA.

Abstract

In comparative effectiveness research, we are often interested in estimating an average causal effect from large observational data (the main study). Often these data do not measure all the necessary confounders. In many cases, an extensive set of additional covariates is measured for a smaller and non-representative population (the validation study). In this setting, standard approaches to missing data imputation might not be adequate because of the large number of missing covariates in the main data relative to the smaller sample size of the validation data. We propose a Bayesian approach to estimate the average causal effect (ACE) in the main study that borrows information from the validation study to improve confounding adjustment. Our approach combines ideas of Bayesian model averaging, confounder selection, and missing data imputation into a single framework. It allows for different treatment effects in the main study and in the validation study, and it propagates the uncertainty due to missing data imputation and confounder selection when estimating the ACE in the main study. We compare our method to several existing approaches via simulation. We apply our method to a study examining the effect of surgical resection on survival among 10 396 Medicare beneficiaries with a brain tumor when additional covariate information is available on 2220 patients in SEER-Medicare. We find that the estimated ACE decreases by 30% when incorporating additional information from SEER-Medicare.
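For orientation, the Python sketch below illustrates the kind of standard missing-data imputation baseline the abstract contrasts against; it is not the authors' guided Bayesian method. A confounder that is unmeasured in a large main study is multiply imputed from a regression fit in a small validation study, and the ACE, defined as E[Y(1)] − E[Y(0)], is then estimated by outcome regression. Everything here is a hypothetical assumption made for illustration: the data-generating process in `simulate`, the helper names, the continuous outcome (the paper's application concerns 30-day survival), a single missing confounder rather than dozens, and the premise that the imputation model transports unchanged from the validation study to the main study (the paper's approach explicitly allows the treatment effect to differ between the two). Drawing the imputation-model parameters from their posterior before each imputation is what lets this baseline propagate imputation uncertainty.

```python
import numpy as np


def simulate(n, rng):
    """Hypothetical data-generating process, not taken from the paper.

    x1 is measured in both studies; x2 is a confounder that is treated as
    unmeasured in the main study. The true ACE of t on y is 1.0.
    """
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    p_treat = 1.0 / (1.0 + np.exp(-(0.5 * x1 + x2)))  # x2 drives treatment...
    t = rng.binomial(1, p_treat)
    y = 1.0 * t + 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)  # ...and outcome
    return x1, x2, t, y


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def estimate_ace(n_main=20_000, n_val=2_000, n_imp=20, seed=0):
    """Return (naive estimate, multiple-imputation estimate) of the ACE.

    With a linear outcome model and no effect modification, the coefficient
    on t equals the ACE.
    """
    rng = np.random.default_rng(seed)
    x1_m, _, t_m, y_m = simulate(n_main, rng)    # x2 discarded: unmeasured here
    x1_v, x2_v, t_v, y_v = simulate(n_val, rng)  # validation study measures x2
    ones_m = np.ones(n_main)

    # Naive estimate: adjust only for the covariate observed in the main study.
    naive = ols(np.column_stack([ones_m, t_m, x1_m]), y_m)[1]

    # Imputation model for x2 given (x1, t, y), fit in the validation study.
    # Including t and y keeps the imputations compatible with the outcome model.
    Z_v = np.column_stack([np.ones(n_val), x1_v, t_v, y_v])
    beta_hat = ols(Z_v, x2_v)
    resid = x2_v - Z_v @ beta_hat
    df = n_val - Z_v.shape[1]
    ZtZ_inv = np.linalg.inv(Z_v.T @ Z_v)
    Z_m = np.column_stack([ones_m, x1_m, t_m, y_m])

    estimates = []
    for _ in range(n_imp):
        # Draw (sigma^2, beta) from their posterior under the standard
        # noninformative prior p(beta, sigma^2) proportional to 1 / sigma^2.
        sigma2 = resid @ resid / rng.chisquare(df)
        beta = rng.multivariate_normal(beta_hat, sigma2 * ZtZ_inv)
        # Stochastic imputation: predicted mean plus residual noise.
        x2_imp = Z_m @ beta + rng.normal(scale=np.sqrt(sigma2), size=n_main)
        coef = ols(np.column_stack([ones_m, t_m, x1_m, x2_imp]), y_m)
        estimates.append(coef[1])

    # Rubin's rules: the point estimate is the mean across imputations.
    return float(naive), float(np.mean(estimates))


if __name__ == "__main__":
    naive, mi = estimate_ace()
    print(f"naive ACE: {naive:.3f}, imputation-based ACE: {mi:.3f} (truth 1.0)")
```

In this toy setting the naive estimate, which adjusts only for `x1`, lands well away from the true value of 1.0, while the imputation-based estimate lands close to it. The sketch imputes a single covariate; the abstract's concern is that this kind of approach may not be adequate when many covariates are missing relative to the validation sample size.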

Keywords: Bayesian adjustment for confounding; Bayesian data augmentation; Confounder selection; Missing data; Model averaging.

Figures

**Fig. 1.**
Bias, MSE, and interval coverage of the various estimators across 1000 simulations.

**Fig. 2.**
Estimated for each of the 50 covariates that can potentially enter into the outcome model, for GBAC() and GBAC(1). The points in black correspond to GBAC(), while those in grey correspond to GBAC(1). Squares represent the true confounders (), while circles represent covariates that are noise. Points to the left of the dotted line (indices 1–5) are covariates that are fully observed, while those to the right (indices 6–50) are observed only in the validation study.

**Fig. 3.**
Estimates and 95% posterior credible intervals for the average causal effect of surgical resection on the probability of 30-day survival in the Medicare population.

Grants and funding

R01 ES024332/ES/NIEHS NIH HHS/United States
