Proc Natl Acad Sci U S A. 2018 Mar 13;115(11):2571-2577.
doi: 10.1073/pnas.1708282114.

Empirical confidence interval calibration for population-level effect estimation studies in observational healthcare data


Martijn J Schuemie et al. Proc Natl Acad Sci U S A. 2018.

Abstract

Observational healthcare data, such as electronic health records and administrative claims, offer the potential to estimate the effects of medical products at scale. However, observational studies have often been found to be nonreproducible, generating conflicting results even when the same database is used to answer the same question. One source of discrepancies is error: both random error, caused by sampling variability, and systematic error (for example, because of confounding, selection bias, and measurement error). Typically only random error is quantified, yet it converges to zero as databases become larger, whereas systematic error persists independent of sample size and therefore increases in relative importance. Negative controls are exposure-outcome pairs for which one believes no causal effect exists; they can be used to detect multiple sources of systematic error, but interpreting their results is not always straightforward. Previously, we have shown that an empirical null distribution can be derived from a sample of negative controls and used to calibrate P values, accounting for both random and systematic error. Here, we extend this work to the calibration of confidence intervals (CIs). CI calibration requires positive controls, which we synthesize by modifying negative controls. We show that our CI calibration restores nominal characteristics, such as 95% coverage of the true effect size by the 95% CI. We furthermore show that CI calibration reduces disagreement in replications of two pairs of conflicting observational studies: one related to dabigatran, warfarin, and gastrointestinal bleeding, and one related to selective serotonin reuptake inhibitors and upper gastrointestinal bleeding. We recommend CI calibration to improve the reproducibility of observational studies.
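The calibration idea described above can be illustrated with a short Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration: systematic error on the log scale is taken to be normally distributed with mean `mu` and standard deviation `sigma`, so each negative control's log estimate is modeled as N(mu, sigma² + SE²) around a true log effect of 0; `mu` and `sigma` are fitted by maximum likelihood using a coarse grid search in place of a proper optimizer; and all function names and example data are hypothetical. The CI function further assumes that the error distribution is the same at every true effect size, a simplification that the paper avoids by using synthesized positive controls.

```python
import math
from statistics import NormalDist


def fit_empirical_null(log_estimates, std_errors):
    """Fit the (assumed) systematic-error model N(mu, sigma^2) to negative controls.

    Each control's log estimate b_i is modeled as N(mu, sigma^2 + se_i^2):
    true log effect 0, plus systematic error, plus random error.
    Maximum likelihood via a coarse grid search (mu in [-2, 2], sigma in [0, 1],
    step 0.01) instead of a numerical optimizer. All se_i must be > 0.
    """
    best_ll, best_mu, best_sigma = -math.inf, 0.0, 0.0
    for i in range(-200, 201):
        mu = i / 100
        for j in range(101):
            sigma = j / 100
            ll = 0.0
            for b, se in zip(log_estimates, std_errors):
                var = sigma ** 2 + se ** 2
                ll -= 0.5 * (math.log(2 * math.pi * var) + (b - mu) ** 2 / var)
            if ll > best_ll:
                best_ll, best_mu, best_sigma = ll, mu, sigma
    return best_mu, best_sigma


def calibrated_p(log_estimate, se, mu, sigma):
    """Two-sided p-value against the empirical null N(mu, sigma^2 + se^2)
    instead of the theoretical null N(0, se^2)."""
    z = (log_estimate - mu) / math.sqrt(sigma ** 2 + se ** 2)
    return 2 * NormalDist().cdf(-abs(z))


def calibrated_ci(log_estimate, se, mu, sigma, level=0.95):
    """CI on the hazard-ratio scale after subtracting the estimated bias mu and
    adding sigma in quadrature to the SE. Simplification: assumes the error
    distribution does not change with the true effect size."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(sigma ** 2 + se ** 2)
    centre = log_estimate - mu
    return math.exp(centre - half_width), math.exp(centre + half_width)


if __name__ == "__main__":
    # Hypothetical negative controls (true HR = 1): log HR estimates and SEs.
    neg_log_hr = [0.25, 0.10, 0.32, 0.18, 0.05, 0.40, 0.22, 0.15]
    neg_se = [0.10] * 8
    mu, sigma = fit_empirical_null(neg_log_hr, neg_se)
    print(f"empirical null: mu={mu:.2f}, sigma={sigma:.2f}")

    # Hypothetical new estimate: HR 1.5 with SE 0.1 on the log scale.
    est, se = math.log(1.5), 0.1
    print("uncalibrated p:", calibrated_p(est, se, 0.0, 0.0))
    print("calibrated p:", calibrated_p(est, se, mu, sigma))
    print("calibrated 95% CI:", calibrated_ci(est, se, mu, sigma))
```

Because `sigma` is added in quadrature to the SE, this kind of calibration widens intervals most when the SE is small, that is, in large databases where systematic error dominates random error.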

Keywords: calibration; observational studies; systematic error.


Conflict of interest statement

M.J.S. and P.B.R. are full-time employees and shareholders of Janssen Research & Development.

Figures

Fig. 1.
Uncalibrated estimates and corresponding SEs for the negative and positive controls in the four studies. The estimates are stratified by true effect size. The areas above the red dashed lines indicate where the CIs include the true effect size. Note that, because of limitations in sample size, not all negative controls could be used to synthesize positive controls.
Fig. 2.
The fraction of controls where the true hazard ratio is above, within, or below the CI for various widths of the CI. The dashed lines indicate the boundaries of a perfectly calibrated and centered estimator.
Fig. 3.
Calibrated estimates and corresponding SEs for the negative and positive controls in the four studies. The estimates are stratified by true effect size. The areas above the red dashed lines indicate where the CIs include the true effect size.
Fig. 4.
The fraction of controls where the true hazard ratio is above, within, or below the calibrated CI for various widths of the CI. The dashed lines indicate the boundaries of a perfectly calibrated and centered estimator. Fractions were computed using leave-one-out cross-validation.
Fig. 5.
Fig. 5.
Estimates from the original studies and our reproduction of the studies by Southworth et al. (12) and Graham et al. (13), both before and after calibration.
Fig. 6.
Fig. 6.
Estimates from the original studies and our reproduction of the studies by Tata et al. (14), both before and after calibration.
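Figs. 2 and 4 tabulate, for a range of CI widths, where each control's true hazard ratio falls relative to its CI. A minimal Python sketch of that tabulation follows, assuming Wald-type CIs computed on the log scale as log estimate ± z·SE (the captions do not state how the CIs are constructed). The function names and example values are hypothetical, and the sketch omits the leave-one-out cross-validation used for Fig. 4. The reference values in `ideal_fractions` are an interpretation of "perfectly calibrated and centered": the truth lies inside the CI with probability equal to the nominal level and misses equally often on either side.

```python
import math
from statistics import NormalDist


def coverage_fractions(log_estimates, std_errors, true_effects, level):
    """Fractions of controls whose true effect size (hazard ratio) lies above,
    within, or below a CI of the given nominal level.

    Assumes Wald-type CIs on the log scale: log_estimate +/- z * se.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    above = within = below = 0
    for b, se, true_hr in zip(log_estimates, std_errors, true_effects):
        log_true = math.log(true_hr)
        if log_true > b + z * se:
            above += 1   # the whole CI lies below the true value
        elif log_true < b - z * se:
            below += 1   # the whole CI lies above the true value
        else:
            within += 1
    n = len(log_estimates)
    return above / n, within / n, below / n


def ideal_fractions(level):
    """Reference values for a calibrated, centered estimator: coverage equals
    the nominal level and misses are split evenly between the two sides."""
    miss = (1 - level) / 2
    return miss, level, miss


if __name__ == "__main__":
    # Hypothetical controls: log HR estimates, their SEs, and true HRs.
    log_hr = [0.05, 0.30, -0.10, 0.95, 0.60, 1.50]
    se = [0.10, 0.12, 0.08, 0.15, 0.20, 0.25]
    true_hr = [1.0, 1.0, 1.0, 2.0, 2.0, 4.0]
    for level in (0.5, 0.8, 0.9, 0.95):
        observed = coverage_fractions(log_hr, se, true_hr, level)
        print(level, [round(f, 2) for f in observed], ideal_fractions(level))
```

Rearranged, the inclusion condition reads SE ≥ |log estimate − log true HR| / z, which appears to correspond to the boundary drawn as red dashed lines in Figs. 1 and 3.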

References

1. Overhage JM, Ryan PB, Schuemie MJ, Stang PE. Desideratum for evidence based epidemiology. Drug Saf. 2013;36(Suppl 1):S5–S14.
2. Prasad V, Jena AB. Prespecified falsification end points: Can they validate true observational associations? JAMA. 2013;309:241–242.
3. Dusetzina SB, Brookhart MA, Maciejewski ML. Control outcomes and exposures for improving internal validity of nonrandomized studies. Health Serv Res. 2015;50:1432–1451.
4. Arnold BF, Ercumen A. Negative control outcomes: A tool to detect bias in randomized trials. JAMA. 2016;316:2597–2598.
5. Lipsitch M, Tchetgen Tchetgen E, Cohen T. Negative controls: A tool for detecting confounding and bias in observational studies. Epidemiology. 2010;21:383–388.
