Conducting Privacy-Preserving Multivariable Propensity Score Analysis When Patient Covariate Information Is Stored in Separate Locations

Justin Bohn et al. Am J Epidemiol. 2017 Mar 15;185(6):501-510.
doi: 10.1093/aje/kww155.

Abstract

Distributed networks of health-care data sources are increasingly used to conduct pharmacoepidemiologic database studies. Such networks may contain data that are not physically pooled but instead are distributed horizontally (separate patients within each data source) or vertically (separate measures within each data source) to preserve patient privacy. While multivariable methods for the analysis of horizontally distributed data are frequently employed, few practical approaches have been proposed for vertically distributed health-care databases. In this paper, we propose 2 propensity score-based approaches to the analysis of vertically distributed data and test their performance using 5 example studies. We found that these approaches produced point estimates close to those obtained without partitioning. We further found a performance benefit (i.e., lower mean squared error) for sequentially passing a propensity score through each data domain (called the "sequential approach") as compared with fitting separate domain-specific propensity scores (called the "parallel approach"). These results were validated in a small simulation study. This proof-of-concept study suggests a new multivariable analysis approach to vertically distributed health-care databases that is practical, preserves patient privacy, and warrants further investigation for use in clinical research applications that rely on health-care databases.

Keywords: database linkage; database studies; databases; epidemiologic methods; pharmacoepidemiology; propensity scores.


Figures

Figure 1.
Structure of vertically and horizontally partitioned health-care databases. In this example, the analysis of interest concerns the effect of an exposure A on an outcome Y, wherein adjustment is needed for confounders X1–X6. In a horizontally partitioned system, different patient subsets are contributed by different sources (here, centers 1 and 2), while in a vertically partitioned system different patient covariates are contributed by different sources (here, medical insurance claims and a genomic database). ID, identification.
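To make the two layouts concrete, here is a minimal Python sketch that splits one hypothetical pooled table both ways. The patient IDs, the 0/1 covariate values, the center assignments, and the choice to place X1–X3 in the claims source and X4–X6 in the genomic source are all illustrative assumptions, not details taken from the figure.

```python
# Hypothetical pooled data: patient ID -> confounders X1..X6 (placeholder 0/1 values).
POOLED = {
    101: {"X1": 1, "X2": 0, "X3": 1, "X4": 0, "X5": 1, "X6": 0},
    102: {"X1": 0, "X2": 1, "X3": 0, "X4": 1, "X5": 0, "X6": 1},
    103: {"X1": 1, "X2": 1, "X3": 0, "X4": 0, "X5": 0, "X6": 1},
    104: {"X1": 0, "X2": 0, "X3": 1, "X4": 1, "X5": 1, "X6": 0},
}


def horizontal_split(table, center_of):
    """Horizontal partition: each source holds ALL covariates for a SUBSET of patients."""
    parts = {}
    for pid, row in table.items():
        parts.setdefault(center_of[pid], {})[pid] = dict(row)
    return parts


def vertical_split(table, columns_of):
    """Vertical partition: each source holds a SUBSET of covariates for the same patients, keyed by ID."""
    return {
        source: {pid: {c: row[c] for c in cols} for pid, row in table.items()}
        for source, cols in columns_of.items()
    }


# Assumed assignments, for illustration only.
horizontal = horizontal_split(
    POOLED, {101: "center 1", 102: "center 1", 103: "center 2", 104: "center 2"}
)
vertical = vertical_split(
    POOLED, {"claims": ["X1", "X2", "X3"], "genomic": ["X4", "X5", "X6"]}
)
```

In the horizontal case, a model that needs X1–X6 together can still be fit within each center; in the vertical case no single source holds all six confounders for any patient, which is the situation the parallel and sequential approaches are meant to handle.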
Figure 2.
Schematic representation of the parallel and sequential approaches to analysis of vertically distributed data. The analytical goal is to estimate the effect of an exposure A on an outcome Y, wherein adjustment is needed for many covariates (X1–X8), on which data are available from 4 separate sources (domains) and cannot be pooled in a single analytical database. In the parallel approach (top row), separate propensity scores (PSs), PS1–PS4, are estimated within each domain, and the final analysis utilizes a function of the 4 domain-specific PSs—for example, in the model Y = A + f (PS1 + PS2 + PS3 + PS4). In the sequential approach (bottom row), a PS (PS1) is estimated in the first domain and then passed to the second domain. In the second domain, a PS is estimated on the basis of covariates in that domain and the PS from the first domain (PS2). This process is repeated iteratively across all domains until a single final PS (PS4) is produced, which can be used in the final analysis—for example, in the model Y = A + f (PS4). ID, identification.
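The two schemes can be sketched in Python as follows. This is a minimal illustration under several assumptions that the caption does not state: every domain has access to the exposure A and a shared patient ID; each domain-specific PS is a plain logistic regression fit by gradient ascent (a stand-in for whatever PS model is used in practice); and in the sequential scheme the incoming PS enters the next domain's model as one extra covariate on the logit scale. All domains run in one process here; in a real network each fit would run inside its own data source, and only (ID, PS) pairs would be passed on. The data are randomly generated placeholders, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, iters=300):
    """Batch gradient ascent on the mean logistic log-likelihood; returns [b0, b1, ..., bp]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + sum(b * x for b, x in zip(w[1:], xi)))
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w, X):
    return [sigmoid(w[0] + sum(b * x for b, x in zip(w[1:], xi))) for xi in X]


def parallel_ps(domains, exposure, ids):
    """Parallel approach: one PS per domain, each using only that domain's covariates."""
    a = [exposure[i] for i in ids]
    scores = []
    for dom in domains:
        X = [dom[i] for i in ids]
        scores.append(dict(zip(ids, predict(fit_logistic(X, a), X))))
    return scores  # [{id: PS1}, {id: PS2}, ...], inputs to the outcome model's f(...)


def sequential_ps(domains, exposure, ids):
    """Sequential approach: each domain's PS model also includes the PS passed from the previous domain."""
    a = [exposure[i] for i in ids]
    prev = None
    for dom in domains:
        X = [list(dom[i]) for i in ids]
        if prev is not None:
            for row, i in zip(X, ids):
                p = min(max(prev[i], 1e-9), 1 - 1e-9)  # guard the logit against 0 or 1
                row.append(math.log(p / (1 - p)))  # assumption: passed PS enters on the logit scale
        prev = dict(zip(ids, predict(fit_logistic(X, a), X)))
    return prev  # {id: final PS}, i.e., PS4 when there are 4 domains


# Placeholder data: 4 domains with 2 covariates each; exposure depends on one covariate per domain.
rng = random.Random(0)
ids = list(range(200))
domains = [{i: [rng.gauss(0, 1), rng.gauss(0, 1)] for i in ids} for _ in range(4)]
exposure = {i: int(rng.random() < sigmoid(sum(d[i][0] for d in domains))) for i in ids}

ps_parallel = parallel_ps(domains, exposure, ids)
ps_final = sequential_ps(domains, exposure, ids)
```

The design difference shows up in what crosses domain boundaries: the parallel sketch returns four separate scores that must all reach the outcome analysis, whereas the sequential sketch carries a single score forward, so later domains can adjust for earlier confounding information without ever seeing the earlier covariates.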
Figure 3.
Variations of the sequential approach to analysis of vertically distributed data. The performance of the sequential approach to propensity score (PS) estimation when data are vertically distributed is demonstrated here, showing the influence of domain ordering (vertical axis) and inclusion of a single continuous term for the final PS in the outcome model (dark gray circles) versus decile-indicator (light gray circles) treatment of the final PS in the outcome model. The 4 domains are outpatient (“Out”), inpatient (“In”), demographic factors (“Demo”), and prescriptions (“Drugs”), giving 24 possible orderings. The horizontal axis shows the difference between the log hazard ratio or log odds ratio (both abbreviated as risk ratio (RR)) and its reference estimate. Results are given separately for each of the 5 example studies: Schneeweiss et al., 2009 (18) (analyses of cyclooxygenase 2 inhibitors (A) and statins (B)); Schneeweiss et al., 2010 (19) (C); Patorno et al., 2010 (20) (D); and Patorno et al., 2014 (21) (E).
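Two details in this caption lend themselves to a quick check: the number of orderings (4 domains can be arranged in 4! = 24 ways) and the difference between entering the final PS as a single continuous term and entering it as decile indicators. The Python sketch below enumerates the orderings and builds 9 indicator variables per patient. The cut rule (sample quantiles at 10%, 20%, ..., 90%), the use of the lowest decile as the reference level, and the function names are assumptions made for illustration, not details given in the caption.

```python
from bisect import bisect_right
from itertools import permutations

DOMAINS = ["Out", "In", "Demo", "Drugs"]
orderings = list(permutations(DOMAINS))  # every order in which the PS could be passed along


def decile_of(ps_values):
    """Return the PS decile (0-9) of each value, cutting at the 10th, 20th, ..., 90th sample quantiles."""
    ranked = sorted(ps_values)
    n = len(ranked)
    cuts = [ranked[(n * k) // 10] for k in range(1, 10)]
    return [bisect_right(cuts, v) for v in ps_values]


def decile_indicators(ps_values):
    """One row of 9 dummy variables per patient; decile 0 (lowest PS) is the reference level."""
    return [[1 if d == k else 0 for k in range(1, 10)] for d in decile_of(ps_values)]


# Usage with placeholder PS values 0.00, 0.01, ..., 0.99.
ps = [i / 100 for i in range(100)]
rows = decile_indicators(ps)
```

A single continuous term forces the outcome model to treat the PS effect as linear on the model's scale, while the 9 indicators let each decile have its own adjustment at the cost of more parameters.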
Figure 4.
Performance of variations of the parallel and sequential propensity score approaches to analysis of vertically distributed data in simulation. The plotted treatment effect estimates are presented on the log odds ratio (OR) scale and have been averaged across the 2,000 simulations. All simulations were carried out under a true null treatment effect (log OR equal to 0). Error bars indicate the 2.5th and 97.5th percentiles (empirical 95% confidence intervals) of the treatment effect distributions. The horizontal axis shows an index of the variations of the parallel and sequential PS approaches. Symbol shape indicates the type of estimate: diamond, crude/unadjusted; squares, fully adjusted for all covariates across all domains; circles, parallel approach; triangles, sequential approach. Details on these variants can be found in Web Table 1.
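The summary statistics plotted in this figure, and the mean squared error compared in the abstract, can be computed from a list of per-replicate log-OR estimates as sketched below. Under a true null (log OR = 0), bias is simply the mean estimate and MSE is the mean squared estimate. The function name and the use of `statistics.quantiles` with `method="inclusive"` (linear interpolation between order statistics) are choices made for this sketch; the source does not say how its percentiles were computed. Requires Python 3.8 or later.

```python
import statistics


def summarize(estimates, truth=0.0):
    """Mean, empirical 2.5th/97.5th percentiles, bias, and MSE of simulated effect estimates."""
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%; the first and last bound the empirical 95% interval.
    cuts = statistics.quantiles(estimates, n=40, method="inclusive")
    mean = statistics.fmean(estimates)
    return {
        "mean": mean,
        "p2.5": cuts[0],
        "p97.5": cuts[-1],
        "bias": mean - truth,
        "mse": statistics.fmean([(e - truth) ** 2 for e in estimates]),
    }
```

Because MSE equals squared bias plus variance, a method can have a lower MSE either by being less biased or by producing less variable estimates across replicates.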


References

1. Platt R, Carnahan RM, Brown JS, et al. The US Food and Drug Administration's Mini-Sentinel program: status and direction. Pharmacoepidemiol Drug Saf. 2012;21(suppl 1):1–8.
2. Califf RM. The Patient-Centered Outcomes Research Network: a national infrastructure for comparative effectiveness research. NC Med J. 2014;75(3):204–210.
3. Oliveira JL, Lopes P, Nunes T, et al. The EU-ADR Web Platform: delivering advanced pharmacovigilance tools. Pharmacoepidemiol Drug Saf. 2013;22(5):459–467.
4. Trifirò G, Coloma PM, Rijnbeek PR, et al. Combining multiple healthcare databases for postmarketing drug and vaccine safety surveillance: why and how. J Intern Med. 2014;275(6):551–561.
5. Curtis LH, Weiner MG, Boudreau DM, et al. Design considerations, architecture, and use of the Mini-Sentinel distributed data system. Pharmacoepidemiol Drug Saf. 2012;21(suppl 1):23–31.
