Conducting Privacy-Preserving Multivariable Propensity Score Analysis When Patient Covariate Information Is Stored in Separate Locations

Justin Bohn¹, Wesley Eddings²,³, Sebastian Schneeweiss²,³

Affiliations

¹ Department of Education and Psychology, Free University Berlin, Germany.
² Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, MA, USA.
³ Harvard Medical School, Boston, MA, USA.

PMID: 28399565
PMCID: PMC5391702
DOI: 10.1093/aje/kww155

Justin Bohn et al. Am J Epidemiol. 2017 Mar 15;185(6):501-510. doi: 10.1093/aje/kww155.

Abstract

Distributed networks of health-care data sources are increasingly being utilized to conduct pharmacoepidemiologic database studies. Such networks may contain data that are not physically pooled but instead are distributed horizontally (separate patients within each data source) or vertically (separate measures within each data source) in order to preserve patient privacy. While multivariable methods for the analysis of horizontally distributed data are frequently employed, few practical approaches have been put forth to deal with vertically distributed health-care databases. In this paper, we propose 2 propensity score-based approaches to vertically distributed data analysis and test their performance using 5 example studies. We found that these approaches produced point estimates close to what could be achieved without partitioning. We further found a performance benefit (i.e., lower mean squared error) for sequentially passing a propensity score through each data domain (called the "sequential approach") as compared with fitting separate domain-specific propensity scores (called the "parallel approach"). These results were validated in a small simulation study. This proof-of-concept study suggests a new multivariable analysis approach to vertically distributed health-care databases that is practical, preserves patient privacy, and warrants further investigation for use in clinical research applications that rely on health-care databases.

Keywords: database linkage; database studies; databases; epidemiologic methods; pharmacoepidemiology; propensity scores.

© The Author 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Figures

**Figure 1.**
Structure of vertically and horizontally partitioned health-care databases. In this example, the analysis of interest concerns the effect of an exposure A on an outcome Y, wherein adjustment is needed for confounders X1–X6. In a horizontally partitioned system, different patient subsets are contributed by different sources (here, centers 1 and 2), while in a vertically partitioned system different patient covariates are contributed by different sources (here, medical insurance claims and a genomic database). ID, identification.
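
The two partitioning schemes in Figure 1 can be illustrated with a toy table. This is only a sketch: the four patient rows, the helper names `horizontal_split` and `vertical_split`, and the assignment of columns to the "claims" and "genomic" sources are assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical pooled table: one row per patient ID, with exposure A,
# outcome Y, and confounders X1..X6 (all values are made up).
pooled = [
    {"id": 1, "A": 1, "Y": 0, "X1": 0, "X2": 1, "X3": 0, "X4": 1, "X5": 0, "X6": 1},
    {"id": 2, "A": 0, "Y": 0, "X1": 1, "X2": 1, "X3": 0, "X4": 0, "X5": 0, "X6": 1},
    {"id": 3, "A": 1, "Y": 1, "X1": 0, "X2": 0, "X3": 1, "X4": 1, "X5": 1, "X6": 0},
    {"id": 4, "A": 0, "Y": 1, "X1": 1, "X2": 0, "X3": 1, "X4": 0, "X5": 1, "X6": 0},
]


def horizontal_split(rows, source_of):
    """Horizontal partition: each source holds all columns for its own patients."""
    parts = {}
    for row in rows:
        parts.setdefault(source_of(row["id"]), []).append(dict(row))
    return parts


def vertical_split(rows, columns_by_source, key="id"):
    """Vertical partition: each source holds every patient, but only its own columns plus the ID."""
    return {
        source: [{key: row[key], **{c: row[c] for c in cols}} for row in rows]
        for source, cols in columns_by_source.items()
    }


# Assumed split rules, chosen only to mirror the layout described in the caption.
horizontal = horizontal_split(pooled, lambda pid: "center_1" if pid <= 2 else "center_2")
vertical = vertical_split(
    pooled,
    {"claims": ["A", "Y", "X1", "X2", "X3"], "genomic": ["X4", "X5", "X6"]},
)

print(sorted(horizontal))            # source names in the horizontal partition
print(sorted(vertical["genomic"][0]))  # columns visible to the genomic source
```

In the vertical case, no single source can fit a model that adjusts for all of X1 through X6, which is the problem the propensity score approaches address.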

**Figure 2.**
Schematic representation of the parallel and sequential approaches to analysis of vertically distributed data. The analytical goal is to estimate the effect of an exposure A on an outcome Y, wherein adjustment is needed for many covariates (X1–X8), on which data are available from 4 separate sources (domains) and cannot be pooled in a single analytical database. In the parallel approach (top row), separate propensity scores (PSs), PS1–PS4, are estimated within each domain, and the final analysis utilizes a function of the 4 domain-specific PSs—for example, in the model Y = A + f (PS1 + PS2 + PS3 + PS4). In the sequential approach (bottom row), a PS (PS1) is estimated in the first domain and then passed to the second domain. In the second domain, a PS is estimated on the basis of covariates in that domain *and* the PS from the first domain (PS2). This process is repeated iteratively across all domains until a single final PS (PS4) is produced, which can be used in the final analysis—for example, in the model Y = A + f (PS4). ID, identification.
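
The two estimation schemes in Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the logistic model is fitted with a small hand-written Newton-Raphson routine (`fit_logistic`, a hypothetical helper), each domain is assumed to have access to the exposure indicator A, and the propensity score is passed between domains on the probability scale. The final outcome model is not shown.

```python
import numpy as np


def _design(X):
    """Prepend an intercept column."""
    return np.column_stack([np.ones(len(X)), X])


def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Logistic regression via Newton-Raphson; returns coefficients (intercept first).
    The tiny ridge term only stabilizes the linear solve."""
    Xd = _design(X)
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        w = p * (1.0 - p)
        hess = (Xd * w[:, None]).T @ Xd + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, Xd.T @ (y - p))
    return beta


def predict_ps(X, beta):
    """Predicted probability of exposure."""
    return 1.0 / (1.0 + np.exp(-_design(X) @ beta))


def parallel_ps(domains, a):
    """Parallel approach: one PS per domain, each using only that domain's covariates."""
    return [predict_ps(X, fit_logistic(X, a)) for X in domains]


def sequential_ps(domains, a):
    """Sequential approach: each domain refits the PS on its own covariates
    plus the PS received from the previous domain."""
    ps = None
    for X in domains:
        Z = X if ps is None else np.column_stack([X, ps])
        ps = predict_ps(Z, fit_logistic(Z, a))
    return ps


# Synthetic example: 4 domains with 2 covariates each (standing in for X1..X8).
rng = np.random.default_rng(0)
n = 2000
domains = [rng.normal(size=(n, 2)) for _ in range(4)]
linpred = 0.3 * sum(X.sum(axis=1) for X in domains)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-linpred))).astype(float)

ps_par = parallel_ps(domains, a)   # PS1..PS4, all entering the outcome model
ps_seq = sequential_ps(domains, a)  # single final PS (PS4 in the caption)
```

In this sketch only a patient-level score (plus the linking ID and exposure) would need to leave a domain, never the raw covariates. Passing the logit of the PS instead of the probability is a plausible variation that the sketch does not explore.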

**Figure 3.**
Variations of the sequential approach to analysis of vertically distributed data. The performance of the sequential approach to propensity score (PS) estimation when data are vertically distributed is demonstrated here, showing the influence of domain ordering (vertical axis) and inclusion of a single continuous term for the final PS in the outcome model (dark gray circles) versus decile-indicator (light gray circles) treatment of the final PS in the outcome model. The 4 domains are outpatient (“Out”), inpatient (“In”), demographic factors (“Demo”), and prescriptions (“Drugs”), giving 24 possible orderings. The horizontal axis shows the difference between the log hazard ratio or log odds ratio (both abbreviated as risk ratio (RR)) and its reference estimate. Results are given separately for each of the 5 example studies: Schneeweiss et al., 2009 (18) (analyses of cyclooxygenase 2 inhibitors (A) and statins (B)); Schneeweiss et al., 2010 (19) (C); Patorno et al., 2010 (20) (D); and Patorno et al., 2014 (21) (E).

**Figure 4.**
Performance of variations of the parallel and sequential propensity score approaches to analysis of vertically distributed data in simulation. The plotted treatment effect estimates are presented on the log odds ratio (OR) scale and have been averaged across the 2,000 simulations. All simulations were carried out under a true null treatment effect (log OR equal to 0). Error bars indicate the 2.5th and 97.5th percentiles (empirical 95% confidence intervals) of the treatment effect distributions. The horizontal axis shows an index of the variations of the parallel and sequential PS approaches. Symbol shape indicates the type of estimate: diamond, crude/unadjusted; squares, fully adjusted for all covariates across all domains; circles, parallel approach; triangles, sequential approach. Details on these variants can be found in Web Table 1.
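
The summaries plotted in Figure 4 (mean estimate and empirical 2.5th and 97.5th percentiles), together with the mean squared error mentioned in the abstract, could be computed from a vector of simulated estimates roughly as below. The function name `summarize_estimates` and the randomly generated input are assumptions for illustration; the numbers are not results from the paper.

```python
import numpy as np


def summarize_estimates(log_or, true_log_or=0.0):
    """Mean, empirical 95% interval, and MSE of simulated log odds ratio estimates."""
    est = np.asarray(log_or, dtype=float)
    lower, upper = np.percentile(est, [2.5, 97.5])
    return {
        "mean": float(est.mean()),
        "lower": float(lower),
        "upper": float(upper),
        "mse": float(np.mean((est - true_log_or) ** 2)),
    }


# Made-up estimates standing in for 2,000 simulation replicates under a true null.
rng = np.random.default_rng(1)
fake_estimates = rng.normal(loc=0.02, scale=0.1, size=2000)
summary = summarize_estimates(fake_estimates)
print(summary)
```

Because the true effect is null, the MSE here combines the squared bias of the mean estimate with the variance of the estimates across replicates.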

Grants and funding

R01 LM010213/LM/NLM NIH HHS/United States
