Clin Epidemiol. 2018 Nov 27;10:1773-1786.
doi: 10.2147/CLEP.S178163. eCollection 2018.

Combining distributed regression and propensity scores: a doubly privacy-protecting analytic method for multicenter research

Sengwee Toh et al. Clin Epidemiol.

Abstract

Purpose: Sharing of detailed individual-level data continues to pose challenges in multi-center studies. This issue can be addressed in part by using analytic methods that require only summary-level information to perform the desired multivariable-adjusted analysis. We examined the feasibility and empirical validity of 1) conducting multivariable-adjusted distributed linear regression and 2) combining distributed linear regression with propensity scores, in a large distributed data network.

Patients and methods: We compared percent total weight loss 1 year after surgery between Roux-en-Y gastric bypass and sleeve gastrectomy among 43,110 patients from 36 health systems in the National Patient-Centered Clinical Research Network. We adjusted for baseline demographic and clinical variables as individual covariates, as deciles of propensity scores, or as both, in three separate outcome regression models. To fit the three ordinary least squares linear regression models, we used distributed linear regression, a method that requires only summary-level information from sites (specifically, the sums of squares and cross products [SSCP] matrix). A comparison set of analyses that used pooled deidentified individual-level data from sites served as the reference.
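To illustrate why an SSCP matrix is sufficient summary-level information, the following is a minimal sketch rather than the authors' software. It assumes simulated data, three hypothetical sites, and hypothetical helper names (`site_sscp`, `fit_from_sscp`). Each site computes the SSCP matrix of its own intercept, covariates, and outcome; the analysis center sums the matrices and recovers the ordinary least squares estimates and standard errors without seeing any individual-level rows.

```python
import numpy as np

def site_sscp(X, y):
    """Computed locally at a site: SSCP matrix of [intercept, covariates, outcome]."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X, y])
    return Z.T @ Z  # (p+1) x (p+1) summary; no patient-level rows leave the site

def fit_from_sscp(sscp):
    """Computed at the analysis center: OLS estimates and SEs from a summed SSCP matrix."""
    p = sscp.shape[0] - 1          # number of coefficients, including the intercept
    xtx = sscp[:p, :p]             # X'X
    xty = sscp[:p, p]              # X'y
    yty = sscp[p, p]               # y'y
    n = sscp[0, 0]                 # sum of squared intercept column = sample size
    beta = np.linalg.solve(xtx, xty)
    rss = yty - beta @ xty         # residual sum of squares, using X'X beta = X'y
    sigma2 = rss / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(xtx)))
    return beta, se

# Simulated data for three hypothetical sites (all numbers are made up).
rng = np.random.default_rng(0)
true_beta = np.array([25.0, 5.0, -0.2, 1.5])   # intercept, treatment, two covariates
sites = []
for n_site in (200, 350, 150):
    treatment = rng.integers(0, 2, size=n_site)
    covariates = rng.normal(size=(n_site, 2))
    X = np.column_stack([treatment, covariates])
    y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=3.0, size=n_site)
    sites.append((X, y))

# Distributed analysis: only the summed SSCP matrices are used.
total_sscp = sum(site_sscp(X, y) for X, y in sites)
beta_dist, se_dist = fit_from_sscp(total_sscp)

# Reference analysis: pooled individual-level data.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
Xd = np.column_stack([np.ones(len(y_all)), X_all])
beta_pool, *_ = np.linalg.lstsq(Xd, y_all, rcond=None)
resid = y_all - Xd @ beta_pool
sigma2_pool = resid @ resid / (Xd.shape[0] - Xd.shape[1])
se_pool = np.sqrt(sigma2_pool * np.diag(np.linalg.inv(Xd.T @ Xd)))

print("max |beta difference|:", np.max(np.abs(beta_dist - beta_pool)))
print("max |SE difference|:  ", np.max(np.abs(se_dist - se_pool)))
```

The two approaches agree up to floating-point rounding because the pooled X'X, X'y, and y'y are exactly the sums of the corresponding site-level blocks. Adjusting for propensity score deciles would, under this sketch, amount to adding indicator columns to each site's `X` before computing the SSCP matrix.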

Results: Distributed linear regression produced results identical to those from the corresponding pooled individual-level data analysis for all variables in all three models. The maximum numerical difference in the parameter estimate or standard error for all the variables was 3×10⁻¹¹ across the three models.

Conclusion: Distributed linear regression analysis is a feasible and valid analytic method in multicenter studies for one-time continuous outcomes. Combining distributed regression with propensity scores via modeling offers more privacy protection and analytic flexibility.

Keywords: distributed data networks; distributed regression; privacy-protecting methods; propensity score.

Conflict of interest statement

Disclosure: Dr Arterburn reports NIH funding outside the submitted work. Mr Moyneur reports StatLog was paid consulting fees to conduct study programming. All authors received funding from PCORI to support the submitted work. The authors report no other conflicts of interest in this work.

Figures

Figure 1. Computation process of a typical regression analysis. Note: Numbers are hypothetical and for demonstrative purposes only. Abbreviations: SSCP, sums of squares and cross products; SE, standard error.

Figure 2. Distributed regression in a multicenter study. Note: Numbers are hypothetical and for demonstrative purposes only. Abbreviation: SSCP, sums of squares and cross products.

Figure 3. Workflow to perform pooled individual-level data analysis and distributed regression analysis.
