J Am Med Inform Assoc. 2020 Mar 1;27(3):376-385.
doi: 10.1093/jamia/ocz199.

Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm

Rui Duan et al. J Am Med Inform Assoc.

Abstract

Objectives: We propose a one-shot, privacy-preserving distributed algorithm to perform logistic regression (ODAL) across multiple clinical sites.

Materials and methods: ODAL effectively utilizes the information from the local site (where the patient-level data are accessible) and incorporates the first-order (ODAL1) and second-order (ODAL2) gradients of the likelihood function from other sites to construct an estimator without requiring iterative communication across sites or transferring patient-level data. We evaluated ODAL via extensive simulation studies and an application to a dataset from the University of Pennsylvania Health System. The estimation accuracy was evaluated by comparing it with the estimator based on the combined individual participant data or pooled data (ie, gold standard).
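The first-order procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the surrogate takes the usual form of the local log-likelihood plus a linear correction built from the gradients returned by the other sites, it uses plain gradient ascent as a stand-in optimizer, and the function names (`avg_gradient`, `maximize`, `simulate_site`, `odal1`) and the simulated data are hypothetical. Each site is also assumed to report its sample size alongside its gradient.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def avg_gradient(beta, X, y):
    """Gradient of the average logistic log-likelihood at beta."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        resid = yi - sigmoid(sum(b * x for b, x in zip(beta, xi)))
        for k, x in enumerate(xi):
            grad[k] += resid * x
    return [g / len(y) for g in grad]


def maximize(grad_fn, start, step=1.0, iters=1000):
    """Plain gradient ascent; enough for a small, well-conditioned toy model."""
    beta = list(start)
    for _ in range(iters):
        beta = [b + step * g for b, g in zip(beta, grad_fn(beta))]
    return beta


def simulate_site(n, true_beta, rng):
    """Hypothetical data generator: intercept plus one standard-normal covariate."""
    X = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(sum(b * x for b, x in zip(true_beta, xi))) else 0
         for xi in X]
    return X, y


def odal1(local, remote_sites):
    """One-shot first-order estimator; only gradients leave the remote sites."""
    X1, y1 = local
    # Step 1: initial value from the local site alone.
    beta_bar = maximize(lambda b: avg_gradient(b, X1, y1), [0.0] * len(X1[0]))
    # Step 2: each remote site returns p numbers (its gradient at beta_bar).
    # Assumption: it also reports its sample size n_j for weighting.
    messages = [(avg_gradient(beta_bar, Xj, yj), len(yj)) for Xj, yj in remote_sites]
    # Step 3: combine into the global gradient and form the correction term.
    local_grad = avg_gradient(beta_bar, X1, y1)
    total_n = len(y1) + sum(n for _, n in messages)
    global_grad = [
        (len(y1) * local_grad[k] + sum(n * g[k] for g, n in messages)) / total_n
        for k in range(len(beta_bar))
    ]
    shift = [g - l for g, l in zip(global_grad, local_grad)]

    # Step 4: maximize the surrogate, whose gradient is grad L_1(beta) + shift.
    def surrogate_grad(b):
        return [g + s for g, s in zip(avg_gradient(b, X1, y1), shift)]

    return beta_bar, maximize(surrogate_grad, beta_bar)


rng = random.Random(0)
sites = [simulate_site(200, [-0.5, 1.0], rng) for _ in range(5)]
beta_local, beta_odal1 = odal1(sites[0], sites[1:])
print("local estimate:", beta_local)
print("ODAL1 estimate:", beta_odal1)
```

No patient-level rows cross site boundaries in this sketch: the remote sites see only the initial value and return only a gradient vector. When every remote site holds the same data as the local site, the correction term vanishes and the result coincides with the local estimate, which is a convenient sanity check.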

Results: Our simulation studies revealed that the relative estimation bias of ODAL1 compared with the pooled estimates was <3%, and the ratio of standard errors was <1.25 for all scenarios. ODAL2 achieved higher accuracy (with relative bias <0.1% and ratio of standard errors <1.05). In the real data analysis, we investigated the associations of 100 medications with fetal loss during pregnancy. We found that ODAL1 provided estimates with relative bias <10% for 85% of medications, and ODAL2 provided estimates with relative bias <10% for 99% of medications. For communication cost, ODAL1 requires transferring p numbers from each site to the local site, and ODAL2 requires transferring p × p + p numbers from each site to the local site, where p is the number of parameters in the regression model.
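The communication cost stated above is simple arithmetic, sketched here for concreteness. The function names and the example values (p = 5 parameters, 10 sites) are hypothetical and chosen only for illustration; the per-site counts follow the formulas in the abstract, and the network total simply multiplies by the number of non-local sites.

```python
def numbers_sent_per_site(p: int, order: int) -> int:
    """Numbers each non-local site transfers: p for ODAL1, p*p + p for ODAL2."""
    if order == 1:
        return p          # gradient vector only
    if order == 2:
        return p * p + p  # Hessian matrix plus gradient vector
    raise ValueError("order must be 1 or 2")


def total_numbers_received(p: int, order: int, num_sites: int) -> int:
    """Total received by the local site from the other num_sites - 1 sites."""
    return (num_sites - 1) * numbers_sent_per_site(p, order)


# Illustrative values only: 5 parameters, 10 sites (1 local + 9 others).
for order in (1, 2):
    per_site = numbers_sent_per_site(5, order)
    total = total_numbers_received(5, order, 10)
    print(f"ODAL{order}: {per_site} numbers per site, {total} in total")
```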

Conclusions: This study demonstrates that ODAL is privacy-preserving and communication-efficient with small bias and high statistical efficiency.

Keywords: distributed algorithm; electronic health record; learning health system; logistic regression.

Figures

Figure 1.
Schematic illustration of the proposed one-shot, privacy-preserving distributed algorithm to perform logistic regression (ODAL) methods. (a) ODAL1: The initial value $\bar{\beta}$ is obtained by fitting a logistic model at the local site and is transferred to the other sites. Then the intermediate term $\nabla L_j(\bar{\beta})$ is evaluated at each site j (j = 2, …, K) and transferred back to the local site. Combining these with $\nabla L_1(\bar{\beta})$ and $L_1(\beta)$ gives the first-order surrogate likelihood function $\tilde{L}^1(\beta)$, and the ODAL1 estimator is obtained by maximizing $\tilde{L}^1(\beta)$. (b) ODAL2: The initialization is the same as in ODAL1, and the intermediate terms $\nabla L_j(\bar{\beta})$ and $\nabla^2 L_j(\bar{\beta})$ are evaluated at each site and transferred back to the local site. Combining these with $\nabla L_1(\bar{\beta})$, $\nabla^2 L_1(\bar{\beta})$, and $L_1(\beta)$ gives the second-order surrogate function $\tilde{L}^2(\beta)$, and the ODAL2 estimator is obtained by maximizing $\tilde{L}^2(\beta)$.
Figure 2.
Design of the simulation study. (1) Data are generated from a logistic regression with covariates X1, X2, X3, and X4. (2) Setting A considers the case in which the local sample size is fixed at 1000 and the number of sites K grows from 2 to 100. (3) Setting B considers the case in which the total sample size is fixed at 10 000 across 10 sites in the network and the sample size at the local site grows from 100 to 9100.
Figure 3.
Design of the real data evaluation. Patients with normal pregnancy and fetal loss are identified from the University of Pennsylvania Health System (UPHS) database and randomly divided into 10 sites. The local site has 10% of the data, and the rest of the data are randomly split among the other 9 sites. The local estimator is computed using data from the local site only. The first-order one-shot, privacy-preserving distributed algorithm to perform logistic regression (ODAL1), ODAL2, and GLORE (Grid binary LOgistic Regression) are run on the distributed data. The pooled analysis is performed using the whole fetal loss dataset.
Figure 4.
Relative biases and ratio of standard errors of the local estimator, first-order one-shot, privacy-preserving distributed algorithm to perform logistic regression (ODAL1), ODAL2, and GLORE (Grid binary LOgistic Regression) compared with the POOLED estimator under settings A and B.
Figure 5.
Odds ratio estimates from the first-order one-shot, privacy-preserving distributed algorithm to perform logistic regression (ODAL1), ODAL2, POOLED (identical to the estimates from GLORE [Grid binary LOgistic Regression]), and the local estimators for 100 medications and their associations with fetal loss. The 100 medications are sorted from left to right in descending order of their odds ratios as estimated by the pooled estimator. The list of drug names and estimates can be found in Supplementary Table S1. The highlighted box zooms in on 10 drugs with odds ratios near 1.
Figure 6.
(Left panel) Point estimates and confidence intervals of odds ratios estimated from the first-order one-shot, privacy-preserving distributed algorithm to perform logistic regression (ODAL1), ODAL2, POOLED, and local estimators for the top 10 medications positively associated with fetal loss. (Right panel) Point estimates and confidence intervals of odds ratios estimated from ODAL1, ODAL2, POOLED, and local estimators for the top 10 medications negatively associated with fetal loss. The dashed gray line marks an odds ratio of 1, indicating no difference in risk from that expected by chance.
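
The surrogate functions named in the Figure 1 caption can be written out explicitly. The block below is a sketch in the general surrogate-likelihood form, not a transcription of the paper's equations: it assumes that L_j is the average log-likelihood at site j, that n_j is the sample size at site j with N the total, and that the initial value (written as beta-bar) is the local estimate. The paper's exact normalization and sign conventions may differ.

```latex
% Global gradient and Hessian at the initial value, assembled from site summaries
\nabla L(\bar{\beta}) = \frac{1}{N} \sum_{j=1}^{K} n_j \, \nabla L_j(\bar{\beta}),
\qquad
\nabla^2 L(\bar{\beta}) = \frac{1}{N} \sum_{j=1}^{K} n_j \, \nabla^2 L_j(\bar{\beta})

% First-order surrogate (ODAL1): local likelihood plus a linear gradient correction
\tilde{L}^{1}(\beta) = L_1(\beta)
  + \bigl\{ \nabla L(\bar{\beta}) - \nabla L_1(\bar{\beta}) \bigr\}^{\top} \beta

% Second-order surrogate (ODAL2): adds a quadratic Hessian correction
\tilde{L}^{2}(\beta) = \tilde{L}^{1}(\beta)
  + \tfrac{1}{2} (\beta - \bar{\beta})^{\top}
    \bigl\{ \nabla^2 L(\bar{\beta}) - \nabla^2 L_1(\bar{\beta}) \bigr\}
    (\beta - \bar{\beta})
```

Under these assumptions, the gradient of the first-order surrogate at the initial value equals the global gradient there, which is why a single round of gradient exchange moves the local fit toward the pooled fit.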
