Privacy-preserving model learning on a blockchain network-of-networks

Tsung-Ting Kuo et al. J Am Med Inform Assoc 2020 Mar 1;27(3):343-354. doi: 10.1093/jamia/ocz214.

Abstract

Objective: To facilitate clinical/genomic/biomedical research, it is imperative to construct generalizable predictive models across institutions while protecting privacy. However, state-of-the-art methods assume a "flattened" network topology, whereas real-world research networks may form a "network-of-networks," which raises practical issues: training on small data for rare diseases/conditions, prioritizing locally trained models, and maintaining models at each level of the hierarchy. In this study, we focus on developing a hierarchical approach that inherits the benefits of privacy-preserving methods, retains the advantages of adopting blockchain, and addresses these practical concerns on a research network-of-networks.

Materials and methods: We propose a framework that combines level-wise model learning, blockchain-based model dissemination, and a novel hierarchical consensus algorithm for model ensemble. We developed an example implementation, HierarchicalChain (hierarchical privacy-preserving modeling on blockchain), evaluated it on 3 healthcare/genomic datasets, and compared its predictive correctness, learning iterations, and execution time with a state-of-the-art method designed for a flattened network topology.

Results: HierarchicalChain improves predictive correctness for small training datasets and achieves correctness comparable to the competing method, albeit with more learning iterations and similar per-iteration execution time. It inherits the benefits of privacy-preserving learning and the advantages of blockchain technology, and immutably records the models for each level.

Discussion: HierarchicalChain is independent of the core privacy-preserving learning method, as well as of the underlying blockchain platform. Further studies are warranted for various types of network topology, complex data, and privacy concerns.

Conclusion: We demonstrated the potential of utilizing the information from the hierarchical network-of-networks topology to improve prediction.

Keywords: blockchain distributed ledger technology; clinical information systems; decision support systems; hierarchical network; privacy-preserving predictive modeling.


Figures

Figure 1.
Comparison of privacy-preserving learning methods on different network topologies. A. The participating sites in a flattened network topology, which is a fully connected network. The number indicates the size of the records in the database at each site. For a smaller site (eg, s3), the number of records may not be enough to train a generalizable predictive model; however, direct exchange of data is not preferred due to privacy considerations. B. Centralized learning methods can build a global model by exchanging the models instead of the data on a flattened network. However, they carry risks such as a single point of control, mutable data/records, unclear change provenance, and partial visibility. C. Decentralized methods on a flattened network can address the abovementioned privacy risks by having no single point of control, immutable data/records, data provenance, and complete visibility. D. A real-world network-of-networks topology, which may present practical issues: (1) data size may be small for rare diseases/conditions, (2) each site may prefer to prioritize its local data while considering the data size, and (3) each subnetwork may prefer to retain its own models. E. The proposed hierarchical learning method exploits the network-of-networks information, which is not fully utilized by decentralized learning methods designed for a flattened network, to address these practical issues. Specifically, by computing, recording, and combining the models from each level with different weights based on data size, the hierarchical method aims at (1) improving predictive correctness with small data (eg, s1), (2) prioritizing local data for each site (eg, s3), and (3) retaining consensus for each subnetwork (eg, Level 2). It also inherits the advantages of the decentralized method designed for a flattened network.
Figure 2.
Hierarchical consensus learning. Suppose this 3-level hierarchical network-of-networks consists of 4 sites (Level 1) from 2 subnetworks (SCANNER and UCReX at Level 2) of an overarching network (pSCANNER at Level 3), and we would like to predict a new outcome for site s1. After the consensus models are learned at each level, we first store all models (7 in this example), use each model to predict a score for the new record (in the test data on site s1), collect the prediction scores, and then combine the scores using a weighted-average method based on the size of the training data.
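The score combination described in this caption can be sketched as a simple weighted average, where each level-wise model contributes one prediction score weighted by the size of the training data behind it. This is a minimal sketch; the function name and the example scores below are illustrative, not taken from the paper.

```python
def weighted_average(scores, train_sizes):
    """Combine per-model prediction scores for one record,
    weighting each score by the training-data size of its model."""
    total = sum(train_sizes)
    return sum(s * n for s, n in zip(scores, train_sizes)) / total

# Illustrative only: in the caption, 7 level-wise models would each
# contribute one score; here, three hypothetical scores are combined
# with hypothetical training-data sizes as weights.
combined = weighted_average([0.80, 0.60, 0.70], [10, 40, 100])
```

Larger training sets thus pull the combined score toward their model's prediction, which is the mechanism the caption relies on to favor better-supported models.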
Figure 3.
Example of block, transaction, and transaction metadata of HierarchicalChain. The predictive model and related information are stored in the transaction metadata (eg, Metadata of Transaction T11). The 4 red fields (“Hierarchy,” “Record,” “Level,” and “Type”) incorporate the newly added hierarchical information for HierarchicalChain compared to GloreChain. The details of the data fields are described in Table 1.
Figure 4.
Examples of the ensemble methods adopted in the Proof-of-Hierarchy (PoH) algorithm. A. Horizontal ensemble. For each new patient record at SCANNER Site s1, we first identify all Level 1 sites (ie, SCANNER Site s1, SCANNER Site s2, UCReX Site s3, and UCReX Site s4). The prediction scores from each of the Level 1 models (ie, Score1_1, Score1_2, Score1_3, and Score1_4) are then combined using a weighted average with the training data sizes of each site (ie, 10, 30, 40, and 20 for SCANNER Site s1, SCANNER Site s2, UCReX Site s3, and UCReX Site s4, respectively) as the weights. B. Vertical ensemble. For each new patient record at SCANNER Site s1, we first identify the levels related to SCANNER Site s1, including SCANNER Site s1 itself (Level 1), SCANNER (Level 2), and pSCANNER (Level 3). The prediction scores from the models of each level (ie, Score1_1, Score2_1, and Score3_1) are then combined using a weighted average with the training data sizes of each level of the hierarchy (ie, 10, 40, and 100 for SCANNER Site s1, SCANNER, and pSCANNER, respectively) as the weights.
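With the weights given in this caption, both ensembles reduce to the same weighted-average computation; only the set of contributing models, and hence the weights, differs. In the sketch below, the weights (10, 30, 40, 20 for the horizontal case; 10, 40, 100 for the vertical case) come from the caption, while the prediction scores are made-up placeholders.

```python
def ensemble(scores, weights):
    """Weighted-average combination of per-model prediction scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Horizontal ensemble: all four Level-1 site models,
# weighted by each site's training-data size (from the caption).
horizontal = ensemble([0.9, 0.7, 0.6, 0.8], [10, 30, 40, 20])

# Vertical ensemble: models for SCANNER Site s1 (Level 1),
# SCANNER (Level 2), and pSCANNER (Level 3),
# weighted by per-level training-data sizes (from the caption).
vertical = ensemble([0.9, 0.75, 0.65], [10, 40, 100])
```

Note how the vertical ensemble is dominated by the Level 3 model (weight 100 of 150), which matches the caption's intent of letting broader consensus models contribute more when they are trained on more data.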
Figure 5.
System architecture of HierarchicalChain, which contains 4 participating sites. The Blockchain-Connector component connects the main HierarchicalChain software to the underlying blockchain platform (MultiChain, in our implementation). Abbreviations: AWS, Amazon Web Services; iDASH, integrating Data for Analysis, Anonymization, and Sharing.
Figure 6.
Results on data with different training data ratios, including 3 datasets (Edin, CA, and THA) and 2 data-splitting methods (balanced and imbalanced). We compared the 2 ensemble methods (horizontal and vertical) of HierarchicalChain with the state-of-the-art GloreChain. The data are split into balanced or imbalanced ratios among the sites. A. Predictive correctness results on small training data. The top header represents the dataset name (data split ratio). The models are trained using only small portions of the training data. The evaluation metric is the weighted-average AUC, and the P values are computed using the Wilcoxon signed-rank test. B. Prediction correctness, measured in weighted-average test AUC, for different training data ratios. C. Learning iterations for different training data ratios. D. Per-iteration execution time, measured in seconds, for different training data ratios.
