Privacy-preserving model learning on a blockchain network-of-networks

Tsung-Ting Kuo et al. J Am Med Inform Assoc 2020 Mar 1;27(3):343-354. doi: 10.1093/jamia/ocz214.

Abstract

Objective: To facilitate clinical/genomic/biomedical research, it is imperative to construct generalizable predictive models across institutions while protecting privacy. However, state-of-the-art methods assume a "flattened" network topology, whereas real-world research networks may form a "network-of-networks," which raises practical issues: training on small data for rare diseases/conditions, prioritizing locally trained models, and maintaining models at each level of the hierarchy. In this study, we focus on developing a hierarchical approach that inherits the benefits of privacy-preserving methods, retains the advantages of adopting blockchain, and addresses these practical concerns on a research network-of-networks.

Materials and methods: We propose a framework that combines level-wise model learning, blockchain-based model dissemination, and a novel hierarchical consensus algorithm for model ensemble. We developed an example implementation, HierarchicalChain (hierarchical privacy-preserving modeling on blockchain), evaluated it on 3 healthcare/genomic datasets, and compared its predictive correctness, learning iterations, and execution time with a state-of-the-art method designed for a flattened network topology.

Results: HierarchicalChain improves predictive correctness for small training datasets and achieves correctness comparable to the competing method, albeit with more learning iterations and similar per-iteration execution time. It inherits the benefits of privacy-preserving learning and the advantages of blockchain technology, and immutably records the models for each level.

Discussion: HierarchicalChain is independent of the core privacy-preserving learning method, as well as of the underlying blockchain platform. Further studies are warranted for various types of network topology, complex data, and privacy concerns.

Conclusion: We demonstrated the potential of utilizing the information from the hierarchical network-of-networks topology to improve prediction.

Keywords: blockchain distributed ledger technology; clinical information systems; decision support systems; hierarchical network; privacy-preserving predictive modeling.


Figures

Figure 1.
Comparison of privacy-preserving learning methods on different network topologies. A. The participating sites in a flattened network topology, which is a fully connected network. The number indicates the size of the records in the database at each site. For a smaller site (eg, s3), the number of records may not be enough to train a generalizable predictive model; however, direct exchange of data is not preferred due to privacy considerations. B. Centralized learning methods can build a global model by exchanging the models instead of the data on a flattened network. However, they carry risks such as a single point of control, mutable data/records, unclear change provenance, and partial visibility. C. Decentralized methods on a flattened network can address the abovementioned privacy risks by having no single point of control, immutable data/records, data provenance, and complete visibility. D. A real-world network-of-networks topology, which may present practical issues: (1) data size may be small for rare diseases/conditions, (2) each site may prefer to prioritize its local data while considering the data size, and (3) each subnetwork may prefer to retain its own models. E. The proposed hierarchical learning method exploits the network-of-networks information, which is not fully utilized by decentralized learning methods designed for a flattened network, to address these practical issues. Specifically, by computing, recording, and combining the models from each level with different weights based on data size, the hierarchical method aims at (1) improving predictive correctness with small data (eg, s1), (2) prioritizing local data for each site (eg, s3), and (3) retaining consensus for each subnetwork (eg, Level 2). It also inherits the advantages of the decentralized method designed for a flattened network.
Figure 2.
Hierarchical consensus learning. Suppose this 3-level hierarchical network-of-networks consists of 4 sites (Level 1) from 2 subnetworks (SCANNER and UCReX at Level 2) of an overarching network (pSCANNER at Level 3), and we would like to predict a new outcome for site s1. After the consensus models are learned at each level, we first store all models (7 in this example), use each model to predict a score for the new record (in the test data on site s1), collect the prediction scores, and then combine the scores using a weighted-average method based on the size of the training data.
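The score combination described in this caption can be sketched as a simple weighted average, where each level-wise model contributes one prediction score weighted by the size of the training data behind it. This is a minimal sketch; the function name and the example scores below are illustrative, not taken from the paper.

```python
def weighted_average(scores, train_sizes):
    """Combine per-model prediction scores for one record,
    weighting each score by the training-data size of its model."""
    total = sum(train_sizes)
    return sum(s * n for s, n in zip(scores, train_sizes)) / total

# Illustrative only: in the caption, 7 level-wise models would each
# contribute one score; here, three hypothetical scores are combined
# with hypothetical training-data sizes as weights.
combined = weighted_average([0.80, 0.60, 0.70], [10, 40, 100])
```

Larger training sets thus pull the combined score toward their model's prediction, which is the mechanism the caption relies on to favor better-supported models.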
Figure 3.
Example of block, transaction, and transaction metadata of HierarchicalChain. The predictive model and related information are stored in the transaction metadata (eg, Metadata of Transaction T11). The 4 red fields (“Hierarchy,” “Record,” “Level,” and “Type”) incorporate the newly added hierarchical information for HierarchicalChain compared to GloreChain. The details of the data fields are described in Table 1.
Figure 4.
Examples of the ensemble methods adopted in the Proof-of-Hierarchy (PoH) algorithm. A. Horizontal ensemble. For each new patient record at SCANNER Site s1, we first identify all Level 1 sites (ie, SCANNER Site s1, SCANNER Site s2, UCReX Site s3, and UCReX Site s4). The prediction scores from each of the Level 1 models (ie, Score1_1, Score1_2, Score1_3, and Score1_4) are then combined using a weighted average with the training data sizes of each site (ie, 10, 30, 40, and 20 for SCANNER Site s1, SCANNER Site s2, UCReX Site s3, and UCReX Site s4, respectively) as the weights. B. Vertical ensemble. For each new patient record at SCANNER Site s1, we first identify the levels related to SCANNER Site s1, including SCANNER Site s1 itself (Level 1), SCANNER (Level 2), and pSCANNER (Level 3). The prediction scores from the models of each level (ie, Score1_1, Score2_1, and Score3_1) are then combined using a weighted average with the training data sizes of each level of the hierarchy (ie, 10, 40, and 100 for SCANNER Site s1, SCANNER, and pSCANNER, respectively) as the weights.
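With the weights given in this caption, both ensembles reduce to the same weighted-average computation; only the set of contributing models, and hence the weights, differs. In the sketch below, the weights (10, 30, 40, 20 for the horizontal case; 10, 40, 100 for the vertical case) come from the caption, while the prediction scores are made-up placeholders.

```python
def ensemble(scores, weights):
    """Weighted-average combination of per-model prediction scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Horizontal ensemble: all four Level-1 site models,
# weighted by each site's training-data size (from the caption).
horizontal = ensemble([0.9, 0.7, 0.6, 0.8], [10, 30, 40, 20])

# Vertical ensemble: models for SCANNER Site s1 (Level 1),
# SCANNER (Level 2), and pSCANNER (Level 3),
# weighted by per-level training-data sizes (from the caption).
vertical = ensemble([0.9, 0.75, 0.65], [10, 40, 100])
```

Note how the vertical ensemble is dominated by the Level 3 model (weight 100 of 150), which matches the caption's intent of letting broader consensus models contribute more when they are trained on more data.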
Figure 5.
System architecture of HierarchicalChain, which contains 4 participating sites. The Blockchain-Connector component connects the main HierarchicalChain software to the underlying blockchain platform (MultiChain, in our implementation). Abbreviations: AWS, Amazon Web Services; iDASH, integrating Data for Analysis, Anonymization, and Sharing.
Figure 6.
Results on data with different training data ratios, including 3 datasets (Edin, CA, and THA) and 2 data-splitting methods (balanced and imbalanced). We compared the 2 ensemble methods (horizontal and vertical) of HierarchicalChain with the state-of-the-art GloreChain. The data are split into balanced or imbalanced ratios among the sites. A. Predictive correctness results on small training data. The top header represents the dataset name (data split ratio). The models are trained using only small portions of the training data. The evaluation metric is the weighted-average AUC, and the P values are computed using the Wilcoxon signed-rank test. B. Prediction correctness, measured in weighted-average test AUC, for different training data ratios. C. Learning iterations for different training data ratios. D. Per-iteration execution time, measured in seconds, for different training data ratios.
