A Privacy-Preserving Distributed Analytics Platform for Health Care Data
- PMID: 35038764
- PMCID: PMC9246511
- DOI: 10.1055/s-0041-1740564
Abstract
Background: In recent years, data-driven medicine has gained increasing importance in terms of diagnosis, treatment, and research due to the exponential growth of health care data. However, data protection regulations prohibit data centralisation for analysis purposes because of potential privacy risks like the accidental disclosure of data to third parties. Therefore, alternative data usage policies, which comply with present privacy guidelines, are of particular interest.
Objective: We aim to enable analyses on sensitive patient data while complying with local data protection regulations, using an approach called the Personal Health Train (PHT), a paradigm that utilises distributed analytics (DA) methods. The main principle of the PHT is that the analytical task is brought to the data provider, while the data instances remain in their original location.
Methods: In this work, we present our implementation of the PHT paradigm, which preserves the sovereignty and autonomy of the data providers and operates with a limited number of communication channels. We further conduct a DA use case on data held by three distinct, distributed data providers.
Results: We show that our infrastructure enables the training of data models based on distributed data sources.
Conclusion: Our work presents the capabilities of DA infrastructures in the health care sector, which lower the regulatory obstacles to sharing patient data. We further demonstrate their ability to fuel medical science by making distributed data sets available to scientists and health care practitioners.
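The principle stated in the abstract (the analytical task travels to each data provider, while the records stay where they are) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Train` and `Station` classes, the toy records, and the choice of a one-dimensional least-squares fit are all assumptions made purely for illustration. The train carries only aggregate sufficient statistics from one station to the next, so no individual record leaves a station.

```python
from dataclasses import dataclass


@dataclass
class Train:
    """Hypothetical analytical task that travels between stations.

    It accumulates aggregate sufficient statistics for a 1-D
    least-squares fit and never stores individual records.
    """
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def run_at(self, records):
        # Executed inside a station: only the running sums are updated.
        for x, y in records:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def fit(self):
        # Closed-form simple linear regression from the aggregates.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


class Station:
    """Hypothetical data provider; its records stay local."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # assumed never to leave the station

    def execute(self, train):
        train.run_at(self._records)
        return train  # only the updated aggregates travel onward


# Three toy stations holding made-up (x, y) pairs that follow y = 2x + 1.
stations = [
    Station("A", [(1.0, 3.0), (2.0, 5.0)]),
    Station("B", [(3.0, 7.0)]),
    Station("C", [(4.0, 9.0), (5.0, 11.0)]),
]

train = Train()
for station in stations:
    train = station.execute(train)

slope, intercept = train.fit()
print(slope, intercept)
```

Because the least-squares fit depends only on these sums, the result of the sequential visit matches what a fit on the pooled records would give. Models without such sufficient statistics would require iterative or approximate schemes, which this sketch does not cover.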
The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).
Conflict of interest statement
None declared.
Comment in
- Security and Privacy in Distributed Health Care Environments. Methods Inf Med. 2022 May;61(1-02):1-2. doi: 10.1055/a-1768-2966. Epub 2022 Feb 10. PMID: 35144306.