Bioinform Adv. 2025 Mar 10;5(1):vbaf046.
doi: 10.1093/bioadv/vbaf046. eCollection 2025.

DataSHIELD: mitigating disclosure risk in a multi-site federated analysis platform

Demetris Avraam et al. Bioinform Adv.

Abstract

Motivation: The validity of epidemiologic findings can be increased using triangulation, i.e. comparison of findings across contexts, and by having sufficiently large amounts of relevant data to analyse. However, access to data is often constrained by practical considerations and by ethico-legal and data governance restrictions. Gaining access to such data can be time-consuming due to the governance requirements associated with data access requests to institutions in different jurisdictions.

Results: DataSHIELD is a software solution that enables remote analysis without the need for data transfer (federated analysis). DataSHIELD is a scientifically mature, open-source data access and analysis platform aligned with the 'Five Safes' framework, the international framework governing safe research access to data. It allows real-time analysis while mitigating disclosure risk through an active multi-layer system of disclosure-preventing mechanisms. This combination of real-time remote statistical analysis, disclosure prevention mechanisms, and federation capabilities makes DataSHIELD a solution for addressing many of the technical and regulatory challenges in performing the large-scale statistical analysis of health and biomedical data. This paper describes the key components that comprise the disclosure protection system of DataSHIELD. These broadly fall into three classes: (i) system protection elements, (ii) analysis protection elements, and (iii) governance protection elements.
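The idea of federated analysis with a site-side disclosure check can be illustrated with a minimal sketch. This is not DataSHIELD code and does not use its API: the `Site` class, the `summary` and `pooled_mean` functions, and the `MIN_COUNT` threshold of 5 are all hypothetical names and values chosen for illustration. Each site releases only an aggregate (a sum and a count), and refuses to do so when the number of records is too small to be considered safe under the assumed threshold.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical disclosure threshold (an assumption for this sketch):
# a site refuses to release summaries computed on fewer records than this.
MIN_COUNT = 5


class DisclosureError(Exception):
    """Raised when a site declines to release a potentially disclosive result."""


@dataclass
class Site:
    """A data holder. Individual-level values are never returned to the caller."""
    name: str
    _values: List[float]

    def summary(self) -> Tuple[float, int]:
        """Return only (sum, count), after a site-side disclosure check."""
        n = len(self._values)
        if n < MIN_COUNT:
            raise DisclosureError(f"{self.name}: {n} records is below the threshold")
        return sum(self._values), n


def pooled_mean(sites: List[Site]) -> float:
    """Client side: combine per-site aggregates into one pooled estimate."""
    total, count = 0.0, 0
    for site in sites:
        site_sum, site_n = site.summary()
        total += site_sum
        count += site_n
    return total / count


sites = [
    Site("site_a", [1.0, 2.0, 3.0, 4.0, 5.0]),
    Site("site_b", [6.0, 7.0, 8.0, 9.0, 10.0, 11.0]),
]
result = pooled_mean(sites)  # computed from two (sum, count) pairs only
```

In this sketch the client sees two pairs of numbers and nothing else. A site holding fewer than `MIN_COUNT` records raises `DisclosureError` instead of answering, which is one simple example of what an analysis-level protection element might look like.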

Availability and implementation: Information about the DataSHIELD software is available at https://datashield.org/ and https://github.com/datashield.

Conflict of interest statement

None declared.

Figures

Figure 1. Schematic diagram showing the key DataSHIELD system protection elements.

Figure 2. Schematic diagram showing the invocation flow where different disclosure checks and controls are applied during a statistical analysis process in DataSHIELD. Note that this figure shows a simplified diagram of the invocation flow between the client and a single server. In a multi-site setting, the same flow is applied simultaneously in multiple servers.

Figure 3. An illustration of the alignment of DataSHIELD with the Five Safes Framework.
