Int J Popul Data Sci. 2019 Nov 20;4(2):1134.
doi: 10.23889/ijpds.v4i2.1134.

A Profile of the SAIL Databank on the UK Secure Research Platform

K H Jones et al. Int J Popul Data Sci.

Abstract

Background: The Secure Anonymised Information Linkage (SAIL) Databank is a national data safe haven of de-identified datasets principally about the population of Wales, made available in anonymised form to researchers across the world. It was established to enable the vast arrays of data collected about individuals in the course of health and other public service delivery to be made available to answer important questions that could not otherwise be addressed without prohibitive effort. The SAIL Databank is the bedrock of other funded centres relying on the data for research.

Approach: SAIL is a data repository surrounded by a suite of physical, technical and procedural control measures embodying a proportionate privacy-by-design governance model, informed by public engagement, to safeguard the data and facilitate data utility. SAIL operates on the UK Secure Research Platform (SeRP), which is a customisable technology and analysis platform. Researchers access anonymised data via this secure research environment, from which results can be released following scrutiny for disclosure risk. SAIL data are being used in multiple research areas to evaluate the impact of health and social exposures and policy interventions.
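The scrutiny of results for disclosure risk mentioned above can be illustrated with a minimal sketch. It assumes one common statistical disclosure control rule, small-count suppression; the source does not state which checks SAIL actually applies, and the function name `screen_counts` and the threshold of 5 are hypothetical choices made only for this example.

```python
def screen_counts(counts, threshold=5):
    """Mask any count below `threshold` before a table leaves the environment.

    `counts` maps a category label to the number of individuals in it.
    Returns (released, flagged): the table with small counts replaced by
    None, and the list of categories that were masked.
    The threshold is an illustrative assumption, not a documented SAIL rule.
    """
    released = {}
    flagged = []
    for category, n in counts.items():
        if n < threshold:
            released[category] = None   # suppressed cell
            flagged.append(category)    # recorded for the reviewer
        else:
            released[category] = n
    return released, flagged


if __name__ == "__main__":
    table = {"18-39": 120, "40-64": 3, "65+": 47}
    released, flagged = screen_counts(table)
    print(released)
    print(flagged)
```

In practice a reviewer would also consider secondary disclosure (for example, masked cells recoverable from totals), which this sketch does not attempt.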

Discussion: Lessons learned and their applications include: managing evolving legislative and regulatory requirements; employing multiple, tiered security mechanisms; working to increase analytical capacity and efficiency; and developing a multi-faceted programme of public engagement. Further work includes: incorporating new data types; enabling alternative means of data access; and developing further efficiencies across our operations.

Conclusion: SAIL represents an ongoing programme of work to develop and maintain an extensive, whole population data resource for research. Its privacy-by-design model and UK SeRP technology have received international acclaim, and we continually endeavour to demonstrate trustworthiness to support data provider assurance and public acceptability in data use. We strive for further improvement and continue a mutual learning process with our contemporaries in this rapidly developing field.

Conflict of interest statement

Conflicts of Interest: The authors declare that they have no known conflicts of interest.

Figures

Figure 1: The SAIL Secure Research Platform
SAIL operates on a secure research platform (UK SeRP). Beginning at the left of the diagram, wherever researchers are based, they access data through a provisioned, secure, research-ready desktop using VMware Horizon infrastructure. The connection from the user’s terminal to the desktop is strongly encrypted, and access control prevents data being transferred outside the desktop environment. The end user is authenticated through both user credentials and two-factor authentication tokens. Provisioned desktops come in a variety of capacities and configurations to suit the type of analysis that the end user and project need. As part of the research environment, there are shared project spaces to enable collaboration through database space, file store, wiki and Git (source control), as well as access to wider support and help materials. UK SeRP has many shared infrastructure components that can help deliver the programme’s objectives or specific project needs. SAIL uses IBM DB2 as its data warehouse because of its massively parallel processing (MPP) architecture and its ability to scale to the needs of such a large repository and the big data demands that this drives. To support specific project needs, other UK SeRP components can be made available, such as the HPC cluster or Kubernetes cluster to support processing pipelines, or the GPU and AI cluster for training computing models. Through the provision of virtual machines or container environments, SAIL can support more complex methodological developments that require bespoke infrastructure for the development or deployment of tailored solutions. Business intelligence tools such as Tableau, R Shiny and PowerBI (not shown) are also available. Two other UK SeRP instances (Data Science Building projects (DSB) and Dementias Platform UK (DPUK)) are included on the diagram to help illustrate the customisability of the platform, since these operate using different components or different governance regimes from SAIL.
Figure 2: The National Research Data Appliance
The various components of the National Research Data Appliance (NRDA) are shown. The entire UK SeRP environment is controlled by “Security 3 (S3)”, a feature of the NRDA that allows the tenancy to be managed and controlled by non-technical team members. User accounts and projects are defined and managed through a user interface (shown on the left) that allows different levels of access, even allowing a project PI to self-manage project membership. (This self-management feature is not enabled for SAIL.) This system allows the infrastructure configuration, project configuration and governance structure to be documented, and all system components to be orchestrated, through the user interface. The particular tenancy model determines which parts of the infrastructure are accessible and which projects are enacted within that environment. The S3 model is periodically checked against the infrastructure, and any nonconformity is corrected and reported.
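The periodic check of the S3 model against the infrastructure resembles a desired-state reconciliation loop. The following is a minimal sketch of that general pattern only; it is not the NRDA implementation, and the names `reconcile` and `ReconcileReport`, as well as the representation of the model as project-to-members mappings, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReconcileReport:
    """Nonconformities found (and corrected) in one reconciliation pass."""
    granted: list = field(default_factory=list)  # access missing from infrastructure
    revoked: list = field(default_factory=list)  # access not sanctioned by the model


def reconcile(desired, actual):
    """Compare the documented model with the observed infrastructure state.

    Both arguments map a project name to the set of users with access.
    Returns (corrected, report): the state after correction, which always
    equals the desired model, and a report of every change that was needed.
    """
    corrected = {}
    report = ReconcileReport()
    for project in sorted(set(desired) | set(actual)):
        want = desired.get(project, set())
        have = actual.get(project, set())
        for user in sorted(want - have):
            report.granted.append((project, user))
        for user in sorted(have - want):
            report.revoked.append((project, user))
        if project in desired:
            corrected[project] = set(want)
    return corrected, report


if __name__ == "__main__":
    model = {"proj_a": {"alice", "bob"}, "proj_b": {"carol"}}
    observed = {"proj_a": {"alice", "mallory"}, "proj_c": {"dave"}}
    state, report = reconcile(model, observed)
    print(state)
    print(report)
```

Running such a pass on a schedule means drift in either direction (missing access or unsanctioned access) is both repaired and written to a report, which matches the "corrected and reported" behaviour described in the caption.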
