iScience. 2023 Mar 31;26(4):106546.
doi: 10.1016/j.isci.2023.106546. eCollection 2023 Apr 21.

A five-safes approach to a secure and scalable genomics data repository

Chih Chuan Shih et al. iScience. 2023.

Abstract

Genomic researchers increasingly use commercial cloud service providers (CSPs) to meet their data management and analytics needs. CSPs allow researchers to grow Information Technology (IT) infrastructure on demand to overcome bottlenecks when combining large datasets. However, without adequate security controls, the risk of unauthorized access may be higher for data stored on the cloud. Additionally, regulators are mandating data access patterns and specific security protocols for the storage and use of genomic data. While CSPs provide tools for security and regulatory compliance, building the controls required for cloud solutions is not trivial. Research Assets Provisioning and Tracking Online Repository (RAPTOR) by the Genome Institute of Singapore is a cloud-native genomics data repository and analytics platform that implements a "five-safes" framework to provide security and governance controls to data contributors and users. It lets them use CSPs to share and analyze genomic datasets without the risk of security breaches or running afoul of regulations.

Keywords: Data encryption; Data storage representation; Genomics.

Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
RAPTOR overview and the modes of data analysis supported. RAPTOR provides access via standalone Linux machines with sudo access, Jupyter Notebooks, and an EMR cluster with pre-configured Hail tools.
Figure 2
As a serverless application, RAPTOR is composed of native AWS services integrated with Lambda functions. The user interfaces are JavaScript graphical user interfaces hosted on CloudFront. User authentication is managed with Cognito. Hosted datasets sit on S3 (with automated tiering), while all metadata are stored in DynamoDB. Data-staging activities are managed using S3 Batch. Data ingress and egress are managed through Transfer Family. The analytics workspace relies on FSx to provide scratch storage, and depending on the mode of compute, EC2, Elastic MapReduce (EMR), or ParallelCluster provides computing power. Data access from the nodes is regulated by Service Endpoints. All permissions and authorisations are managed using IAM. Encryption keys used by S3, EBS, and DynamoDB are stored within KMS. All RAPTOR activities are written to AWS QLDB.
Figure 3
Using existing GA4GH DRS and TES for federated computation on RAPTOR. (i) A client invokes GET to describe an existing data collection ‘C’ on RAPTOR. In addition to content and access descriptions, RAPTOR also sends the identity of the AMI associated with C under ‘alias’. The provided information includes the AMI id, pre-installed tools, mount points for accessing data, and the customized IAM role and endpoint IP; these are used by RAPTOR to read and write data from the remote client. (ii) The remote client invokes a task using TES. In the call, the client uses inputs to tell RAPTOR which URI is to be mapped to which mount point. Under executors, the remote client tells RAPTOR which AMI is to be used. (iii) Batch instantiates an EC2 machine using the AMI and parameters provided by the TES task. The machine mounts paths from POAG RAPTOR and SG10K RAPTOR within the same machine using the customized IAM role.
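
Steps (i) and (ii) of Figure 3 can be illustrated with a minimal Python sketch. It is not RAPTOR's implementation: the "key=value" encoding of the alias strings, the placeholder AMI id, the example URI, and the helper names parse_aliases and build_tes_task are all assumptions made for illustration. Only the top-level field names (aliases on a DRS object; inputs with url and path, and executors with image and command, on a TES task) follow the public GA4GH schemas.

```python
import json


def parse_aliases(drs_object):
    """Split alias strings of the form 'key=value' into a dict.

    The 'key=value' encoding is an assumption for this sketch; the
    caption only says the AMI details are sent under 'alias'.
    """
    parsed = {}
    for alias in drs_object.get("aliases", []):
        key, sep, value = alias.partition("=")
        if sep:  # ignore aliases that are not key=value pairs
            parsed[key] = value
    return parsed


def build_tes_task(drs_object, uri_to_mount, command):
    """Build a TES task body from a DRS description (hypothetical helper).

    inputs map each data URI to a mount point; executors name the AMI.
    """
    info = parse_aliases(drs_object)
    inputs = [
        {"url": uri, "path": mount, "type": "DIRECTORY"}
        for uri, mount in uri_to_mount.items()
    ]
    return {
        "name": "analysis-on-" + drs_object["id"],
        "inputs": inputs,
        "executors": [{"image": info["ami_id"], "command": command}],
    }


# Mock DRS response for collection 'C'; all values are placeholders.
drs_object = {
    "id": "C",
    "aliases": ["ami_id=ami-EXAMPLE", "mount_point=/mnt/c"],
}

task = build_tes_task(
    drs_object,
    {"drs://raptor.example/C": "/mnt/c"},  # hypothetical URI
    ["ls", "/mnt/c"],
)
print(json.dumps(task, indent=2))
```

In a real deployment, the resulting body would be posted to a TES endpoint, and step (iii), launching the EC2 machine from the named AMI, would happen on the server side.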

