Bioinformatics. 2020 Jan 1;36(1):1-9.
doi: 10.1093/bioinformatics/btz472.

Cloud bursting galaxy: federated identity and access management

Vahid Jalili et al. Bioinformatics.

Abstract

Motivation: Large biomedical datasets, such as those from genomics and imaging, are increasingly being stored on commercial and institutional cloud computing platforms. This is because cloud-scale computing resources, from robust backup to high-speed data transfer to scalable compute and storage, are needed to make these large datasets usable. However, one challenge for large-scale biomedical data on the cloud is providing secure access, especially when datasets are distributed across platforms. While there are open Web protocols for secure authentication and authorization, these protocols are not in wide use in bioinformatics and are difficult to use for even technologically sophisticated users.

Results: We have developed a generic and extensible approach for securely accessing biomedical datasets distributed across cloud computing platforms. Our approach combines OpenID Connect and OAuth2, best-practice Web protocols for authentication and authorization, together with Galaxy (https://galaxyproject.org), a web-based computational workbench used by thousands of scientists across the world. With our enhanced version of Galaxy, users can access and analyze data distributed across multiple cloud computing providers without any special knowledge of access/authorization protocols. Our approach does not require users to share permanent credentials (e.g. username, password, API key), instead relying on automatically generated temporary tokens that refresh as needed. Our approach is generalizable to most identity providers and cloud computing platforms. To the best of our knowledge, Galaxy is the only computational workbench where users can access biomedical datasets across multiple cloud computing platforms using best-practice Web security approaches and thereby minimize risks of unauthorized data access and credential use.

Availability and implementation: Freely available for academic and commercial use under the open-source Academic Free License (https://opensource.org/licenses/AFL-3.0) from the following Github repositories: https://github.com/galaxyproject/galaxy and https://github.com/galaxyproject/cloudauthz.

Figures

Fig. 1.
Galaxy adopts and integrates best-practice Web protocols to access secured data stored on cloud platforms (discussed in detail in Section 2). In this approach, a resource owner shares protected data with collaborators (User) by leveraging the role-based access model (Sandhu et al., 1996) and the OpenID Connect (OIDC) protocol. Accordingly, a resource owner defines a role with (read or write) access to protected data (e.g. see Fig. 3), and specifies a Galaxy instance (defined using its OIDC audience ID) that can assume the role upon presenting the user’s identity token issued by their specified institute (OIDC IdP) for that Galaxy instance (e.g. see Fig. 4). Upon successfully assuming the role, Galaxy receives cloud-provider-specific temporary credentials and uses them to sign API requests to protected data. Note that, following the OIDC requirements, all the discussed communications are TLS-protected (see Section 3.1). Additionally, a resource owner and a user are not required to belong to the same trust group (e.g. institute).
Fig. 2.
Galaxy enables users to log in using their identities with a wide range of identity providers, spanning from Google, Github, ORCID, ElixirAAI and Globus, to <3000 world-wide educational institutes. Galaxy leverages CILogon and Python Social Auth for user authentication; these brokers interface with a number of (social and institutional) identity providers, and CILogon interfaces with eduGAIN, which federates 60 nation-wide federations of educational identity providers. For instance, the top four federations in terms of the number of IdPs they integrate are the United Kingdom (UK federation), the U.S. (InCommon), France (Fédération Éducation-Recherche) and Brazil (CAFe); one institute per federation is highlighted in the figure. Additionally, Galaxy leverages CloudAuthz to obtain authorization to cloud-based resource providers, such as AWS, Azure and Google Cloud Platform.
Fig. 3.
A sample AWS policy, which can be attached to a role to enable it to retrieve (‘Action’: ‘s3:GetObject’) all the objects in the bucket gxy-bucket1 and only the object hgmm_100_R2.fastq from the bucket gxy-bucket2, provided the request is made from a server with the IP address 1.2.3.4. Sid: statement ID.
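The figure itself is not reproduced here, so the following is only a sketch of what a policy matching this caption might look like, written as a Python dictionary in the standard IAM policy layout. The bucket names, object name, action and IP address come from the caption; the `Sid` value is a hypothetical label, and the exact structure of the authors' policy is an assumption.

```python
import json

# Sketch of an IAM policy matching the caption of Fig. 3.
# "GalaxyReadSketch" is a hypothetical statement ID, not taken from the paper.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GalaxyReadSketch",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": [
                # every object in gxy-bucket1
                "arn:aws:s3:::gxy-bucket1/*",
                # a single object in gxy-bucket2
                "arn:aws:s3:::gxy-bucket2/hgmm_100_R2.fastq",
            ],
            # only honour requests originating from this IP address
            "Condition": {"IpAddress": {"aws:SourceIp": "1.2.3.4"}},
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Expressing both resources in one statement keeps the IP condition in a single place; two separate statements would be equivalent as long as each repeats the condition.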
Fig. 4.
An example of a trust relationship defined for an AWS role, which allows a Galaxy instance, identified by 8936…apps.googleusercontent.com (part of the client ID), to assume the role in exchange for a user’s ID token issued for that Galaxy instance by Google.
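As an illustration, a trust relationship of this kind can be sketched as a Python dictionary in the usual IAM layout for web identity federation with Google. The helper name `build_trust_policy` is hypothetical, and the client ID passed in below is a placeholder, not the ID shown in the figure; the authors' exact document may differ.

```python
import json


def build_trust_policy(client_id: str) -> dict:
    """Return a trust policy letting one OIDC client (audience) assume the role.

    Hypothetical helper: the caller supplies the Galaxy instance's OIDC client ID.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Google acts as the federated identity provider
                "Principal": {"Federated": "accounts.google.com"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                # the ID token's audience must match this Galaxy instance
                "Condition": {
                    "StringEquals": {"accounts.google.com:aud": client_id}
                },
            }
        ],
    }


# Placeholder client ID for illustration only
trust = build_trust_policy("EXAMPLE-CLIENT-ID.apps.googleusercontent.com")
print(json.dumps(trust, indent=2))
```

The audience condition is what ties the role to one Galaxy instance: an ID token that Google issued for a different client would not satisfy it.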
Fig. 5.
Identity federation and authorization grant flow in the proposed method for AWS. The flow is a three-step procedure. First, a Galaxy admin registers the instance as an OIDC client with an OIDC IdP (e.g. Google), and a resource owner configures an AWS role that can be assumed by the Galaxy instance (specified using its OIDC client ID, see Fig. 4) to perform certain operations on their resources (see Fig. 3). Second, the user logs in to Galaxy (either as a new user, or in association with their existing account) using their identity with the OIDC IdP with which the Galaxy instance is registered as a client (e.g. Google), and defines a cloud authorization record using the AWS role ARN the resource owner has shared with them. Third, Galaxy communicates with the AWS Security Token Service (STS), presents all the necessary information to assume the role, and obtains an access key, a secret key and a session token, which can be used to sign API requests to AWS resources. Note that a Galaxy user and a Google user can refer to the same person; however, they are not necessarily the same identity, as a Galaxy user can be associated with multiple identities on different IdPs.
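The third step can be sketched with the STS `AssumeRoleWithWebIdentity` query API. The snippet below only builds the request and does not send it; the function names, the session name, the role ARN and the token value are all hypothetical, and this is not necessarily how Galaxy or CloudAuthz issue the call (a client library such as boto3 offers an equivalent `assume_role_with_web_identity` method).

```python
from urllib.parse import urlencode

# Global STS endpoint; regional endpoints also exist.
STS_ENDPOINT = "https://sts.amazonaws.com/"


def build_assume_role_params(role_arn, id_token,
                             session_name="galaxy-session",
                             duration_seconds=3600):
    """Collect the query parameters for AssumeRoleWithWebIdentity.

    role_arn: ARN the resource owner shared with the user.
    id_token: OIDC ID token the IdP issued for this Galaxy instance.
    session_name and duration_seconds are illustrative defaults.
    """
    return {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": id_token,
        "DurationSeconds": str(duration_seconds),
    }


def build_assume_role_url(params):
    """Encode the parameters into a request URL (no request is made here)."""
    return STS_ENDPOINT + "?" + urlencode(params)


# Hypothetical inputs for illustration only
params = build_assume_role_params(
    "arn:aws:iam::123456789012:role/example-role",
    "header.payload.signature",
)
print(build_assume_role_url(params))
```

A successful response would carry the temporary access key, secret key and session token described in the caption; because the call is authenticated by the ID token itself, it needs no pre-existing AWS credentials.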
Fig. 6.
Identity federation and authorization grant flow for Azure. A resource owner defines a service principal (SP), assigns it a role with the necessary permissions (e.g. read a bucket), obtains its client ID and client secret, and shares them, along with the tenant ID, with a Galaxy user. (A resource owner and a Galaxy user can potentially be the same person; however, they are not necessarily the same identity.) The user can then define a cloud authorization record in Galaxy using the tenant ID, client ID and client secret, which Galaxy can use to assume the role and obtain an OAuth 2.0 access token. Note that this is the client credentials grant flow of the OAuth 2.0 protocol, which allows assuming a role using the aforementioned information only, without needing the user’s OIDC identity token (unlike the flow presented for AWS in Fig. 5).
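A minimal sketch of the client credentials grant follows, assuming the Microsoft identity platform v2.0 token endpoint and the Azure Storage default scope; which endpoint and scope Galaxy or CloudAuthz actually use is not stated in the caption. The function name and the credential values are placeholders, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, client_id, client_secret,
                        scope="https://storage.azure.com/.default"):
    """Build (but do not send) an OAuth 2.0 client credentials token request.

    The endpoint version and the default scope are assumptions.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder credentials for illustration only
request = build_token_request("TENANT_ID", "CLIENT_ID", "CLIENT_SECRET")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would return a JSON document containing the access token. Note that, unlike the AWS flow, nothing in the request identifies the individual user.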
Fig. 7.
Three patterns of user authentication and cloud authorization. Option 1 is based on the direct authentication protocol, which we currently implement. Option 2 is based on the brokered authentication pattern; since methods implementing this protocol can map an authenticated user to a local identity (see steps 4 and 5 of Option 2: the broker emits its own authentication instead of relaying the IdP’s proof), this protocol cannot be used for authorization grant to cloud-based resource providers. In other words, the broker can emit an identity (e.g. ‘Meryl85’) that is different from the identity expected by the resource provider (e.g. ‘Meryl’), resulting in authorization failure by the resource provider (see step 7 of Option 2). Option 3 also follows the brokered authentication pattern, but since it also provides an authorization grant service (Amazon Cognito is such a broker), it can be used as an alternative to Option 1.
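The difference between Option 1 and Option 2 can be illustrated with a toy model. Everything here is hypothetical (the `Assertion` type, the issuer names and the check performed by the resource provider); it is not any real protocol implementation, and only mirrors the ‘Meryl’ versus ‘Meryl85’ example from the caption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """Toy stand-in for an identity token: who issued it and for whom."""
    issuer: str
    subject: str


def resource_provider_authorizes(assertion, trusted_issuer, expected_subject):
    """The resource provider accepts only the issuer and subject it was told to trust."""
    return (assertion.issuer == trusted_issuer
            and assertion.subject == expected_subject)


def relay_directly(idp_assertion):
    """Option 1: the IdP's own proof is passed on unchanged."""
    return idp_assertion


def broker_reissue(idp_assertion, local_identities):
    """Option 2: the broker maps the user to a local identity and signs its own proof."""
    return Assertion(issuer="broker.example",
                     subject=local_identities[idp_assertion.subject])


from_idp = Assertion(issuer="idp.example", subject="Meryl")
direct = relay_directly(from_idp)
brokered = broker_reissue(from_idp, {"Meryl": "Meryl85"})

print(resource_provider_authorizes(direct, "idp.example", "Meryl"))
print(resource_provider_authorizes(brokered, "idp.example", "Meryl"))
```

In this model the direct assertion passes the check and the brokered one does not, because both its issuer and its subject differ from what the resource owner configured.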
Fig. 8.
A subset of the available methods and implementations for user authentication and authorization grant to cloud-based resource providers, and the back-ends each method supports. The figure is scoped to OIDC-based authentication and authorization grant using cloud-native credentials only. The implementations we use in Galaxy are highlighted in purple: Python Social Auth for user authentication, and CloudAuthz for granting cloud authorization.
