Cloud-based biomedical data storage and analysis for genomic research: Landscape analysis of data governance in emerging NIH-supported platforms
- PMID: 37181330
- PMCID: PMC10173774
- DOI: 10.1016/j.xhgg.2023.100196
Abstract
The storage, sharing, and analysis of genomic data pose technical and logistical challenges that have precipitated the development of cloud-based computing platforms designed to facilitate collaboration and maximize the scientific utility of data. To understand cloud platforms' policies and procedures and the implications for different stakeholder groups, in summer 2021 we reviewed publicly available documents (N = 94) sourced from platform websites, scientific literature, and lay media for five NIH-funded cloud platforms (the All of Us Research Hub, NHGRI AnVIL, NHLBI BioData Catalyst, NCI Genomic Data Commons, and the Kids First Data Resource Center) and a pre-existing data sharing mechanism, dbGaP. Platform policies were compared across seven categories of data governance: data submission, data ingestion, user authentication and authorization, data security, data access, auditing, and sanctions. Our analysis finds similarities across the platforms, including reliance on a formal data ingestion process, multiple tiers of data access with varying user authentication and/or authorization requirements, platform and user data security measures, and auditing for inappropriate data use. Platforms differ in how data tiers are organized, as well as in the specifics of user authentication and authorization across access tiers. Our analysis maps elements of data governance across emerging NIH-funded cloud platforms and, as such, provides a key resource for stakeholders seeking to understand and utilize data access and analysis options across platforms and to surface aspects of governance that may require harmonization to achieve the desired interoperability.
Keywords: cloud platforms; data governance; data sharing; genomic databases.
© 2023 The Author(s).
Conflict of interest statement
The authors declare no competing interests.
Similar articles
- Building a collaborative cloud platform to accelerate heart, lung, blood, and sleep research. J Am Med Inform Assoc. 2023 Jun 20;30(7):1293-1300. doi: 10.1093/jamia/ocad048. PMID: 37192819. Free PMC article.
- Cloud bursting galaxy: federated identity and access management. Bioinformatics. 2020 Jan 1;36(1):1-9. doi: 10.1093/bioinformatics/btz472. PMID: 31197310. Free PMC article.
- Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space. Cell Genom. 2022 Jan 12;2(1):100085. doi: 10.1016/j.xgen.2021.100085. Epub 2022 Jan 13. PMID: 35199087. Free PMC article.
- Data Lakes, Clouds, and Commons: A Review of Platforms for Analyzing and Sharing Genomic Data. Trends Genet. 2019 Mar;35(3):223-234. doi: 10.1016/j.tig.2018.12.006. Epub 2019 Jan 25. PMID: 30691868. Free PMC article. Review.
- Redefining governance: a critical analysis of sustainability transformation in e-governance. Front Big Data. 2024 Apr 3;7:1349116. doi: 10.3389/fdata.2024.1349116. eCollection 2024. PMID: 38638340. Free PMC article. Review.
Cited by
- Ethical governance for genomic data science in the cloud. Nat Rev Genet. 2025 Feb;26(2):75-77. doi: 10.1038/s41576-024-00789-9. PMID: 39424902. Free PMC article.
- Data Sharing in the PRIMED Consortium: Design, implementation, and recommendations for future policymaking. ArXiv [Preprint]. 2025 Feb 12:arXiv:2502.09351v1. Update in: Am J Hum Genet. 2025 Aug 7;112(8):1754-1768. doi: 10.1016/j.ajhg.2025.06.004. PMID: 39990790. Free PMC article. Updated. Preprint.
- Current landscape of cancer genomics research in sub-Saharan Africa - a review of literature. Front Oncol. 2025 Apr 17;15:1512005. doi: 10.3389/fonc.2025.1512005. eCollection 2025. PMID: 40313245. Free PMC article. Review.
- NIGMS Sandbox: a learning platform toward democratizing cloud computing for biomedical research. Brief Bioinform. 2024 Jul 23;25(Supplement_1):bbae478. doi: 10.1093/bib/bbae478. PMID: 39376084.