Data Lakes, Clouds, and Commons: A Review of Platforms for Analyzing and Sharing Genomic Data
- PMID: 30691868
- PMCID: PMC6474403
- DOI: 10.1016/j.tig.2018.12.006
Abstract
Data commons collate data with cloud computing infrastructure and commonly used software services, tools, and applications to create biomedical resources for the large-scale management, analysis, harmonization, and sharing of biomedical data. Over the past few years, data commons have been used to analyze, harmonize, and share large-scale genomics datasets. Data ecosystems can be built by interoperating multiple data commons. Curating, importing, and analyzing the data in a data commons can be quite labor-intensive. Data lakes offer an alternative to data commons: they simply provide access to the data, deferring curation and analysis until later and delegating them to those who access the data. We review software platforms for managing, analyzing, and sharing genomic data, with an emphasis on data commons, but also cover data ecosystems and data lakes.
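The contrast between data commons and data lakes comes down to when curation happens: a commons pays the curation cost when data are imported, while a lake stores data as submitted and leaves harmonization to each reader. The Python sketch below illustrates only that difference, using toy in-memory classes. The names DataCommons, DataLake, and harmonize, the required metadata fields, and the genome-build aliases are all hypothetical assumptions. They are not the API or schema of any platform covered in this review.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical common schema that a toy commons enforces at import time.
REQUIRED_FIELDS = ("sample_id", "assay", "reference_genome")
# Hypothetical aliases mapping submitter-specific genome builds to common names.
GENOME_ALIASES = {"hg19": "GRCh37", "hg38": "GRCh38"}


def harmonize(raw: Record) -> Record:
    """Hypothetical harmonization: rename a submitter key and normalize its value."""
    out = dict(raw)  # copy so the submitted record is never mutated
    if "genome_build" in out:
        out["reference_genome"] = out.pop("genome_build")
    ref = out.get("reference_genome")
    if ref is not None:
        out["reference_genome"] = GENOME_ALIASES.get(ref, ref)
    return out


@dataclass
class DataCommons:
    """Toy commons: curation and harmonization happen when data are imported."""
    harmonize: Callable[[Record], Record]
    records: List[Record] = field(default_factory=list)

    def ingest(self, raw: Record) -> None:
        record = self.harmonize(raw)  # curation cost is paid up front
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"rejected at import, missing {missing}")
        self.records.append(record)

    def query(self, assay: str) -> List[Record]:
        # Every stored record already follows the common schema.
        return [r for r in self.records if r["assay"] == assay]


@dataclass
class DataLake:
    """Toy lake: objects are stored as submitted; curation is deferred to readers."""
    objects: List[Record] = field(default_factory=list)

    def put(self, raw: Record) -> None:
        self.objects.append(dict(raw))  # no validation, no harmonization

    def read(self, harmonize: Callable[[Record], Record]) -> List[Record]:
        # Each consumer supplies its own harmonization logic at access time.
        return [harmonize(o) for o in self.objects]


if __name__ == "__main__":
    submissions = [
        {"sample_id": "S1", "assay": "WGS", "genome_build": "hg19"},
        {"sample_id": "S2", "assay": "RNA-seq", "reference_genome": "GRCh38"},
        {"sample_id": "S3", "assay": "WGS"},  # lacks reference genome metadata
    ]
    commons = DataCommons(harmonize=harmonize)
    lake = DataLake()
    for raw in submissions:
        lake.put(raw)  # the lake accepts everything
        try:
            commons.ingest(raw)  # the commons rejects S3 at import
        except ValueError as err:
            print(err)
    print(len(commons.query("WGS")))  # prints 1 (only S1 passed curation)
    print(len(lake.read(harmonize)))  # prints 3 (all raw objects are readable)
```

In this sketch the commons rejects the incomplete S3 record up front, so every query returns schema-conformant records, whereas the lake keeps all three objects and each reader decides how (and whether) to harmonize them. This mirrors the trade-off described in the abstract: the commons requires more curation labor at import, and the lake defers that labor to whoever accesses the data.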
Keywords: cancer genomics clouds; data clouds; data commons; data sharing.
Copyright © 2019 The Author. Published by Elsevier Ltd. All rights reserved.
