Reproducible big data science: A case study in continuous FAIRness
- PMID: 30973881
- PMCID: PMC6459504
- DOI: 10.1371/journal.pone.0213013
Erratum in
- Correction: Reproducible big data science: A case study in continuous FAIRness. PLoS One. 2023 Nov 21;18(11):e0294883. doi: 10.1371/journal.pone.0294883. eCollection 2023. PMID: 37988378. Free PMC article.
Abstract
Big biomedical data create exciting opportunities for discovery, but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data. We show how the tools automate routine but complex tasks, capture analysis algorithms in understandable and reusable forms, and harness fast networks and powerful cloud computers to process data rapidly, all without sacrificing usability or reproducibility, thus ensuring that big data are not hard-to-(re)use data. We evaluate our approach via a user study, and show that 91% of participants were able to replicate a complex analysis involving considerable data volumes.
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
[Figure caption, partially recovered] The workflow takes [...] as input. It executes from top to bottom, using subworkflows B and C to implement [...] and then subworkflow D to implement [...]. It produces as output BDBags containing aligned DNase-seq data and footprints, with the latter serving as input to [...].
