If these data could talk
- PMID: 28872630
- PMCID: PMC5584398
- DOI: 10.1038/sdata.2017.114
Abstract
In the last few decades, data-driven methods have come to dominate many fields of scientific inquiry. Open data and open-source software have enabled the rapid implementation of novel methods to manage and analyze the growing flood of data. However, it has become apparent that many scientific fields exhibit distressingly low rates of reproducibility. Although there are many dimensions to this issue, we believe that too little formalism is used when describing published results end to end, from the data source to the analysis to the final publication. Even when authors do their best to make their research and data accessible, this lack of formalism reduces the clarity and efficiency of reporting, which contributes to issues of reproducibility. Data provenance aids reproducibility through systematic and formal records of the relationships among data sources, processes, datasets, publications and researchers.
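As a rough illustration of what such formal records might look like, the sketch below stores provenance as (subject, relation, object) triples and walks them to recover everything a publication depends on. This is not the authors' method. The relation names are borrowed from the W3C PROV vocabulary, while the file names, process names, researcher and paper identifiers, and the helpers `build_graph` and `lineage` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical provenance records as (subject, relation, object) triples.
# Relation names follow W3C PROV terms; every identifier here is invented.
RECORDS = [
    ("raw_counts.csv", "wasAttributedTo", "researcher:alice"),
    ("clean_counts.csv", "wasGeneratedBy", "process:clean.py"),
    ("process:clean.py", "used", "raw_counts.csv"),
    ("figure1.png", "wasGeneratedBy", "process:plot.py"),
    ("process:plot.py", "used", "clean_counts.csv"),
    ("paper:example", "wasDerivedFrom", "figure1.png"),
]


def build_graph(records):
    """Index the triples by subject so upstream links can be followed."""
    graph = defaultdict(list)
    for subject, relation, obj in records:
        graph[subject].append((relation, obj))
    return graph


def lineage(graph, node):
    """Return every upstream item reachable from node, in discovery order."""
    seen = []
    stack = [node]
    while stack:
        current = stack.pop()
        # .get avoids creating empty entries for nodes with no upstream links
        for _relation, upstream in graph.get(current, []):
            if upstream not in seen:
                seen.append(upstream)
                stack.append(upstream)
    return seen


graph = build_graph(RECORDS)
for item in lineage(graph, "paper:example"):
    print(item)
```

Starting from the hypothetical publication, the walk passes through the figure, the plotting step, the cleaned dataset, the cleaning step and the raw data, and ends at the researcher credited with the raw data. That is the kind of end-to-end trace the abstract argues should be recorded formally.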
Conflict of interest statement
The authors declare no competing financial interests.