Gigascience. 2019 May 1;8(5):giz035.
doi: 10.1093/gigascience/giz035.

Datastorr: a workflow and package for delivering successive versions of 'evolving data' directly into R

Daniel S Falster et al. Gigascience.

Abstract

The sharing and re-use of data have become a cornerstone of modern science. Multiple platforms now allow easy publication of datasets. So far, however, platforms for data sharing offer limited functions for distributing and interacting with evolving datasets, that is, datasets that continue to grow over time as more records are added, errors are fixed, and new data structures are created. In this article, we describe a workflow for maintaining and distributing successive versions of an evolving dataset, allowing users to retrieve and load different versions directly into the R platform. Our workflow uses tools and platforms developed for building and distributing successive versions of an open source software program, including version control, GitHub, and semantic versioning, and applies them to the analogous process of developing successive versions of an open source dataset. Moreover, we argue that this model allows individual research groups to achieve a dynamic and versioned model of data delivery at no cost.

Keywords: data sharing; semantic versioning; version control.

Figures

Figure 1
Overview of the workflow, different parties, and technologies involved in maintaining and distributing versions of an evolving dataset via datastorr. Core features of our approach are shown with black boxes and arrows. Optional extensions are shown in grey (see Discussion for details).
Figure 2
Semantic versioning allows dataset developers to communicate to users the types of changes that have occurred between successive versions of an evolving dataset, using a three-number label in which an increment to the first, second, or third number indicates a major, minor, or patch-level change, respectively. See text for further details.
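
As an illustration of the labelling scheme in Figure 2, the following Python sketch classifies the change between two version labels. It is not part of datastorr and does not use its API: the names `Version`, `parse_version`, and `change_level` are hypothetical, and the sketch assumes plain MAJOR.MINOR.PATCH labels with no pre-release or build suffixes.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A three-number semantic version label (hypothetical helper type)."""
    major: int
    minor: int
    patch: int


def parse_version(label: str) -> Version:
    """Parse a 'MAJOR.MINOR.PATCH' string into a Version tuple."""
    parts = label.strip().split(".")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {label!r}")
    return Version(*(int(p) for p in parts))


def change_level(old: str, new: str) -> str:
    """Return which level changed between two releases of a dataset.

    The result is 'major', 'minor', 'patch', or 'none'. The highest-order
    number that differs decides the level.
    """
    a, b = parse_version(old), parse_version(new)
    if b == a:
        return "none"
    if b < a:
        # Tuples compare element by element, so this catches downgrades.
        raise ValueError(f"{new} is older than {old}")
    if b.major != a.major:
        return "major"
    if b.minor != a.minor:
        return "minor"
    return "patch"
```

Because the parsed labels are tuples of integers, they compare numerically rather than as strings, so version 0.10.2 is correctly treated as newer than 0.9.9.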
