Cell Syst. 2018 Jun 27;6(6):631-635.
doi: 10.1016/j.cels.2018.03.014.

Practical Computational Reproducibility in the Life Sciences

Björn Grüning et al. Cell Syst.

Abstract

Many areas of research suffer from poor reproducibility, particularly computationally intensive domains where results rely on a series of complex methodological decisions that traditional publication approaches do not capture well. Various guidelines for achieving reproducibility have emerged, but implementing them remains difficult because researchers must assemble software tools and their associated libraries, connect those tools into pipelines, and specify parameters. Here, we discuss a suite of cutting-edge technologies that make computational reproducibility not just possible but practical in both time and effort. The suite combines three well-tested components to achieve an unprecedented level of computational reproducibility: a system for building highly portable packages of bioinformatics software; containerization and virtualization technologies that isolate reusable execution environments for these packages; and workflow systems that automatically orchestrate the composition of these packages into entire pipelines. We also provide a practical implementation and five recommendations to help set a typical researcher on the path to performing data analyses reproducibly.
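The orchestration role of the third component can be illustrated with a minimal Python sketch. It assumes a hypothetical model in which each pipeline step names one packaged tool pinned to an exact version and declares the files it consumes and produces. The `Step` class, the `execution_order` function, and all step, tool, and file names are illustrative inventions, not the API of Galaxy, CWL, or any other workflow system; the sketch only shows how an engine can derive a valid run order from declared inputs and outputs.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Step:
    """Hypothetical pipeline step: one packaged tool pinned to an exact version."""
    name: str
    tool: str
    version: str
    inputs: tuple[str, ...] = ()
    outputs: tuple[str, ...] = ()


def execution_order(steps: list[Step]) -> list[str]:
    """Return step names ordered so every step runs after the steps producing its inputs."""
    # Map each declared output file to the step that produces it.
    producer = {out: step.name for step in steps for out in step.outputs}
    graph = TopologicalSorter()
    for step in steps:
        # Inputs with no producer (e.g. raw reads) add no dependency.
        deps = {producer[f] for f in step.inputs if f in producer}
        graph.add(step.name, *deps)
    # static_order() raises graphlib.CycleError (a ValueError) if the steps form a cycle.
    return list(graph.static_order())


if __name__ == "__main__":
    # Hypothetical four-step analysis; tool names and versions are placeholders.
    pipeline = [
        Step("count", "counter_tool", "2.0", ("aligned.bam",), ("counts.tsv",)),
        Step("align", "aligner_tool", "1.4", ("trimmed.fq",), ("aligned.bam",)),
        Step("qc", "qc_tool", "0.9", ("reads.fq",), ("qc_report.html",)),
        Step("trim", "trimmer_tool", "3.1", ("reads.fq",), ("trimmed.fq",)),
    ]
    by_name = {s.name: s for s in pipeline}
    for name in execution_order(pipeline):
        s = by_name[name]
        print(f"{name}: run {s.tool}={s.version}")
```

The steps are deliberately listed out of order: the run order comes from declared data dependencies, not from the order in which steps are written.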

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1.
Software Stack of Interconnected Technologies that Enables Computational Reproducibility. The stack is illustrated with the most basic RNA-seq analysis, which involves four tools. It includes three components: (1) the cross-platform package manager Conda (https://conda.io), which installs analysis tools across operating systems and creates virtual environments containing all tools and dependencies, at specified versions, needed to perform a computational analysis; (2) lightweight software containers, such as Docker or Singularity, which make these virtual environments and tool installations usable across different computing clusters, both local and in the cloud; and (3) hardware virtualization, which achieves complete isolation and reproducibility. We have implemented this stack in the Galaxy scientific workbench (https://galaxyproject.org), enabling any Galaxy server to install all requirements for each Galaxy analysis workflow easily and automatically. The stack is also integrated into the reference implementation of the Common Workflow Language (CWL). Integrating the stack into Galaxy and CWL demonstrates, for the first time, how analysis workflows can be shared, rerun, and reproduced across platforms with no manual setup.
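The idea of automatically installing every requirement at specified versions can be sketched in Python, assuming a hypothetical mapping from package names to exact versions. The functions `pinned_specs`, `environment_yaml`, and `environment_id` are illustrative only and are not part of Conda, Galaxy, or CWL. The sketch renders a Conda-style environment file (with `name`, `channels`, and `dependencies` keys) in which every dependency is pinned, and derives an order-independent identifier from the pinned set: the kind of key a server could use to cache or look up a matching environment or container image. It is not the naming scheme Galaxy actually uses.

```python
import hashlib


def pinned_specs(requirements: dict[str, str]) -> list[str]:
    """Turn {package: version} into sorted Conda-style 'name=version' specs."""
    return [f"{name}={version}" for name, version in sorted(requirements.items())]


def environment_yaml(name: str, requirements: dict[str, str], channels: list[str]) -> str:
    """Render a minimal Conda environment file in which every dependency is pinned."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {spec}" for spec in pinned_specs(requirements)]
    return "\n".join(lines) + "\n"


def environment_id(requirements: dict[str, str]) -> str:
    """Hash the sorted pinned specs so the same requirement set always yields the same ID."""
    canonical = "\n".join(pinned_specs(requirements)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:16]


if __name__ == "__main__":
    # Hypothetical package names and versions, for illustration only.
    reqs = {"aligner_tool": "1.4", "counter_tool": "2.0"}
    print(environment_yaml("rnaseq-example", reqs, ["conda-forge", "bioconda"]))
    print(environment_id(reqs))
```

Sorting the specs before hashing makes the identifier depend only on which packages and versions are required, not on the order in which they were listed, while changing any single version produces a different identifier.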
