Review. Gigascience. 2016 Jul 11;5(1):30.
doi: 10.1186/s13742-016-0135-4.

Tools and techniques for computational reproducibility

Stephen R Piccolo et al. Gigascience.

Abstract

When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace them and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. Computers can be programmed to execute analysis tasks, and those programs can be repeated and shared with others. The deterministic nature of most computer programs means that the same analysis tasks, applied to the same data, will often produce the same outputs. However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed, and because of limitations associated with how scientists document analysis steps. Many tools and techniques are available to help overcome these challenges; here we describe seven such strategies. With a broad scientific audience in mind, we describe the strengths and limitations of each approach, as well as the circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.
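
The point that the same analysis applied to the same data will often produce the same outputs can be illustrated with a minimal sketch. It assumes a toy analysis (a seeded bootstrap estimate of a mean) and made-up data; the names `fingerprint`, `analyze`, and `run_and_record` are hypothetical and do not come from the paper. Recording a checksum of the input and the interpreter version alongside the result is one simple way to document what was actually run.

```python
import hashlib
import json
import platform
import random


def fingerprint(values):
    """Return a SHA-256 checksum of the input data (hypothetical helper)."""
    payload = json.dumps(values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def analyze(values, seed=0, n_resamples=200):
    """Toy analysis: bootstrap estimate of the mean, repeatable via a fixed seed."""
    rng = random.Random(seed)  # local generator, so global random state cannot interfere
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    return sum(means) / len(means)


def run_and_record(values, seed=0):
    """Run the analysis and return the result together with provenance details."""
    return {
        "result": analyze(values, seed=seed),
        "seed": seed,
        "input_sha256": fingerprint(values),
        "python_version": platform.python_version(),
    }


data = [2.1, 3.4, 1.9, 5.0, 4.2]  # made-up example data
first = run_and_record(data)
second = run_and_record(data)
print(first["result"] == second["result"])  # compares two runs with the same data and seed
```

A different seed, different input values, or in some cases a different software version could change the result, which is why the sketch stores those details next to the output.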

Keywords: Computational reproducibility; Literate programming; Practice of science; Software containers; Software frameworks; Virtualization.


Figures

Fig. 1
Example of a command line script. This script can be used to align DNA sequence data to a reference genome. First, it downloads the software and data files necessary for the analysis. Then, it extracts (“unzips”) these files, and aligns the data to a reference genome for Ebola virus. Finally, it converts, sorts, and indexes the aligned data. See Additional file 1 for an executable version of this script
Fig. 2
Example of a Make file. This file performs the same function as the command line script shown in Fig. 1, except that it is formatted for the Make utility. Accordingly, it is structured so that specific tasks must be executed before other tasks, in a hierarchical manner. See Additional file 2 for an executable version of this file
Fig. 3
Example of a Jupyter notebook. This example contains code (in the Python programming language) for generating random numbers and plotting them in a graph within a Jupyter notebook. Importantly, the code and output object (graph) are contained within the same document. See Additional file 3 for an executable version of the notebook
Fig. 4
Example of a document created using knitr. This example contains code (in the R language) for generating random numbers and plotting them on a graph. The knitr tool was used to generate the document, which combines the code and the output object (figure). See Additional file 4 for an executable version of this document
Fig. 5
Architecture of virtual machines. Virtual machines encapsulate analytical software and dependencies within a “guest” operating system, which may be different to the main (“host”) operating system. A virtual machine executes in the context of virtualization software, which runs alongside other software installed on the computer
Fig. 6
Architecture of software containers. Software containers encapsulate analytical software and dependencies. In contrast to virtual machines, containers execute within the context of the computer’s main operating system
Fig. 7
Example of a Docker container for genomics research. This container would enable researchers to preprocess various types of molecular data, using tools from Bioconductor and Galaxy, and to analyze the resulting data within a Jupyter notebook. Each box within the container represents a distinct Docker image. These images are layered such that some images depend on others (for example, the Bioconductor image depends on R). At its base, the container includes operating system libraries, which may not be present (or may be configured differently) on the computer’s main operating system
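
The Fig. 2 caption describes a Make file in which specific tasks must be executed before other tasks. As a rough illustration of that idea only, the sketch below orders the steps named in the Fig. 1 caption by their prerequisites, using Python's standard `graphlib` module. The task names, the `DEPENDENCIES` table, and the do-nothing placeholder actions are assumptions made for this example; it is not the paper's Make file, and it does not model Make's ability to skip targets that are already up to date.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks that must finish before it can run.
# Task names are hypothetical labels for the steps named in the Fig. 1 caption.
DEPENDENCIES = {
    "download": set(),
    "extract": {"download"},
    "align": {"extract"},
    "convert": {"align"},
    "sort": {"convert"},
    "index": {"sort"},
}


def execution_order(dependencies):
    """Return the tasks in an order that satisfies every prerequisite."""
    return list(TopologicalSorter(dependencies).static_order())


def run_pipeline(dependencies, actions):
    """Run each task's action once, after all of its prerequisites."""
    completed = []
    for task in execution_order(dependencies):
        actions[task]()  # a real pipeline would invoke the actual tool here
        completed.append(task)
    return completed


# Placeholder actions that do nothing; they stand in for the real commands.
actions = {task: (lambda: None) for task in DEPENDENCIES}
order = run_pipeline(DEPENDENCIES, actions)
print(order)
```

Declaring prerequisites rather than hard-coding a sequence means the order is derived from the dependency table, so adding a step only requires stating what it depends on.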

