Stat Med. 2021 Nov 30;40(27):6057-6068.
doi: 10.1002/sim.9169. Epub 2021 Sep 6.

Best practices in statistical computing

Ricardo Sanchez et al. Stat Med.

Abstract

The world is becoming increasingly complex, both in the rich sources of data we have access to and in the statistical and computational methods we can apply to those data. These factors create an ever-increasing risk of errors in code and heighten the sensitivity of findings to data preparation and to the execution of complex statistical and computing methods. The consequences of coding and data mistakes can be substantial. In this paper, we describe the key steps for implementing a code quality assurance (QA) process that researchers can follow to improve their coding practices throughout a project and so assure the quality of the final data, code, analyses, and results. These steps include: (i) adherence to principles for code writing and style that follow best practices; (ii) clear written documentation that describes code, workflow, and key analytic decisions; (iii) careful version control; (iv) good data management; and (v) regular testing and review. Following these steps will greatly improve a study's ability to ensure that results are accurate and reproducible. The responsibility for code QA falls not only on individual researchers but also on institutions, journals, and funding agencies.

Keywords: data management; methodology; version control.

Figures

FIGURE 1
Key strategies for code quality assurance (QA)
FIGURE 2
Sample strategies for writing clean code using case study data. Code snippet can be found here: https://github.com/jpane24/code-qa/blob/main/code/R00-Data-Cleaning.R
FIGURE 3
Example README file for the code-qa GitHub repository. README example from: https://github.com/jpane24/code-qa
FIGURE 4
Example of commented code for a function. Code snippet can be found here: https://github.com/jpane24/code-qa/blob/main/code/helper.R
FIGURE 5
Example of documenting a key analytic decision for handling missing data. Code snippet can be found here: https://github.com/jpane24/code-qa/blob/main/code/R00-Data-Cleaning.R
FIGURE 6
Example of commits and merge

