Gigascience. 2019 May 1;8(5):giz054.
doi: 10.1093/gigascience/giz054.

Software engineering for scientific big data analysis

Björn A Grüning et al. Gigascience.

Abstract

The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training, find themselves playing the impromptu role of software engineer. While several resources introduce scientists to the basics of programming, researchers have been left with little guidance on how to advance to the next level: developing robust, large-scale data analysis tools that can be integrated into workflow management systems, tools, and frameworks. Integration into such workflow systems imposes additional requirements on computational tools, such as adherence to standard conventions for robustness, data input, output, logging, and flow control. Here we provide a set of 10 guidelines to steer the creation of command-line computational tools that are usable, reliable, extensible, and in line with the standards of modern coding practice.

Keywords: big data; coding; computational tools; data analysis; integration systems; scientific software; software development; software engineering; standards; workflow.
