The State of Software for Evolutionary Biology
- PMID: 29385525
- PMCID: PMC5913673
- DOI: 10.1093/molbev/msy014
Abstract
With Next Generation Sequencing data being routinely used, evolutionary biology is transforming into a computational science. Thus, researchers have to rely on a growing number of increasingly complex software tools. All widely used core tools in the field have grown considerably in terms of the number of features as well as lines of code and, consequently, also with respect to software complexity. A topic that has received little attention is the software engineering quality of widely used core analysis tools. Software developers appear to rarely assess the quality of their code, and this can have negative consequences for end-users. To this end, we assessed the code quality of 16 highly cited and compute-intensive tools, mainly written in C/C++ (e.g., MrBayes, MAFFT, SweepFinder) and Java (BEAST), from the broader area of evolutionary biology that are routinely used in current data analysis pipelines. Because the software engineering quality of the tools we analyzed is rather unsatisfactory, we provide a list of best practices for improving the quality of existing tools and list techniques that can be deployed for developing reliable, high-quality scientific software from scratch. Finally, we also discuss journal and science policy and, more importantly, funding issues that need to be addressed to improve software engineering quality and to ensure support for developing new and maintaining existing software. Our intention is to raise the community's awareness of software engineering quality issues and to emphasize the substantial lack of funding for scientific software development.