Front Genet. 2016 May 6;7:75.
doi: 10.3389/fgene.2016.00075. eCollection 2016.

Integrated Systems for NGS Data Management and Analysis: Open Issues and Available Solutions

Valerio Bianchi et al. Front Genet.

Abstract

Next-generation sequencing (NGS) technologies have deeply changed our understanding of cellular processes by delivering an astonishing amount of data at affordable prices; many biology laboratories have already accumulated a large number of sequenced samples. However, managing and analyzing these data poses new challenges, which research groups lacking IT and quantitative skills may easily underestimate. In this perspective, we identify five issues that research groups approaching NGS technologies should carefully address: (1) adopting a laboratory information management system (LIMS) and safeguarding the resulting raw data structure in downstream analyses; (2) monitoring the flow of the data and standardizing input and output directories and file names, even when multiple analysis protocols are used on the same data; (3) ensuring complete traceability of the analyses performed; (4) enabling inexperienced users to run analyses through a graphical user interface (GUI) acting as a front-end for the pipelines; (5) relying on standard metadata to annotate the datasets and, when possible, using controlled vocabularies, ideally derived from biomedical ontologies. Finally, we discuss the currently available tools in the light of these issues, and we introduce HTS-flow, a new workflow management system conceived to address the concerns we raised. HTS-flow retrieves information from a LIMS database, manages data analyses through a simple GUI, outputs data in standard locations, and allows complete traceability of datasets, accompanying metadata, and analysis scripts.

Keywords: epigenomics; genomics; high-throughput sequencing; laboratory information management system; workflow management system.

Figures

FIGURE 1
Workflow management systems for NGS: overview and issues discussed in the text. A typical analysis workflow for NGS is presented, associated with both the corresponding metadata and optional additional external data. Each step of the workflow is linked to the corresponding issues discussed in the text.
