TOGGLE: toolbox for generic NGS analyses

Cécile Monat et al. BMC Bioinformatics. 2015 Nov 9;16:374.
doi: 10.1186/s12859-015-0795-6.

Abstract

Background: The explosion of NGS (Next Generation Sequencing) sequence data requires a huge effort in Bioinformatics methods and analyses. The creation of dedicated, robust and reliable pipelines able to handle dozens of samples from raw FASTQ data to relevant biological data is a time-consuming task in all projects relying on NGS. To address this, we created a generic and modular toolbox for developing such pipelines.

Results: TOGGLE (TOolbox for Generic nGs anaLysEs) is a suite of tools for designing pipelines that manage large sets of NGS software and utilities. Moreover, TOGGLE offers an easy way to set the options of the different software tools throughout a pipeline using a single basic configuration file, which can be changed for each assay without changing the code itself. We also describe one implementation of TOGGLE in a complete analysis pipeline designed for SNP discovery in large sets of genomic data, ready to use in different environments (from a single machine to HPC clusters).

Conclusion: TOGGLE speeds up the creation of robust pipelines with reliable log tracking and data flow, for a wide range of analyses. Moreover, it enables biologists to concentrate on the biological relevance of results and to change the experimental conditions easily. The complete code and test data are available at https://github.com/SouthGreenPlatform/TOGGLE

Figures

Fig. 1
Software execution presentation. (1) Input data are submitted to a given Module. (2) The Module constructs a command line and passes it to toolbox::run. (3) The text of the command line and the run report are sent to toolbox::exportLog. (4) The output status of the command (ok, error or warning) is sent back to the original Module. (5) The Module sends a report to toolbox::exportLog. (6) The output data are delivered from the Module.
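The six steps in this caption can be sketched as follows. This is a minimal illustration in Python, not TOGGLE's own code: `run`, `export_log`, `example_module` and the `LOG` list are hypothetical stand-ins for toolbox::run, toolbox::exportLog and a Module, and the rule that maps a non-empty standard error stream to a "warning" status is an assumption made for the example.

```python
import subprocess
import sys

LOG = []  # hypothetical in-memory stand-in for the log written by toolbox::exportLog


def export_log(message):
    """Hypothetical stand-in for toolbox::exportLog: record one log entry."""
    LOG.append(message)


def run(command):
    """Hypothetical stand-in for toolbox::run: execute, log, return a status."""
    result = subprocess.run(command, capture_output=True, text=True)
    # Assumed status rule: non-zero exit is an error, stderr text is a warning.
    if result.returncode != 0:
        status = "error"
    elif result.stderr.strip():
        status = "warning"
    else:
        status = "ok"
    # Step 3: the command text and the run report are sent to the log.
    export_log(f"CMD {' '.join(command)} -> {status}")
    # Step 4: the status is sent back to the calling module.
    return status, result.stdout


def example_module(text):
    """Hypothetical module covering steps 1, 2, 5 and 6 of the caption."""
    # Step 1: 'text' is the input data submitted to the module.
    # Step 2: build the command line (a trivial one-liner that upper-cases its input).
    command = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())", text]
    status, output = run(command)
    # Step 5: the module sends its own report to the log.
    export_log(f"MODULE example_module finished with status {status}")
    # Step 6: the output data are delivered from the module.
    return status, output.strip()
```

Calling `example_module("acgt")` runs the command once and leaves two entries in `LOG`: one written by `run` and one written by the module itself, which mirrors the two arrows into toolbox::exportLog described in the caption.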
Fig. 2
Software configuration file. Lines starting with “$” give the name of the module being configured (e.g. bwa aln). The lines that follow list the option(s) associated with this call; an empty line ends the list of options. Lines starting with “ ” are reserved for comments.
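A small parser for a file laid out this way might look like the sketch below. It is written in Python for illustration and is not TOGGLE's own parser: the function name `parse_software_config`, the `comment_prefix` parameter with its `#` default, and the option lines in the sample are all assumptions, since the caption specifies only the “$” header lines, the option lines that follow, and the closing empty line.

```python
def parse_software_config(text, comment_prefix="#"):
    """Parse a '$module' / options / blank-line layout into a dict.

    comment_prefix is an assumption: the comment marker is not legible
    in the caption, so it is left configurable here.
    """
    config = {}
    current = None  # name of the module whose options are being read
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            current = None  # an empty line ends the current option list
            continue
        if line.startswith(comment_prefix):
            continue  # comment lines are ignored
        if line.startswith("$"):
            current = line[1:].strip()  # module name, e.g. "bwa aln"
            config[current] = []
        elif current is not None:
            config[current].append(line)  # one option per line
    return config


# Illustrative input only: the option lines are placeholders, not TOGGLE defaults.
SAMPLE = """\
# hypothetical comment line
$bwa aln
-n 5

$example tool
--placeholder 1
--other 2
"""
```

With this sample, `parse_software_config(SAMPLE)` yields one list of option lines per module name, so a pipeline could look up the options for a given call without any change to its code.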
Fig. 3
Directory tree structure. Representation of the tree of directories and log files produced during execution of the globalAnalysis.pl pipeline, using three individuals that illustrate each possible case: the first has paired-end data, the second has single-end data, and the third has paired-end data that generates single reads during the cleaning step.
Fig. 4
Pipeline DNAseq presentation. Basic overview of the pairAnalysis.pl, singleAnalysis.pl, mergeAnalysis.pl pipelines, and of the wrapping globalAnalysis.pl pipeline. Each colored box represents a given module, and each color a specific package. See text for the corresponding steps. A more complete figure is available on the TOGGLE website
