2012 Dec 14;1(3):895-905.
doi: 10.3390/biology1030895.

FLEXBAR-Flexible Barcode and Adapter Processing for Next-Generation Sequencing Platforms

Matthias Dodt et al. Biology (Basel).

Abstract

Quantitative and systems biology approaches benefit from the unprecedented depth of next-generation sequencing. A typical experiment yields millions of short reads, which often carry particular sequence tags. These tags may: (a) be specific to the sequencing platform and library construction method (e.g., adapter sequences); (b) have been introduced by experimental design (e.g., sample barcodes); or (c) constitute some biological signal (e.g., splice leader sequences in nematodes). Our software FLEXBAR enables accurate recognition, sorting and trimming of sequence tags with maximal flexibility, based on exact overlap sequence alignment. The software supports data formats from all current sequencing platforms, including color-space reads. FLEXBAR maintains read pairings and processes separate barcode reads on demand. It allows fine-grained adjustment of sequence tag detection parameters and search regions. FLEXBAR is multi-threaded and combines speed with precision. Even complex read processing scenarios can be executed with a single command line call. We demonstrate the utility of the software in read mapping applications, library demultiplexing and splice leader detection. FLEXBAR and additional information are available for academic use from the website: http://sourceforge.net/projects/flexbar/.

Figures

Figure 1
(A) Examples of overlap alignments in different sequence encodings. Sequence space and trim mode are denoted under the respective alignments. Red subsequences will be removed from the read sequence. (B) Graphical representation of sequence trimming modes. The gray bar depicts the currently processed sequencing read (length n). The best alignment of an adapter sequence (length m; shown in red) might be located anywhere in the demarcated region (arrow + adapter region), which differs according to the selected trim mode (see main text). The name of the trim mode refers to the part of the short read that is removed: in left modes the 5' end is trimmed, in right modes the 3' end is trimmed, and otherwise the shorter end is removed.
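
The left and right trim modes described in this caption can be illustrated with a minimal Python sketch. It is a deliberate simplification and not FLEXBAR's implementation: it uses ungapped matching with a mismatch limit instead of a scored overlap alignment, it accepts the first acceptable placement rather than the best-scoring one, and it ignores color space. The function names and the parameters `min_overlap` and `max_mismatches` are hypothetical.

```python
def trim_right(read, adapter, min_overlap=3, max_mismatches=0):
    """Right mode sketch: cut the read at the start of the adapter match.

    Hypothetical helper, ungapped matching only (an assumption made
    for brevity, not FLEXBAR's alignment routine).
    """
    n = len(read)
    # Slide the adapter start from the 5' end towards the 3' end of the read.
    for start in range(n - min_overlap + 1):
        # The adapter may run past the 3' end, so only the overlap is compared.
        overlap = min(len(adapter), n - start)
        mismatches = sum(
            1 for a, b in zip(read[start:start + overlap], adapter) if a != b
        )
        if mismatches <= max_mismatches:
            # Everything from the adapter start to the 3' end is removed.
            return read[:start]
    return read  # no acceptable overlap found, read is left untouched


def trim_left(read, adapter, min_overlap=3, max_mismatches=0):
    """Left mode sketch: mirror image of trim_right, trimming the 5' end."""
    trimmed = trim_right(read[::-1], adapter[::-1], min_overlap, max_mismatches)
    return trimmed[::-1]
```

For example, `trim_right("ACGTACGTAGATCGG", "AGATCGGAAGAGC")` keeps only the first eight bases, because the last seven bases of the read match the start of the adapter.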
Figure 2
(A) Number of returned reads, stratified by subsequent read mapping with Bowtie. Mapping results of untreated reads (Bowtie only) are shown in the bottommost row (control case). The adapter removal tools did not return all reads, as some reads did not pass the respective output filters. (B) Number of bases contained in all uniquely mappable reads (blue part in A).
Figure 3
Compute time and memory requirements for FLEXBAR and competitors in benchmark 1. FLEXBAR’s performance is listed for 1, 2 and 4 threads. The evaluation was conducted on a dedicated machine with two AMD Opteron 2356 processors, each with 4 cores at 2.3 GHz.
Figure 4
Total number of identified splice leader sequences (light gray bars) and number of estimated false discoveries (black bars) for SL1 and SL2 in data set SRR353594. The following parameters were varied: --barcode-min-overlap {10,15,20}, --barcode-threshold {0,1,2} and --barcode-gap-cost was set to -100. All reads were either assigned to SL1 or SL2 if they passed the alignment criteria (preference is given to SL1 in case of equally scoring alignments). The ratio of all discoveries versus false discoveries is highest for the {20,0} parameter set.
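
The assignment rule in this caption (a read goes to SL1 or SL2 only if it passes the overlap and mismatch criteria, with SL1 preferred on ties) can be sketched as follows. This is an illustrative simplification under stated assumptions: the sequences in the usage example are made-up placeholders rather than real splice leaders, matching is ungapped (comparable in spirit to a prohibitive gap cost), and `min_overlap` and `max_mismatches` only loosely mirror `--barcode-min-overlap` and `--barcode-threshold`, without reproducing FLEXBAR's actual scoring.

```python
def best_overlap(read, tag, min_overlap, max_mismatches):
    """Length of the longest tag suffix matching the read's 5' prefix.

    Returns 0 if no overlap of at least min_overlap bases stays within
    the mismatch limit. Hypothetical helper, ungapped comparison only.
    """
    for k in range(min(len(tag), len(read)), min_overlap - 1, -1):
        mismatches = sum(1 for a, b in zip(tag[-k:], read[:k]) if a != b)
        if mismatches <= max_mismatches:
            return k
    return 0


def assign(read, tags, min_overlap=10, max_mismatches=0):
    """Assign a read to the tag with the longest acceptable overlap.

    `tags` is an ordered list of (name, sequence) pairs; on equal
    overlap length the earlier entry wins, mimicking the stated
    preference for SL1.
    """
    best_name, best_len = None, 0
    for name, seq in tags:
        k = best_overlap(read, seq, min_overlap, max_mismatches)
        if k > best_len:  # strict comparison keeps the earlier tag on ties
            best_name, best_len = name, k
    return best_name


# Placeholder sequences for illustration only, not real splice leaders.
TAGS = [("SL1", "AAAACCCCGGGGTTTT"), ("SL2", "TTTTCCCCGGGGTTTT")]
```

With these placeholders, a read that begins with the shared 12-base suffix `CCCCGGGGTTTT` matches both tags equally well and is assigned to SL1, while a read that begins with the full `TTTTCCCCGGGGTTTT` is assigned to SL2.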
