Bioinformatics. 2014 Mar 1;30(5):614-20.
doi: 10.1093/bioinformatics/btt593. Epub 2013 Oct 18.

PEAR: a fast and accurate Illumina Paired-End reAd mergeR

Jiajie Zhang et al. Bioinformatics.

Abstract

Motivation: The Illumina paired-end sequencing technology can generate reads from both ends of target DNA fragments, which can subsequently be merged to increase the overall read length. There already exist tools for merging these paired-end reads when the target fragments are equally long. However, when fragment lengths vary and, in particular, when either the fragment size is shorter than a single-end read, or longer than twice the size of a single-end read, most state-of-the-art mergers fail to generate reliable results. Therefore, a robust tool is needed to merge paired-end reads that exhibit varying overlap lengths because of varying target fragment lengths.
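The fragment-length cases described above reduce to simple arithmetic on the read length and the fragment length. The sketch below is illustrative only and is not taken from PEAR: it assumes both reads have the same length, and the function names `expected_overlap` and `classify_pair` are hypothetical.

```python
def expected_overlap(read_len: int, fragment_len: int) -> int:
    """Bases shared by two equal-length reads taken from opposite ends of one fragment."""
    if read_len <= 0 or fragment_len <= 0:
        raise ValueError("lengths must be positive")
    # Reads longer than the fragment cover it entirely, so the overlap is capped
    # at the fragment length. Fragments of 2 * read_len or more leave no overlap.
    return max(0, min(fragment_len, 2 * read_len - fragment_len))


def classify_pair(read_len: int, fragment_len: int) -> str:
    """Name the scenario a read pair falls into (assumed labels, for illustration)."""
    if fragment_len < read_len:
        return "read longer than fragment"
    if fragment_len >= 2 * read_len:
        return "no overlap"
    return "overlap"


# Example with 100 bp reads and three fragment sizes.
for fragment_len in (150, 250, 80):
    print(fragment_len, classify_pair(100, fragment_len), expected_overlap(100, fragment_len))
```

With 100 bp reads, a 150 bp fragment gives a 50-base overlap, a 250 bp fragment gives none, and an 80 bp fragment is covered completely by each read.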

Results: We present the PEAR software for merging raw Illumina paired-end reads from target fragments of varying length. The program evaluates all possible paired-end read overlaps and does not require the target fragment size as input. It also implements a statistical test for minimizing false-positive results. Tests on simulated and empirical data show that PEAR consistently generates highly accurate merged paired-end reads. A highly optimized implementation allows for merging millions of paired-end reads within a few minutes on a standard desktop computer. On multi-core architectures, the parallel version of PEAR shows linear speedups compared with the sequential version of PEAR.
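The idea of evaluating every possible overlap without knowing the fragment size can be sketched as an exhaustive scan over alignment offsets. This is a simplified illustration, not PEAR's implementation: it scores overlaps with a plain match/mismatch count rather than base quality scores, it replaces the statistical test with a fixed `min_score` cutoff, and it keeps the forward read's bases in the overlap. All function names, parameters, and default values here are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_overlap(forward, reverse, min_overlap=10, match=1, mismatch=-1):
    """Scan every offset of the reverse-complemented reverse read against the
    forward read and return (score, offset, overlap_len) for the best one.

    offset >= 0: the reverse read starts inside the forward read (fragment at
    least as long as a read). offset < 0: the reverse read starts before the
    forward read (fragment shorter than a read).
    """
    rc = reverse_complement(reverse)
    best = None
    for offset in range(-(len(rc) - min_overlap), len(forward) - min_overlap + 1):
        f_start = max(0, offset)
        r_start = max(0, -offset)
        n = min(len(forward) - f_start, len(rc) - r_start)
        if n < min_overlap:
            continue
        score = sum(
            match if a == b else mismatch
            for a, b in zip(forward[f_start:f_start + n], rc[r_start:r_start + n])
        )
        if best is None or score > best[0]:
            best = (score, offset, n)
    return best


def merge_pair(forward, reverse, min_overlap=10, min_score=8):
    """Return the merged sequence, or None if no overlap reaches min_score."""
    result = best_overlap(forward, reverse, min_overlap)
    if result is None or result[0] < min_score:
        return None  # stand-in for rejecting a pair as a likely false positive
    _, offset, n = result
    rc = reverse_complement(reverse)
    if offset >= 0:
        # Forward read, then the part of the reverse read that extends past it.
        return forward + rc[len(forward) - offset:]
    # Fragment shorter than the read: only the overlapping region is fragment.
    return forward[:n]


fragment = "ACGTTGCAAGGCTTACCGATGTCAGTACGG"
fwd = fragment[:20]
rev = reverse_complement(fragment[10:])
print(merge_pair(fwd, rev) == fragment)
```

The scan costs time proportional to the product of the two read lengths per pair. A production merger would also weigh mismatches by base quality and assess the significance of the best overlap instead of using a fixed cutoff.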

Availability and implementation: PEAR is implemented in C and uses POSIX threads. It is freely available at http://www.exelixis-lab.org/web/software/pear.

Figures

Fig. 1. Three possible scenarios for paired-end read lengths and target DNA fragment lengths. (A) Short overlap between the paired-end reads; (B) no overlap between the paired-end reads; (C) single-end read length is larger than the target DNA fragment length.

Fig. 2. Parallel speedups of PEAR, FLASH and PANDAseq on the single template sequences dataset. The sequential runtimes for the three mergers are 98, 58 and 39 s, respectively.
