PEAR: a fast and accurate Illumina Paired-End reAd mergeR
- PMID: 24142950
- PMCID: PMC3933873
- DOI: 10.1093/bioinformatics/btt593
Abstract
Motivation: The Illumina paired-end sequencing technology can generate reads from both ends of target DNA fragments, which can subsequently be merged to increase the overall read length. There already exist tools for merging these paired-end reads when the target fragments are equally long. However, when fragment lengths vary and, in particular, when either the fragment size is shorter than a single-end read, or longer than twice the size of a single-end read, most state-of-the-art mergers fail to generate reliable results. Therefore, a robust tool is needed to merge paired-end reads that exhibit varying overlap lengths because of varying target fragment lengths.
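The three regimes mentioned above (fragment shorter than one read, ordinary partial overlap, fragment longer than twice the read length) follow from simple arithmetic on the lengths. The sketch below is illustrative only: the function name `expected_overlap` and its parameters are hypothetical, and it assumes both reads have the same length and start at opposite ends of the fragment.

```python
def expected_overlap(read_len: int, fragment_len: int) -> int:
    """Number of fragment bases covered by both reads of a pair.

    Assumes, for illustration, that both reads have length `read_len`
    and start at opposite ends of a fragment of length `fragment_len`.
    """
    if read_len <= 0 or fragment_len <= 0:
        raise ValueError("lengths must be positive")
    if fragment_len <= read_len:
        # Fragment shorter than a read: each read spans the whole fragment
        # and runs past its end, so the shared region is the fragment itself.
        return fragment_len
    # Otherwise the reads share 2 * read_len - fragment_len bases, which
    # drops to zero once the fragment exceeds twice the read length.
    return max(0, 2 * read_len - fragment_len)
```

Under these assumptions, 100 bp reads from a 150 bp fragment share 50 bases, while the same reads from a 250 bp fragment share none, so no genuine merge exists for that pair.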
Results: We present the PEAR software for merging raw Illumina paired-end reads from target fragments of varying length. The program evaluates all possible paired-end read overlaps and does not require the target fragment size as input. It also implements a statistical test for minimizing false-positive results. Tests on simulated and empirical data show that PEAR consistently generates highly accurate merged paired-end reads. A highly optimized implementation allows for merging millions of paired-end reads within a few minutes on a standard desktop computer. On multi-core architectures, the parallel version of PEAR shows linear speedups compared with the sequential version of PEAR.
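To make "evaluates all possible paired-end read overlaps" concrete, the following is a minimal sketch of exhaustive overlap enumeration. It is not PEAR's implementation: the names `reverse_complement` and `best_overlap` are hypothetical, the +1/-1 match/mismatch score is a stand-in for PEAR's actual scoring, base qualities are ignored, and the statistical test is not reproduced.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_overlap(forward: str, reverse: str, min_overlap: int = 1):
    """Score every possible overlap and return (score, offset, merged).

    `offset` is where the reverse-complemented reverse read starts relative
    to the forward read; a negative offset models a fragment shorter than
    the reads. Scoring is a toy +1/-1 scheme (an assumption, not PEAR's).
    Returns None if no overlap of at least `min_overlap` bases is possible.
    """
    rc = reverse_complement(reverse)
    best = None
    for offset in range(min_overlap - len(rc), len(forward) - min_overlap + 1):
        start = max(0, offset)                     # overlap start on forward
        end = min(len(forward), offset + len(rc))  # overlap end on forward
        score = sum(
            1 if forward[i] == rc[i - offset] and forward[i] != "N" else -1
            for i in range(start, end)
        )
        if best is None or score > best[0]:
            if offset >= 0:
                # Forward read, then the part of rc that extends past it.
                merged = forward + rc[len(forward) - offset:]
            else:
                # Short fragment: keep only the region both reads cover.
                merged = forward[:end]
            best = (score, offset, merged)
    return best
```

For example, with the 12-base fragment `ACGTTGCAAGGC` and 8-base reads taken from each end, the highest-scoring offset is 4 and the merged sequence is the original fragment. Because no fragment length is supplied, every offset is tried; a real merger would additionally decide whether the best score is significant before accepting the merge.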
Availability and implementation: PEAR is implemented in C and uses POSIX threads. It is freely available at http://www.exelixis-lab.org/web/software/pear.