Reptile: representative tiling for short read error correction
- PMID: 20834037
- DOI: 10.1093/bioinformatics/btq468
Abstract
Motivation: Error correction is critical to the success of next-generation sequencing applications, such as resequencing and de novo genome sequencing. It is especially important for high-throughput short-read sequencing, where reads are much shorter and more abundant, and errors more frequent than in traditional Sanger sequencing. Processing massive numbers of short reads with existing error correction methods is both compute and memory intensive, yet the results are far from satisfactory when applied to real datasets.
Results: We present a novel approach, termed Reptile, for error correction in short-read data from next-generation sequencing. Reptile works with the spectrum of k-mers from the input reads, and corrects errors by simultaneously examining: (i) Hamming distance-based correction possibilities for potentially erroneous k-mers; and (ii) neighboring k-mers from the same read for correct contextual information. Because it does not need to store the input data, Reptile can handle datasets that do not fit in main memory. In addition to sequence data, Reptile can make use of available quality score information. Our experiments show that Reptile outperforms previous methods in the percentage of errors removed from the data and the accuracy in true base assignment. In addition, a significant reduction in run time and memory usage has been achieved compared with previous methods, making it more practical for short-read error correction when sampling larger genomes.
Availability: Reptile is implemented in C++ and is available through the link: http://aluru-sun.ece.iastate.edu/doku.php?id=software
Contact: aluru@iastate.edu.
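The two ideas named in the Results, Hamming distance-based candidates for a suspicious k-mer and contextual support from neighboring k-mers of the same read, can be illustrated with a small sketch. This is not Reptile's implementation, which is written in C++ and built around tiles. It is a simplified illustration under stated assumptions: only single-base substitutions are considered, quality scores are ignored, and the function names and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every k-mer that occurs in the input reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def hamming_neighbors(kmer, alphabet="ACGT"):
    """Yield every k-mer at Hamming distance exactly 1 from kmer."""
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def correct_read(read, spectrum, k, min_count=2):
    """Replace weakly supported k-mers when exactly one candidate fits.

    Assumption: a k-mer seen fewer than min_count times is suspicious.
    A candidate is accepted only if the overlapping k-mers one position
    to the left and right are also well supported after substitution.
    """
    bases = list(read)
    last_start = len(bases) - k
    for i in range(last_start + 1):
        kmer = "".join(bases[i:i + k])
        if spectrum[kmer] >= min_count:
            continue  # well supported; leave untouched
        accepted = []
        for cand in hamming_neighbors(kmer):
            if spectrum[cand] < min_count:
                continue
            trial = bases[:i] + list(cand) + bases[i + k:]
            # Context check using neighboring k-mers from the same read.
            context_ok = all(
                spectrum["".join(trial[j:j + k])] >= min_count
                for j in (i - 1, i + 1)
                if 0 <= j <= last_start
            )
            if context_ok:
                accepted.append(trial)
        if len(accepted) == 1:  # correct only when unambiguous
            bases = accepted[0]
    return "".join(bases)


# Toy data: five error-free copies and one read with a T->A substitution.
reads = ["ACGTTGCA"] * 5 + ["ACGTAGCA"]
spectrum = kmer_spectrum(reads, k=4)
print(correct_read("ACGTAGCA", spectrum, k=4))  # ACGTTGCA
```

In this toy run the k-mer `CGTA` occurs once, its only well-supported neighbor is `CGTT`, and the overlapping k-mers `ACGT` and `GTTG` confirm the substitution. A read whose weak k-mer has several equally plausible candidates is left unchanged, which is the conservative choice in this sketch.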