Bioinformatics. 2014 May 15;30(10):1354-62.
doi: 10.1093/bioinformatics/btu030. Epub 2014 Jan 21.

BLESS: bloom filter-based error correction solution for high-throughput sequencing reads

Yun Heo et al. Bioinformatics.

Abstract

Motivation: Rapid advances in next-generation sequencing (NGS) technology have led to an exponential increase in the amount of genomic information. However, NGS reads contain far more errors than data from traditional sequencing methods, and downstream genomic analysis results can be improved by correcting the errors. Unfortunately, all previous error correction methods required a large amount of memory, making them unsuitable for processing reads from large genomes on commodity computers.

Results: We present a novel algorithm that produces accurate correction results with much less memory compared with previous solutions. The algorithm, named BLoom-filter-based Error correction Solution for high-throughput Sequencing reads (BLESS), uses a single minimum-sized Bloom filter, and is also able to tolerate a higher false-positive rate, thus allowing us to correct errors with a 40× memory usage reduction on average compared with previous methods. Meanwhile, BLESS can extend reads like DNA assemblers to correct errors at the end of reads. Evaluations using real and simulated reads showed that BLESS could generate more accurate results than existing solutions. After errors were corrected using BLESS, 69% of initially unaligned reads could be aligned correctly. Additionally, de novo assembly results became 50% longer with 66% fewer assembly errors.
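As a rough illustration of the data structure named above, the following Python sketch stores the k-mers of a few reads in a single Bloom filter sized for a target false-positive rate. It is a generic textbook Bloom filter, not the BLESS implementation: the class name `BloomFilter`, the helper `kmers`, the SHA-256 double-hashing scheme, the toy reads and the 1% false-positive rate are all assumptions made for this example, and the abstract does not describe how BLESS selects k-mers or hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Generic Bloom filter (illustrative only, not the BLESS code)."""

    def __init__(self, expected_items, false_positive_rate):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits,
        # and h = (m / n) * ln 2 hash functions.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing (an assumption of this sketch): derive every bit
        # position from two 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return True for an item never added (false positive),
        # but never returns False for an item that was added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(read, k):
    """Return all overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


# Hypothetical toy input: two short reads and k = 5.
reads = ["ACGTACGTGA", "CGTACGTGAC"]
k = 5
trusted_kmers = {km for read in reads for km in kmers(read, k)}

bloom = BloomFilter(expected_items=len(trusted_kmers), false_positive_rate=0.01)
for km in trusted_kmers:
    bloom.add(km)

# Every inserted k-mer is reported as present.
print(all(km in bloom for km in trusted_kmers))
```

The filter uses roughly 9.6 bits per stored k-mer at a 1% false-positive rate regardless of k, which is why accepting a higher false-positive rate translates directly into a smaller memory footprint.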

Availability and implementation: Freely available at http://sourceforge.net/p/bless-ec

Contact: dchen@illinois.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. The high-level block diagram of BLESS. The cylinders and the rectangle with extra lines depict data written to disk and memory, respectively.
Fig. 2. The number of TPs and per-base sensitivity calculated in each position of the D1 reads. (A) The number of TPs calculated separately in each position of D1. ‘Reference’ shows the total number of mismatch errors in the uncorrected reads. The other lines show the number of errors corrected by each error correction tool. (B) The ratio of TP to Reference (i.e. number of errors in uncorrected reads) in each position of the D1 reads.
