Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2013 Jul 15;29(14):1718-25.
doi: 10.1093/bioinformatics/btt273. Epub 2013 May 10.

GAGE-B: an evaluation of genome assemblers for bacterial organisms

Affiliations

GAGE-B: an evaluation of genome assemblers for bacterial organisms

Tanja Magoc et al. Bioinformatics. .

Abstract

Motivation: A large and rapidly growing number of bacterial organisms have been sequenced by the newest sequencing technologies. Cheaper and faster sequencing technologies make it easy to generate very high coverage of bacterial genomes, but these advances mean that DNA preparation costs can exceed the cost of sequencing for small genomes. The need to contain costs often results in the creation of only a single sequencing library, which in turn introduces new challenges for genome assembly methods.

Results: We evaluated the ability of multiple genome assembly programs to assemble bacterial genomes from a single, deep-coverage library. For our comparison, we chose bacterial species spanning a wide range of GC content and measured the contiguity and accuracy of the resulting assemblies. We compared the assemblies produced by this very high-coverage, one-library strategy to the best assemblies created by two-library sequencing, and we found that remarkably good bacterial assemblies are possible with just one library. We also measured the effect of read length and depth of coverage on assembly quality and determined the values that provide the best results with current algorithms.

Contact: salzberg@jhu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

PubMed Disclaimer

Figures

Fig. 1.
Fig. 1.
Comparison of N50 contig size (in kilobases) on the y-axis, versus depth of coverage on the x-axis, for the eight assemblers used in this study. All datasets were 100 bp HiSeq reads from B.cereus

References

    1. Altschul SF, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. - PMC - PubMed
    1. Aronesty E. Ea-utils: command-line tools for processing biological sequencing data. 2011. http://code.google.com/p/ea-utils (3 June 2013, date last accessed)
    1. Bankevich A, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012;19:455–477. - PMC - PubMed
    1. Barthelson R, et al. Plantagora: modeling whole genome sequencing and assembly of plant genomes. PLoS One. 2011;6:e28436. - PMC - PubMed
    1. Chevreux B, et al. Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res. 2004;14:1147–1159. - PMC - PubMed

Publication types

LinkOut - more resources