ALLPATHS: de novo assembly of whole-genome shotgun microreads
- PMID: 18340039
- PMCID: PMC2336810
- DOI: 10.1101/gr.7337908
Abstract
New DNA sequencing technologies deliver data at dramatically lower costs but demand new analytical methods to take full advantage of the very short reads that they produce. We provide an initial, theoretical solution to the challenge of de novo assembly from whole-genome shotgun "microreads." For 11 genomes of sizes up to 39 Mb, we generated high-quality assemblies from 80x coverage by paired 30-base simulated reads modeled after real Illumina-Solexa reads. The bacterial genomes of Campylobacter jejuni and Escherichia coli assemble optimally, yielding single perfect contigs, and larger genomes yield assemblies that are highly connected and accurate. Assemblies are presented in a graph form that retains intrinsic ambiguities such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. For both C. jejuni and E. coli, this assembly graph is a single edge encompassing the entire genome. Larger genomes produce more complicated graphs, but the vast majority of the bases in their assemblies are present in long edges that are nearly always perfect. We describe a general method for genome assembly that can be applied to all types of DNA sequence data, not only short read data, but also conventional sequence reads.
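To give a sense of the data volume implied by the abstract's figures, the short calculation below estimates how many 30-base reads are needed for 80x coverage of the largest genome mentioned. It is a back-of-the-envelope sketch, not part of ALLPATHS: the function name `reads_for_coverage` is hypothetical, the genome size is taken as exactly 39,000,000 bases (the abstract only says "up to 39 Mb"), and coverage is treated simply as total read bases divided by genome size.

```python
import math


def reads_for_coverage(genome_size_bp: int, coverage: float, read_length_bp: int) -> int:
    """Return the number of reads whose combined length reaches the target coverage.

    Coverage is modeled as (number of reads * read length) / genome size,
    ignoring sequencing errors and uneven sampling.
    """
    return math.ceil(genome_size_bp * coverage / read_length_bp)


# Figures from the abstract; 39 Mb is assumed to mean exactly 39,000,000 bases.
n_reads = reads_for_coverage(genome_size_bp=39_000_000, coverage=80, read_length_bp=30)
n_pairs = n_reads // 2  # the reads are paired, so two reads per fragment

print(f"reads: {n_reads:,}")
print(f"read pairs: {n_pairs:,}")
```

Under these assumptions the largest genome would need roughly 104 million reads, or about 52 million pairs, which illustrates why assembling microreads calls for methods different from those used with conventional reads.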
Similar articles
- Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008 May;18(5):821-9. doi: 10.1101/gr.074492.107. Epub 2008 Mar 18. PMID: 18349386. Free PMC article.
- Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol. 2017 Jun 8;13(6):e1005595. doi: 10.1371/journal.pcbi.1005595. eCollection 2017 Jun. PMID: 28594827. Free PMC article.
- Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches. BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):507. doi: 10.1186/s12864-016-2895-8. PMID: 27556636. Free PMC article.
- De novo assembly of short sequence reads. Brief Bioinform. 2010 Sep;11(5):457-72. doi: 10.1093/bib/bbq020. Epub 2010 Aug 19. PMID: 20724458. Review.
- Assembly algorithms for next-generation sequencing data. Genomics. 2010 Jun;95(6):315-27. doi: 10.1016/j.ygeno.2010.03.001. Epub 2010 Mar 6. PMID: 20211242. Free PMC article. Review.
Cited by
- Next-generation sequencing approach for connecting secondary metabolites to biosynthetic gene clusters in fungi. Front Microbiol. 2015 Jan 14;5:774. doi: 10.3389/fmicb.2014.00774. eCollection 2014. PMID: 25642215. Free PMC article. Review.
- A hybrid approach for the automated finishing of bacterial genomes. Nat Biotechnol. 2012 Jul 1;30(7):701-707. doi: 10.1038/nbt.2288. PMID: 22750883. Free PMC article.
- The genome of Diuraphis noxia, a global aphid pest of small grains. BMC Genomics. 2015 Jun 5;16:429. doi: 10.1186/s12864-015-1525-1. PMID: 26044338. Free PMC article.
- Revealing Missing Human Protein Isoforms Based on Ab Initio Prediction, RNA-seq and Proteomics. Sci Rep. 2015 Jul 9;5:10940. doi: 10.1038/srep10940. PMID: 26156868. Free PMC article.
- Finished Genome Sequence of Bacillus cereus Strain 03BB87, a Clinical Isolate with B. anthracis Virulence Genes. Genome Announc. 2015 Jan 15;3(1):e01446-14. doi: 10.1128/genomeA.01446-14. PMID: 25593267. Free PMC article.