Assembly complexity of prokaryotic genomes using short reads
- PMID: 20064276
- PMCID: PMC2821320
- DOI: 10.1186/1471-2105-11-21
Abstract
Background: De Bruijn graphs are a theoretical framework underlying several modern genome assembly programs, especially those that deal with very short reads. We describe an application of de Bruijn graphs to analyze the global repeat structure of prokaryotic genomes.
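The abstract does not spell out the analysis procedure. The sketch below is a simplified illustration, not the authors' code. It shows one way the repeat content of a genome depends on k, the substring length that very short reads expose. The assumptions are:

- The genome is circular and error-free.
- Both strands are handled by collapsing each k-mer with its reverse complement into a "canonical" k-mer.
- "Repeated" means a canonical k-mer occurs at more than one position.

The helper names (`kmer_counts`, `repeated_fraction`, and so on) and the toy sequence are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def kmer_counts(genome: str, k: int) -> Counter:
    """Count canonical k-mers of a circular genome, wrapping across the origin."""
    n = len(genome)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(genome)")
    extended = genome + genome[: k - 1]  # lets k-mers span the circular origin
    return Counter(canonical(extended[i : i + k]) for i in range(n))


def repeated_fraction(genome: str, k: int) -> float:
    """Fraction of distinct canonical k-mers that occur more than once."""
    counts = kmer_counts(genome, k)
    return sum(1 for c in counts.values() if c > 1) / len(counts)


if __name__ == "__main__":
    # Hypothetical toy genome: a 12-bp element inserted twice between unique flanks.
    element = "ACGTTGCAAGGT"
    genome = "TTAGC" + element + "CCATGA" + element + "GATC"
    for k in (4, 8, 12, 16):
        print(k, round(repeated_fraction(genome, k), 3))
```

Scanning several values of k in this way mirrors, in miniature, the idea of surveying repeat structure across a range of read lengths. Short k-mers tend to be shared by chance. Longer k-mers are shared only where a genuine repeat at least k bp long exists.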
Results: We provide the first survey of the repeat structure of a large number of genomes. The analysis gives an upper bound on the performance of genome assemblers for de novo reconstruction of genomes across a wide range of read lengths. Further, we demonstrate that the majority of genes in prokaryotic genomes can be reconstructed uniquely using very short reads even if the genomes themselves cannot. The non-reconstructible genes are overwhelmingly related to mobile elements (transposons, IS elements, and prophages).
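To make "reconstructed uniquely" concrete, the sketch below assumes a simplified criterion that may differ from the paper's exact definition. A gene counts as reconstructible if its sequence is spelled by a single unitig, that is, a maximal non-branching path in the de Bruijn graph. The other assumptions are:

- The genome is circular and error-free.
- Every k-mer is observed, as with perfect coverage by reads of length k.
- Only the forward strand is modelled.

Nodes are (k-1)-mers and each distinct k-mer is an edge. Under these idealized assumptions, the unitigs approximate the longest sequences an assembler could report without resolving a branch. All function names are hypothetical.

```python
from collections import defaultdict


def build_graph(genome: str, k: int):
    """Edge-centric de Bruijn graph of a circular genome (forward strand only)."""
    n = len(genome)
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= len(genome)")
    extended = genome + genome[: k - 1]  # wrap around the circular origin
    kmers = sorted({extended[i : i + k] for i in range(n)})  # distinct k-mers = edges
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    for kmer in kmers:
        out_edges[kmer[:-1]].append(kmer[1:])  # prefix (k-1)-mer -> suffix (k-1)-mer
        in_degree[kmer[1:]] += 1
    return out_edges, in_degree


def unitigs(genome: str, k: int) -> list[tuple[str, bool]]:
    """Return (sequence, is_cycle) for every maximal non-branching path."""
    out_edges, in_degree = build_graph(genome, k)
    nodes = sorted(set(out_edges) | set(in_degree))

    def simple(v: str) -> bool:  # exactly one incoming and one outgoing edge
        return in_degree.get(v, 0) == 1 and len(out_edges.get(v, ())) == 1

    used, paths = set(), []
    # Paths that start at a branching node and run through simple nodes.
    for v in nodes:
        if simple(v):
            continue
        for w in out_edges.get(v, ()):
            path = [v, w]
            used.add((v, w))
            while simple(w):
                nxt = out_edges[w][0]
                used.add((w, nxt))
                path.append(nxt)
                w = nxt
            paths.append((path, False))
    # Any edge left over lies on an isolated cycle made only of simple nodes.
    for v in nodes:
        for w in out_edges.get(v, ()):
            if (v, w) in used:
                continue
            path = [v, w]
            used.add((v, w))
            while w != v:
                nxt = out_edges[w][0]
                used.add((w, nxt))
                path.append(nxt)
                w = nxt
            paths.append((path, True))
    result = []
    for path, is_cycle in paths:
        seq = path[0] + "".join(node[-1] for node in path[1:])
        if is_cycle:
            seq = seq[: len(seq) - (k - 1)]  # circular: drop the repeated start node
        result.append((seq, is_cycle))
    return result


def gene_in_one_unitig(gene: str, contigs: list[tuple[str, bool]]) -> bool:
    """Hypothetical criterion: the gene is spelled by a single unitig."""
    for seq, is_cycle in contigs:
        if is_cycle:
            # A circular unitig may spell the gene across its arbitrary start point.
            if len(gene) <= len(seq) and gene in seq + seq:
                return True
        elif gene in seq:
            return True
    return False
```

For example, `unitigs("ACGTACGA", 3)` splits the toy genome at the repeated node `CG`. A gene such as `"ACGTAC"`, which spans that branch, is not contained in any single unitig, while `"GTAC"` is. A genome with no repeated (k-1)-mer collapses into one circular unitig.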
Conclusions: Our results improve upon previous studies on the feasibility of assembly with short reads and provide a comprehensive benchmark against which to compare the performance of the short-read assemblers currently being developed.
Similar articles
- Improving prokaryotic transposable elements identification using a combination of de novo and profile HMM methods. BMC Genomics. 2013 Oct 11;14:700. doi: 10.1186/1471-2164-14-700. PMID: 24118975. Free PMC article.
- Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies. BMC Bioinformatics. 2011 Apr 13;12:95. doi: 10.1186/1471-2105-12-95. PMID: 21486487. Free PMC article.
- Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches. BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):507. doi: 10.1186/s12864-016-2895-8. PMID: 27556636. Free PMC article.
- De novo assembly of short sequence reads. Brief Bioinform. 2010 Sep;11(5):457-72. doi: 10.1093/bib/bbq020. Epub 2010 Aug 19. PMID: 20724458. Review.
- Assessment of metagenomic assemblers based on hybrid reads of real and simulated metagenomic sequences. Brief Bioinform. 2020 May 21;21(3):777-790. doi: 10.1093/bib/bbz025. PMID: 30860572. Free PMC article. Review.
Cited by
- Computational solutions for omics data. Nat Rev Genet. 2013 May;14(5):333-46. doi: 10.1038/nrg3433. PMID: 23594911. Free PMC article. Review.
- Semi-automatic in silico gap closure enabled de novo assembly of two Dehalobacter genomes from metagenomic data. PLoS One. 2012;7(12):e52038. doi: 10.1371/journal.pone.0052038. Epub 2012 Dec 21. PMID: 23284863. Free PMC article.
- Read length and repeat resolution: exploring prokaryote genomes using next-generation sequencing technologies. PLoS One. 2010 Jul 12;5(7):e11518. doi: 10.1371/journal.pone.0011518. PMID: 20634954. Free PMC article.
- A safety framework for flow decomposition problems via integer linear programming. Bioinformatics. 2023 Nov 1;39(11):btad640. doi: 10.1093/bioinformatics/btad640. PMID: 37862229. Free PMC article.
- Comparative whole-genome analysis of clinical isolates reveals characteristic architecture of Mycobacterium tuberculosis pangenome. PLoS One. 2015 Apr 8;10(4):e0122979. doi: 10.1371/journal.pone.0122979. eCollection 2015. PMID: 25853708. Free PMC article.