Pebble and rock band: heuristic resolution of repeats and scaffolding in the velvet short-read de novo assembler
- PMID: 20027311
- PMCID: PMC2793427
- DOI: 10.1371/journal.pone.0008407
Abstract
Background: Despite the short length of their reads, micro-read sequencing technologies have shown their usefulness for de novo sequencing. However, especially in eukaryotic genomes, complex repeat patterns are an obstacle to large assemblies.
Principal findings: We present a novel heuristic algorithm, Pebble, which uses paired-end read information to resolve repeats and scaffold contigs into large-scale assemblies. In simulations, we achieved scaffold N50 values (the length-weighted median scaffold length) above 1 Mbp in Bacteria and above 100 kbp in more complex organisms. Using real datasets, we obtained an N50 of 96 kbp in Pseudomonas syringae and a single 147 kbp scaffold for a ferret BAC clone. We also present Rock Band, an efficient algorithm for resolving repeats in mixed-length assemblies, where reads from different sequencing platforms are combined to obtain a cost-effective assembly.
Conclusions: These algorithms extend the utility of short-read-only assemblies to large, complex genomes. They have been implemented and made available within the open-source Velvet short-read de novo assembler.
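The N50 values quoted in the abstract are length-weighted medians. The sketch below shows one common way to compute N50 from a list of contig or scaffold lengths. It assumes the "at least half of the total length" convention (some tools use "more than half" or break ties differently). The function name `n50` is purely illustrative and is not part of Velvet.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumed convention: N50 is the length L such that sequences of
    length >= L together contain at least half of all assembled bases.
    """
    # Walk from the longest sequence down to the shortest.
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("n50() needs at least one length")

    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    # Not reached for non-negative lengths; kept for the type checker.
    return sorted_lengths[-1]


if __name__ == "__main__":
    print(n50([100, 80, 50, 40, 30]))
```

For lengths [100, 80, 50, 40, 30] (300 bp in total), the running sum first reaches half the total (150 bp) at the 80 bp sequence, so the function returns 80. Sorting dominates the cost, O(n log n) for n sequences.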
Similar articles
- Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008 May;18(5):821-9. doi: 10.1101/gr.074492.107. Epub 2008 Mar 18. PMID: 18349386. Free PMC article.
- SeqEntropy: genome-wide assessment of repeats for short read sequencing. PLoS One. 2013;8(3):e59484. doi: 10.1371/journal.pone.0059484. Epub 2013 Mar 27. PMID: 23544073. Free PMC article.
- SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res. 2007 Nov;17(11):1697-706. doi: 10.1101/gr.6435207. Epub 2007 Oct 1. PMID: 17908823. Free PMC article.
- PacBio Sequencing and Its Applications. Genomics Proteomics Bioinformatics. 2015 Oct;13(5):278-89. doi: 10.1016/j.gpb.2015.08.002. Epub 2015 Nov 2. PMID: 26542840. Free PMC article. Review.
- Deciphering Neurodegenerative Diseases Using Long-Read Sequencing. Neurology. 2021 Aug 31;97(9):423-433. doi: 10.1212/WNL.0000000000012466. Epub 2021 Aug 13. PMID: 34389649. Free PMC article. Review.
Cited by
- Whole genome sequencing of peach (Prunus persica L.) for SNP identification and selection. BMC Genomics. 2011 Nov 22;12:569. doi: 10.1186/1471-2164-12-569. PMID: 22108025. Free PMC article.
- Isolation and characterization of 22 EST-SSR markers for the genus Thujopsis (Cupressaceae). Appl Plant Sci. 2015 Jan 30;3(2):apps.1400101. doi: 10.3732/apps.1400101. eCollection 2015 Feb. PMID: 25699219. Free PMC article.
- Combining accurate tumor genome simulation with crowdsourcing to benchmark somatic structural variant detection. Genome Biol. 2018 Nov 6;19(1):188. doi: 10.1186/s13059-018-1539-5. PMID: 30400818. Free PMC article.
- A practical comparison of de novo genome assembly software tools for next-generation sequencing technologies. PLoS One. 2011 Mar 14;6(3):e17915. doi: 10.1371/journal.pone.0017915. PMID: 21423806. Free PMC article.
- Genome-wide analysis of HPV integration in human cancers reveals recurrent, focal genomic instability. Genome Res. 2014 Feb;24(2):185-99. doi: 10.1101/gr.164806.113. Epub 2013 Nov 7. PMID: 24201445. Free PMC article.