GAGE: A critical evaluation of genome assemblies and assembly algorithms
- PMID: 22147368
- PMCID: PMC3290791
- DOI: 10.1101/gr.131383.111
Erratum in
- Genome Res. 2012 Jun;22(6):1196
Abstract
New sequencing technology has dramatically altered the landscape of whole-genome sequencing, allowing scientists to initiate numerous projects to decode the genomes of previously unsequenced organisms. The lowest-cost technology can generate deep coverage of most species, including mammals, in just a few days. The sequence data generated by one of these projects consist of millions or billions of short DNA sequences (reads) that range from 50 to 150 nt in length. These sequences must then be assembled de novo before most genome analyses can begin. Unfortunately, genome assembly remains a very difficult problem, made more difficult by shorter reads and unreliable long-range linking information. In this study, we evaluated several of the leading de novo assembly algorithms on four different short-read data sets, all generated by Illumina sequencers. Our results describe the relative performance of the different assemblers as well as other significant differences in assembly difficulty that appear to be inherent in the genomes themselves. Three overarching conclusions are apparent: first, that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome; second, that the degree of contiguity of an assembly varies enormously among different assemblers and different genomes; and third, that the correctness of an assembly also varies widely and is not well correlated with statistics on contiguity. To enable others to replicate our results, all of our data and methods are freely available, as are all assemblers used in this study.