Limitations of next-generation genome sequence assembly
- PMID: 21102452
- PMCID: PMC3115693
- DOI: 10.1038/nmeth.1527
Abstract
High-throughput sequencing technologies promise to transform the fields of genetics and comparative biology by delivering tens of thousands of genomes in the near future. Although it is feasible to construct de novo genome assemblies in a few months, relatively little attention has been paid to what is lost by relying solely on short sequence reads. We compared recent de novo assemblies of the genomes of a Han Chinese individual and a Yoruban individual, generated with the short oligonucleotide analysis package (SOAP), to experimentally validated genomic features. We found that the de novo assemblies were 16.2% shorter than the reference genome and that 420.2 megabase pairs of common repeats and 99.1% of validated duplicated sequences were missing from the assembled genomes. Consequently, over 2,377 coding exons were completely missing. We conclude that high-quality sequencing approaches must be considered in conjunction with high-throughput sequencing for comparative genomics analyses and studies of genome evolution.
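The abstract does not spell out how missing sequence is measured. One common approach works as follows, assuming the assembly's contigs have already been aligned back to the reference. Intersect the reference-coordinate intervals of annotated features (repeats, duplications, exons) with the aligned intervals, then count the feature bases that nothing covers. The sketch below is a hypothetical illustration of that interval arithmetic, not the authors' pipeline. The function names and toy coordinates are invented, intervals are half-open [start, end), and only a single reference sequence is handled.

```python
"""Hypothetical sketch: fraction of annotated feature bases absent from an
assembly, given the reference intervals its contigs align to. Single
reference sequence only; half-open [start, end) coordinates."""


def merge_intervals(intervals):
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_bases(feature, aligned):
    """Bases of one feature overlapped by (already merged) aligned intervals."""
    f_start, f_end = feature
    total = 0
    for a_start, a_end in aligned:
        overlap = min(f_end, a_end) - max(f_start, a_start)
        if overlap > 0:
            total += overlap
    return total


def missing_fraction(features, aligned_contigs):
    """Share of feature bases not covered by any aligned contig."""
    aligned = merge_intervals(aligned_contigs)
    feature_bases = 0
    missing = 0
    for feature in merge_intervals(features):
        length = feature[1] - feature[0]
        feature_bases += length
        missing += length - covered_bases(feature, aligned)
    return missing / feature_bases if feature_bases else 0.0


def length_deficit(reference_length, assembly_length):
    """Relative shortfall of the assembly versus the reference (0.162 = 16.2%)."""
    return (reference_length - assembly_length) / reference_length


if __name__ == "__main__":
    # Toy data: two features; contigs cover all of the first, half of the second.
    features = [(100, 200), (500, 600)]
    contigs = [(0, 150), (140, 300), (550, 1000)]
    print(missing_fraction(features, contigs))  # 0.25
```

With real data, intervals would be keyed by chromosome, and a dedicated interval tool such as BEDTools would replace the nested loop used here for readability.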

Comment in
- Assemblies: the good, the bad, the ugly. Nat Methods. 2011 Jan;8(1):59-60. doi: 10.1038/nmeth0111-59. PMID: 21191376.
Similar articles
- The complete genome of an individual by massively parallel DNA sequencing. Nature. 2008 Apr 17;452(7189):872-6. doi: 10.1038/nature06884. PMID: 18421352.
- Personal genomes: Standard and pores. Nature. 2008 Nov 6;456(7218):23-5. doi: 10.1038/456023a. PMID: 18987710.
- Anchored pseudo-de novo assembly of human genomes identifies extensive sequence variation from unmapped sequence reads. Hum Genet. 2016 Jul;135(7):727-40. doi: 10.1007/s00439-016-1667-5. Epub 2016 Apr 9. PMID: 27061184.
- Genetic variation and the de novo assembly of human genomes. Nat Rev Genet. 2015 Nov;16(11):627-40. doi: 10.1038/nrg3933. Epub 2015 Oct 7. PMID: 26442640. Review.
- Genomic Analysis in the Age of Human Genome Sequencing. Cell. 2019 Mar 21;177(1):70-84. doi: 10.1016/j.cell.2019.02.032. PMID: 30901550. Review.
Cited by
- Analysis of Litopenaeus vannamei transcriptome using the next-generation DNA sequencing technique. PLoS One. 2012;7(10):e47442. doi: 10.1371/journal.pone.0047442. Epub 2012 Oct 11. PMID: 23071809.
- A high-resolution cucumber cytogenetic map integrated with the genome assembly. BMC Genomics. 2013 Jul 9;14:461. doi: 10.1186/1471-2164-14-461. PMID: 23834562.
- Introgressing the Aegilops tauschii genome into wheat as a basis for cereal improvement. Nat Plants. 2021 Jun;7(6):774-786. doi: 10.1038/s41477-021-00934-w. Epub 2021 May 27. PMID: 34045708.
- Test driving genome assemblers. Nat Biotechnol. 2012 Apr 10;30(4):330-1. doi: 10.1038/nbt.2172. PMID: 22491283.
- Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. Mol Ecol Resour. 2021 Jan;21(1):263-286. doi: 10.1111/1755-0998.13252. Epub 2020 Oct 10. PMID: 32937018.