Genome Biol. 2011;12(3):R31.
doi: 10.1186/gb-2011-12-3-r31. Epub 2011 Mar 31.

A vertebrate case study of the quality of assemblies derived from next-generation sequences

Liang Ye et al. Genome Biol. 2011.

Abstract

The unparalleled efficiency of next-generation sequencing (NGS) has prompted widespread adoption, but significant problems remain in using NGS data for whole-genome assembly. We explore the advantages and disadvantages of chicken genome assemblies generated using a variety of sequencing and assembly methodologies. NGS assemblies are equivalent to a Sanger-based assembly in some respects yet deficient in others. Nonetheless, these assemblies are sufficient to identify the majority of genes and can reveal novel sequences when compared with existing reference assemblies.

Figures

Figure 1
Sequencing cost of NGS assemblies compared to the reference assembly. Raw-base coverage is 6.6-fold for the reference assembly, 14-fold for the 454/Newbler assembly, and 74-fold for the Illumina/SOAP assembly.
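Because fold coverage is the number of raw bases sequenced divided by the genome size, and all three assemblies target the same chicken genome, the coverage values in this caption can be compared directly as ratios of raw bases. The short Python sketch below does only that arithmetic. It uses the three coverage values from the caption and does not model per-base sequencing prices, which differ between platforms and are what actually drive the cost comparison in the figure; the function name is illustrative.

```python
from typing import Dict

# Fold coverage of raw bases, from the Figure 1 caption.
COVERAGE: Dict[str, float] = {
    "reference": 6.6,
    "454/Newbler": 14.0,
    "Illumina/SOAP": 74.0,
}


def relative_raw_bases(coverage: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Raw bases sequenced for each assembly, as a multiple of the baseline's.

    Fold coverage = raw bases / genome size. For the same genome the size
    cancels, so a ratio of coverages equals a ratio of raw bases.
    """
    base = coverage[baseline]
    return {name: cov / base for name, cov in coverage.items()}


if __name__ == "__main__":
    for name, ratio in relative_raw_bases(COVERAGE, "reference").items():
        # Prints 1.00x, 2.12x and 11.21x respectively.
        print(f"{name}: {ratio:.2f}x the raw bases of the reference assembly")
```

On these numbers, the 454/Newbler and Illumina/SOAP assemblies used about 2.1 and 11.2 times as many raw bases as the reference assembly, respectively.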
Figure 2
Novel sequence in NGS assemblies compared to the reference assembly. Each assembly was aligned to the Gallus_gallus-2.1 reference using BLAT, and unaligned sequence was retained. After contamination removal, the 454/Newbler and Illumina/SOAP assemblies contain 18.9 Mbp and 24.5 Mbp of novel sequence, respectively. The two NGS assemblies share 12.2 Mbp of this non-reference sequence. (a) 454/Newbler (red); (b) Illumina/SOAP (green).
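The retention step described in this caption (keep the parts of each NGS contig that BLAT does not align to the reference) amounts to subtracting aligned intervals from each contig, and the 12.2 Mbp overlap can be read as a two-set Venn diagram. The Python sketch below illustrates both under stated assumptions: alignment blocks are supplied as hypothetical 0-based, half-open query intervals (the convention used by BLAT's PSL output), any minimum-length or identity filters the authors may have applied are not modeled, and contamination removal is omitted. The Venn arithmetic assumes the 12.2 Mbp of shared sequence is counted within each assembly's total. All function names are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (start, end) on the NGS contig, 0-based, half-open.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def unaligned_intervals(contig_length: int, aligned: List[Interval]) -> List[Interval]:
    """Return the stretches of a contig not covered by any alignment block."""
    gaps: List[Interval] = []
    cursor = 0
    for start, end in merge_intervals(aligned):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < contig_length:
        gaps.append((cursor, contig_length))
    return gaps


def two_set_venn(a_total: float, b_total: float, shared: float) -> Dict[str, float]:
    """Split two overlapping totals into their exclusive parts and their union."""
    return {
        "only_a": a_total - shared,
        "only_b": b_total - shared,
        "union": a_total + b_total - shared,
    }


if __name__ == "__main__":
    # Toy 1,000 bp contig with three hypothetical alignment blocks.
    gaps = unaligned_intervals(1000, [(100, 300), (250, 600), (800, 900)])
    print(gaps, sum(e - s for s, e in gaps))  # [(0, 100), (600, 800), (900, 1000)] 400

    # Novel-sequence totals (Mbp) from the Figure 2 caption.
    venn = two_set_venn(18.9, 24.5, 12.2)
    print({k: round(v, 1) for k, v in venn.items()})
```

For the caption's totals this gives about 6.7 Mbp of novel sequence found only in the 454/Newbler assembly, 12.3 Mbp found only in the Illumina/SOAP assembly, and 31.2 Mbp across the two.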
Figure 3
Gene fragmentation in NGS assemblies. Gene ENSGALG00000006569 is located at positions 16,937,180 to 17,042,224 on chromosome 13 in the Gallus_gallus-2.1 reference. The gene is broken into six scaffolds in the Illumina/SOAP assembly and four scaffolds in the 454/Newbler assembly. Green bars represent scaffolds in the Illumina/SOAP assembly, and red bars represent scaffolds in the 454/Newbler assembly. Solid colored bars within scaffolds represent aligned regions, while open bars denote gaps. The %GC plot shows the relative GC content along the genome sequence; the horizontal red line marks 50% GC content.
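Two calculations underlie this figure: counting how many assembly scaffolds have aligned blocks inside a gene's reference interval, and computing GC content in windows along the sequence. The Python sketch below shows both. It assumes the caption's coordinates are 1-based and inclusive; the example alignments and all function names are hypothetical, and the window size and step for the %GC calculation are illustrative choices because the caption does not state the values used for the plot.

```python
from typing import List, Tuple

# Gene ENSGALG00000006569 on chromosome 13 (Figure 3 caption); assumed 1-based, inclusive.
GENE_START = 16_937_180
GENE_END = 17_042_224


def span_length(start: int, end: int) -> int:
    """Length in bp of a 1-based, inclusive interval."""
    return end - start + 1


def scaffolds_in_gene(alignments: List[Tuple[str, int, int]], start: int, end: int) -> int:
    """Count distinct scaffolds with at least one aligned block overlapping [start, end].

    Each alignment is (scaffold_name, ref_start, ref_end) in the same
    1-based, inclusive reference coordinates as the gene.
    """
    hits = {name for name, a_start, a_end in alignments if a_start <= end and a_end >= start}
    return len(hits)


def gc_percent_windows(seq: str, window: int, step: int) -> List[float]:
    """%GC in sliding windows; N and other non-ACGT symbols are excluded from the denominator."""
    values: List[float] = []
    for i in range(0, len(seq) - window + 1, step):
        chunk = seq[i:i + window].upper()
        acgt = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        # A window made entirely of gaps (Ns) has no defined GC content.
        values.append(100.0 * gc / acgt if acgt else float("nan"))
    return values


if __name__ == "__main__":
    print(span_length(GENE_START, GENE_END))  # 105045

    # Hypothetical scaffold alignments; scaffold_D falls outside the gene.
    toy = [
        ("scaffold_A", 16_937_000, 16_960_000),
        ("scaffold_B", 16_961_000, 16_990_000),
        ("scaffold_B", 16_991_000, 17_000_000),
        ("scaffold_C", 17_001_000, 17_050_000),
        ("scaffold_D", 18_000_000, 18_010_000),
    ]
    print(scaffolds_in_gene(toy, GENE_START, GENE_END))  # 3

    print(gc_percent_windows("GGCCAATTGCAT", window=4, step=4))  # [100.0, 0.0, 50.0]
```

With the caption's coordinates, the gene spans 105,045 bp (roughly 105 kbp), so six scaffolds in the Illumina/SOAP assembly corresponds to an average of under 18 kbp of gene per scaffold.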
