Sci Rep. 2015 Nov 30:5:16780.
doi: 10.1038/srep16780.

The power of single molecule real-time sequencing technology in the de novo assembly of a eukaryotic genome

Hiroaki Sakai et al. Sci Rep. 2015.

Abstract

Second-generation sequencers (SGS) have been game-changing, achieving cost-effective whole genome sequencing in many non-model organisms. However, a large portion of these genomes still remains unassembled. We reconstructed the azuki bean (Vigna angularis) genome using single molecule real-time (SMRT) sequencing technology and achieved the best contiguity and coverage among currently assembled legume crops. The SMRT-based assembly produced contigs 100 times longer, with 100 times less gap sequence, than the SGS-based assemblies. A detailed comparison between the assemblies revealed that the SMRT-based assembly enabled a more comprehensive gene annotation than the SGS-based assemblies, in which thousands of genes were missing or fragmented. A chromosome-scale assembly was generated based on a high-density genetic map, covering 86% of the azuki bean genome. We demonstrated that SMRT technology, though it still needed the support of SGS data, achieved a near-complete assembly of a eukaryotic genome.

Figures

Figure 1
NG graphs of the three assemblies in scaffold length (a) and contig length (b). The y-axis indicates the calculated NG contig/scaffold length (NG1 through NG100, see text for detail) in each assembled genome. The vertical line indicates the NG50 contig/scaffold length.
Figure 2. Summary of annotations.
(a) The amounts of unique sequences, repetitive sequences, gaps, and unassembled sequences in each assembly. (b) Examples of wrong annotations in Assembly_2. At the locus of Vigan.02G030200 (top) in Assembly_3, the sequence from the 2nd to the 3rd intron was left as a gap in Assembly_2, leading to fragmentation of this locus. The 23 kb region of the locus Vigan.03G124500 (bottom) was assembled into only a 13 kb contig in Assembly_2, in which both ends of this region were entirely unassembled and a 2 kb region in the 9th intron was missing. In this case, two genes were also annotated, one of which consisted mostly of intronic sequences. (c) Number of gene families with size differences. ++ and −− indicate gene families with differences of more than +4 and −4 in size, respectively. (d) Difference in total gene numbers in gene families with size differences.
Figure 3
NG graphs of legume genomes for (a) contigs and (b) pseudomolecules. The x-axis indicates NG integers, and the y-axis indicates the calculated NG length in each assembled genome. The vertical line indicates the NG50 contig/scaffold length. The labels are sorted according to the ranking of contig/scaffold NG50. The solid lines indicate the reference-grade assemblies (total size of anchored scaffolds covering ~80% of the genome), whereas broken and dotted lines indicate the draft assemblies (total size of anchored scaffolds covering ~50% and ~30%, respectively).
Figure 4. An overview of the azuki bean genome.
The x-axis indicates the physical position in Mb in pseudomolecules of LG1, 2, and 5.
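
The NG graphs in Figures 1 and 3 plot NG1 through NG100 for each assembly. The paper's exact definition is in its main text, which is not reproduced here, so the following is only a minimal sketch under the commonly used definition: NG(x) is the length of the shortest sequence in the smallest set of longest contigs (or scaffolds) that together cover at least x% of the estimated genome size. The function names `ng_length` and `ng_curve` and the toy lengths are hypothetical, chosen for illustration.

```python
from typing import List, Optional


def ng_length(lengths: List[int], genome_size: int, x: int) -> Optional[int]:
    """Return NG(x) under the common definition (an assumption here).

    Sequences are taken from longest to shortest until their cumulative
    length reaches x% of the estimated genome size; the length of the last
    sequence taken is NG(x). Returns None if the assembly is too small to
    reach that fraction of the genome.
    """
    if not 1 <= x <= 100:
        raise ValueError("x must be an integer between 1 and 100")
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if total * 100 >= genome_size * x:
            return length
    return None


def ng_curve(lengths: List[int], genome_size: int) -> List[Optional[int]]:
    """Return [NG1, NG2, ..., NG100], the y-values of an NG graph."""
    return [ng_length(lengths, genome_size, x) for x in range(1, 101)]


# Toy example (made-up lengths, not data from the paper).
toy_contigs = [50, 30, 20, 10]
toy_genome_size = 200
print(ng_length(toy_contigs, toy_genome_size, 50))
```

Because NG values are normalized by the estimated genome size rather than by the assembly size, curves from assemblies of different total length can be compared on the same axes, and an assembly that covers less of the genome simply has no value beyond the fraction it reaches.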
