PLoS One. 2012;7(9):e46211.
doi: 10.1371/journal.pone.0046211. Epub 2012 Sep 27.

Paired-end sequencing of long-range DNA fragments for de novo assembly of large, complex Mammalian genomes by direct intra-molecule ligation

Asan et al. PLoS One. 2012.

Abstract

Background: The relatively short read lengths from next-generation sequencing (NGS) technologies still pose a challenge for de novo assembly of complex mammalian genomes. One important solution is to use paired-end (PE) sequence information experimentally obtained from long-range DNA fragments (>1 kb). Here, we characterize and extend a long-range PE library construction method based on direct intra-molecule ligation (or molecular linker-free circularization) for NGS.

Results: We found that the method performs stably for PE sequencing of 2- to 5-kb DNA fragments and can be extended to 10–20 kb (and, in extreme cases, up to ∼35 kb). We also characterized the impact of low-quality input DNA on the method and developed a whole-genome amplification (WGA)-based protocol for limited input DNA (<1 µg). Using this PE dataset, we accurately assembled the YanHuang (YH) genome, the first sequenced Asian genome, into a scaffold N50 size of >2 Mb, which is over 100 times greater than the initial size produced with only small-insert PE reads (17 kb). In addition, we mapped two 7- to 8-kb insertions in the YH genome using the larger insert sizes of the long-range PE data.

Conclusions: We demonstrate the effectiveness of this long-range PE sequencing method and its use for the de novo assembly of a large, complex genome from NGS short reads.
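The scaffold N50 reported above is a length-weighted median: the scaffold length at which scaffolds of that length or longer cover at least half of the assembled bases. The following is a minimal Python sketch of that standard definition, not the authors' pipeline. The function name `nxx` and the toy scaffold lengths are illustrative assumptions; only the 17 kb and 2 Mb figures come from the abstract.

```python
def nxx(lengths, fraction=0.5):
    """Return the Nxx statistic for a list of scaffold lengths.

    With fraction=0.5 this is N50; with fraction=0.9 it is N90.
    Scaffolds are sorted from longest to shortest, and the length of the
    scaffold at which the running total first reaches `fraction` of the
    assembly size is returned.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length


# Toy scaffold lengths (made up for illustration only).
toy_lengths = [10, 8, 6, 4, 2]
toy_n50 = nxx(toy_lengths, 0.5)
toy_n90 = nxx(toy_lengths, 0.9)

# Fold improvement implied by the abstract, using 2 Mb as a lower bound
# for the final scaffold N50 and 17 kb for the small-insert-only assembly.
fold_improvement = 2_000_000 / 17_000
```

With 2 Mb taken as a lower bound, the ratio is about 118, which is consistent with the "over 100 times" statement in the Results.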

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Comparison of long-range PE sequencing methods.
(A–D) Long-range PE sequencing with linker oligonucleotides. In these methods, biotin-labeled linker oligonucleotides are added to the two ends of long-range DNA fragments, followed by enzyme-induced intra-molecule circularization and recovery of the paired ends for sequencing. The addition of linker oligonucleotides and the subsequent complex enzyme reactions require 5–8 recoveries before the paired ends can be captured from circularized DNA fragments, and the enzymes used are expensive. (E) Long-range PE sequencing by direct intra-molecule ligation, or molecular linker-free circularization. In this method, the 3′ ends of long-range DNA fragments are biotin-labeled, followed by direct intra-molecule circularization and recovery of the paired ends. This method requires fewer recovery steps (3–4) and no complex enzyme reaction system. The steps for DNA recovery are in bold. We applied method E in this research.
Figure 2. Insert-size distributions of long-range PE sequencing libraries.
(A) 2- to 35-kb libraries; (B) 10 kb-WGA and 10 kb-dam libraries. Read pairs that mapped uniquely to the human genome (NCBI build 37) were used for this analysis. The insert size of a library and its corresponding small-insert read contamination are shown in the ‘−’ and ‘+’ directions of the x-axis, respectively. The ‘−’ direction represents the orientation relationship between PEs from circularized long-range DNA molecules (>1 kb) when mapped to the human genome, while ‘+’ represents that between the two ends of linear small DNA fragments (∼500 bp).
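The signed x-axis described in this caption can be illustrated with a small Python sketch. It assumes, as is typical for circularization-based libraries but not stated explicitly here, that pairs from circularized molecules map facing outward and pairs from linear small fragments map facing inward. The `MappedRead` type, its fields, and the coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MappedRead:
    chrom: str     # reference sequence name
    start: int     # leftmost mapped coordinate
    end: int       # rightmost mapped coordinate (exclusive)
    forward: bool  # True if the read maps to the forward strand


def signed_insert_size(r1: MappedRead, r2: MappedRead) -> Optional[int]:
    """Return a signed insert size for a uniquely mapped read pair.

    Positive: reads face inward (assumed small-insert contamination).
    Negative: reads face outward (assumed circularized long-range fragment).
    None: different chromosomes or same strand, so the pair is ambiguous.
    """
    if r1.chrom != r2.chrom or r1.forward == r2.forward:
        return None
    left = r1 if r1.start <= r2.start else r2
    span = max(r1.end, r2.end) - min(r1.start, r2.start)
    # If the leftmost read is on the forward strand, the pair points inward.
    return span if left.forward else -span


# Hypothetical pairs: one inward-facing ~500 bp pair, one outward-facing 5 kb pair.
small = signed_insert_size(MappedRead("chr8", 1000, 1100, True),
                           MappedRead("chr8", 1400, 1500, False))
long_range = signed_insert_size(MappedRead("chr8", 1000, 1100, False),
                                MappedRead("chr8", 5900, 6000, True))
```

A histogram of these signed values over all uniquely mapped pairs would separate the long-range peak from the small-insert contamination in the way the caption describes.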
Figure 3. De novo assembly of the YH genome.
(A) The YH scaffold N50 (green bar) and N90 (blue bar) sizes improved dramatically with the addition of long-range PE information (from 2 kb to 35 kb). The trends of improvement are shown as a dashed line. (B) Alignment between the assembled YH scaffolds (y-axis) and the reference human genome (NCBI build 37, x-axis) on chr8. The local repeat level in the reference chr8 (calculated in a 1-kb window) is shown in color along the chromosome in the bar at the top. The white blocks in the bar represent gaps in the reference genome. (C) Alignment of YH scaffold 320 onto the reference chr8. The local repeat level in this region of the reference chr8 is also shown in color along the sequence (calculated in a 1-kb window).
Figure 4. Two long insertions in YH genome detected by long-range PE.
Mapping the long-range PE reads back to the human genome (NCBI build 37) detected a previously identified ∼8 kb insertion in chromosome 7 (A) and a novel ∼7 kb insertion in chromosome 14 (B) in the YH genome. The abnormally mapped PE reads that support the insertions by showing an unexpectedly short insert size are shown.
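The reasoning behind this detection can be sketched as follows: if the sequenced genome carries an insertion that is absent from the reference, a fragment spanning it maps to the reference with a span shorter than the library insert size by roughly the insertion length. The Python below is a simplified illustration under that assumption, not the authors' method. The function names, the tolerance, and all numeric values are hypothetical.

```python
from statistics import median


def flag_short_pairs(spans, expected_insert, tolerance):
    """Return mapped spans that are shorter than expected by more than `tolerance`."""
    return [s for s in spans if s < expected_insert - tolerance]


def estimate_insertion_size(spans, expected_insert, tolerance):
    """Estimate insertion length from pairs with unexpectedly short mapped spans.

    Returns None when no pair is short enough to support an insertion.
    """
    short = flag_short_pairs(spans, expected_insert, tolerance)
    if not short:
        return None
    # Missing span on the reference ~ sequence present only in the sample.
    return expected_insert - median(short)


# Toy mapped spans (bp) for pairs from a hypothetical 10 kb library
# that overlap one locus; three pairs map much closer together than expected.
toy_spans = [10000, 9800, 2100, 1900, 2000, 10200]
estimate = estimate_insertion_size(toy_spans, expected_insert=10000, tolerance=2000)
```

In this toy case the three short pairs have a median span of 2,000 bp, giving an estimated insertion of 8,000 bp. An insertion can only be bridged this way when the library insert size exceeds the insertion length.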
