BMC Genomics. 2013 Apr 16;14:257.
doi: 10.1186/1471-2164-14-257.

Improving mammalian genome scaffolding using large insert mate-pair next-generation sequencing

Sebastiaan van Heesch et al. BMC Genomics.

Abstract

Background: Paired-tag sequencing approaches are commonly used for the analysis of genome structure. However, mammalian genomes have a complex organization with a variety of repetitive elements that complicate comprehensive genome-wide analyses.

Results: Here, we systematically assessed the utility of paired-end and mate-pair (MP) next-generation sequencing libraries with insert sizes ranging from 170 bp to 25 kb, for genome coverage and for improving scaffolding of a mammalian genome (Rattus norvegicus). Despite a lower library complexity, large insert MP libraries (20 or 25 kb) provided very high physical genome coverage and were found to efficiently span repeat elements in the genome. Medium-sized (5, 8 or 15 kb) MP libraries were much more efficient for genome structure analysis than the more commonly used shorter insert paired-end and 3 kb MP libraries. Furthermore, the combination of medium- and large insert libraries resulted in a 3-fold increase in N50 in scaffolding processes. Finally, we show that our data can be used to evaluate and improve contig order and orientation in the current rat reference genome assembly.

Conclusions: We conclude that applying combinations of mate-pair libraries with insert sizes that match the distributions of repetitive elements improves contig scaffolding and can contribute to the finishing of draft genomes.
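The abstract reports scaffolding gains as a change in N50. As a reminder of what that statistic measures, here is a minimal sketch of the usual definition: the length L such that sequences of length L or longer contain at least half of the total assembled bases. The function name `n50` and the toy lengths are illustrative assumptions, not values or code from the study, and scaffolders such as SSPACE may report the figure with slightly different conventions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Standard definition (assumed here): sort lengths from longest to
    shortest and return the length at which the running total first
    reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must contain at least one value")


# Toy example (made-up lengths): joining contigs into scaffolds
# raises N50 without changing the total assembled length.
contigs = [80, 70, 50, 40, 30, 20, 10]
scaffolds = [80 + 70, 50 + 40, 30, 20, 10]
print(n50(contigs), n50(scaffolds))
```

Because N50 depends only on the length distribution, it says nothing about whether the joins are correct, which is why the study also compares scaffolds against optical map data.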

Figures

Figure 1
MP insert size distribution and library complexity. (a) Insert size distribution of all mate-paired libraries and biological duplicates. Data have been filtered for non-clonal pairs. (b) Complexity of each library is depicted by the number of unique read-pairs versus the number of properly mapped read-pairs. The x-axis shows increasing sequencing depth, based on actual sequencing data; the y-axis shows the amount of unique information obtained. A plateau indicates that a library has been sequenced to saturation.
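A complexity curve of the kind described in panel (b) can be sketched as a running count of distinct read-pairs against the number of mapped pairs processed. This is a minimal sketch under stated assumptions: the function name `complexity_curve` is hypothetical, and treating two pairs as clonal when they share the same (chromosome, start, end) tuple is an assumption made here for illustration, not the filtering rule used in the study.

```python
def complexity_curve(pairs, step):
    """Return (pairs_processed, unique_pairs_seen) checkpoints.

    `pairs` is a sequence of hashable read-pair keys in sequencing
    order; here a key is assumed to be (chromosome, start, end).
    A checkpoint is recorded every `step` pairs and at the end.
    """
    seen = set()
    curve = []
    for count, pair in enumerate(pairs, start=1):
        seen.add(pair)
        if count % step == 0 or count == len(pairs):
            curve.append((count, len(seen)))
    return curve


# Toy data (made up): later pairs repeat earlier ones, so the
# unique count stops growing, which is the plateau the caption
# describes for a library sequenced to saturation.
toy_pairs = [
    ("chr1", 100, 5100),
    ("chr1", 200, 5200),
    ("chr1", 100, 5100),
    ("chr2", 50, 5050),
    ("chr1", 200, 5200),
    ("chr1", 100, 5100),
]
print(complexity_curve(toy_pairs, step=2))
```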
Figure 2
Bridging of repeat elements by paired read libraries. (a) The percentage of each repeat type per window of 1000 repeats (y-axis) is shown, relative to the size of each repeat on the x-axis. A higher density of dots indicates the presence of more repeats in the indicated size bin. (b) Pie chart of the largest classes of repetitive elements based on their total length (Mb) in the rat genome. Satellite repeats, RNA repeats, and low-complexity repeats are listed as “Other.” (c + d) Bridging by paired-tag libraries of all annotated LINEs (c) and LTRs (d) within contigs of RGSC 3.4. The size of LINE elements or LTRs (x-axis) is plotted against the percentage of elements of that specific size that were bridged by one or more read-pairs from each of the libraries. All single library datasets were normalized to 8.5× physical genome coverage.
Figure 3
Combinations of libraries with different insert sizes improve contig scaffolding. (a) All library data sets were normalized to 8.5× non-clonal physical genome coverage, which required from approximately 130 million pairs for the PE library down to several million pairs for the MP libraries. The scaffold N50 (y-axis) as determined by SSPACE is plotted against the total number of scaffolds (x-axis) for each individual library and for all combinations of libraries. Scaffolding results for the current genome reference (RGSC 3.4) are displayed as well. (b) Representative examples of the genomic loci on rat chromosome 18 that show major discordance between the optical map and the RGSC 3.4 reference genome. MP-assisted scaffolding restored concordance between sequence scaffolds and optical maps. The top panel (black) represents the reference genome assembly with the vertical lines indicating predicted SwaI sites; the middle panel (red) represents optical map data obtained using SwaI digests; the lower panel represents the rescaffolded genome using the MP data. The indicated positions on chromosome 18 are according to the current RGSC 3.4 assembly. For a large region of approximately 75 kb (top panel; 0.065 Mb–0.14 Mb) that shows low concordance with the predicted path of the optical map, concordance increased significantly after MP-scaffolding. The bottom panel shows another example of increased resemblance to optical mapping data (3.85 Mb–3.90 Mb). The order and placement of contigs were shifted in the new scaffold, resulting in SwaI sites identical to those of the optical map.
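The normalization in panel (a) explains why short-insert libraries need far more pairs than large-insert ones. A minimal sketch of the arithmetic, assuming the common definition of physical coverage as (number of pairs × insert size) / genome size; this page does not state the exact formula or genome size the authors used, so the function names and the 2.75 Gb genome size below are illustrative assumptions.

```python
import math


def physical_coverage(num_pairs, insert_size_bp, genome_size_bp):
    """Physical coverage under the assumed definition:
    total bases spanned by inserts divided by genome size."""
    return num_pairs * insert_size_bp / genome_size_bp


def pairs_for_coverage(target_coverage, insert_size_bp, genome_size_bp):
    """Number of non-clonal pairs needed to reach a target physical
    coverage, rounded up to a whole pair."""
    return math.ceil(target_coverage * genome_size_bp / insert_size_bp)


# Assumed genome size of 2.75 Gb, used only for illustration.
ASSUMED_GENOME_BP = 2_750_000_000

# Insert sizes mentioned in the abstract, from 170 bp to 25 kb.
for insert_bp in (170, 3_000, 8_000, 25_000):
    needed = pairs_for_coverage(8.5, insert_bp, ASSUMED_GENOME_BP)
    print(f"{insert_bp:>6} bp insert: {needed:,} pairs for 8.5x")
```

Under these assumptions the required pair count scales inversely with insert size, which matches the caption's contrast between a very large pair count for the short-insert library and far fewer pairs for the mate-pair libraries.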
