Bioinform Res Appl (2018). 2018 Jun:10847:63-78.
doi: 10.1007/978-3-319-94968-0_6. Epub 2018 Jul 13.

REXTAL: Regional Extension of Assemblies Using Linked-Reads

Tunazzina Islam et al. Bioinform Res Appl (2018). 2018 Jun.

Abstract

It is currently impossible to obtain a complete de novo assembly of segmentally duplicated genome regions using genome-wide short-read datasets. Here, we devise a new computational method called Regional Extension of Assemblies Using Linked-Reads (REXTAL) for improved region-specific assembly of segmental duplication-containing DNA, leveraging genomic short-read datasets generated from large DNA molecules partitioned and barcoded using the "Gel Bead in Emulsion" (GEM) microfluidic method (Zheng et al., 2016). We show that, using REXTAL, it is possible to extend assembly of single-copy diploid DNA into adjacent, otherwise inaccessible subtelomere segmental duplication regions and other subtelomeric gap regions. Moreover, REXTAL is computationally more efficient than genome-wide assembly approaches for the directed assembly of such regions from multiple genomes (e.g., for the comparison of structural variation).

Keywords: 10X sequencing; Linked-read sequencing; Subtelomere; assembly; genome gaps; segmental duplication; structural variation.
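
The abstract and the Figure 1 caption say that reads from the same partition carry the same 16-base barcode. The sketch below shows one way that property could be used to pull together reads linked to a region of interest. It is an illustration only, not the paper's Reads Selection algorithm: the function name `select_linked_reads`, the `(read_id, barcode)` input layout, and the idea of starting from a set of "seed" reads already known to align to the region are all assumptions made here.

```python
from collections import defaultdict


def select_linked_reads(reads, seed_read_ids):
    """Return IDs of all reads that share a barcode with any seed read.

    reads         -- iterable of (read_id, barcode) pairs (hypothetical layout)
    seed_read_ids -- set of read IDs assumed to align to the region of interest
    """
    reads_by_barcode = defaultdict(list)
    seed_barcodes = set()

    # Group every read under its barcode and note which barcodes
    # are carried by at least one seed read.
    for read_id, barcode in reads:
        reads_by_barcode[barcode].append(read_id)
        if read_id in seed_read_ids:
            seed_barcodes.add(barcode)

    # Collect all reads from the seed barcodes, in a deterministic order.
    selected = []
    for barcode in sorted(seed_barcodes):
        selected.extend(reads_by_barcode[barcode])
    return selected


# Toy data: three made-up 16-base barcodes, five reads.
example_reads = [
    ("r1", "AAACCCAAGAAACACT"),
    ("r2", "AAACCCAAGAAACACT"),
    ("r3", "GGGTTTCCATGCATGA"),
    ("r4", "GGGTTTCCATGCATGA"),
    ("r5", "CCCATTGGTACGATCA"),
]
linked = select_linked_reads(example_reads, {"r1"})
```

With the toy data, only `r1` is a seed read, so the selection contains `r1` and `r2`, the two reads that share its barcode. A real pipeline would read barcodes from the sequencing files and would likely filter barcodes further; those steps are outside this sketch.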

Figures

Figure 1.
Conceptual description of the GEM microfluidic method. The circles (blue, magenta) represent gel beads. Each bead contains many copies of a 16-base barcode (rectangles inside the circle) unique to that bead. Each partition gets one gel bead. The 10 curved lines inside the large square (which represents a partition) represent molecules approximately 50 kb to 100 kb long. The green and orange ovals represent 150-base short reads obtained from these molecules.
Figure 2.
A: Flowchart. B: Details of the Reads Selection algorithm, shown inside the dotted box.
Figure 3.
Four different chromosomes with different characteristics. The blue rectangle represents a single-copy region and the magenta rectangle represents a segmental duplication region.
Figure 4.
A: BLAST alignment of 2p 500 kb (query) with assembled scaffolds of 2p for range 10–70 (subject). B: BLAST alignment of 19p 1-copy 300 kb (query) with assembled scaffolds of 19p 1-copy for range 3–70 (subject). C: BLAST alignment of 10p 1-copy 300 kb (query) with assembled scaffolds of 10p 1-copy for range 3–70 (subject). D: BLAST alignments of two 1-copy regions of 5p (query) with assembled scaffolds of 5p 1-copy regions for range 3–70 (subject).
Figure 5.
The top magenta rectangle represents the query sequence. A: Partially overlapping local alignment regions and gaps in coverage of the query sequence. B: Partially overlapping local alignment regions are treated as sequence contigs, and each sequence contig region (C) is followed by one sequence gap (G). Dotted blue lines mark the start and end positions of each gap.
Figure 6.
Algorithm to calculate LAF.
Figure 7.
A: Alignment of 2p with assembled scaffolds of 2p for range 10–60 of REXTAL. B: Alignment of 2p as query with assembled scaffolds of 2p extracted from genome-wide assembly. C: Alignment of 19p with assembled scaffolds of 19p 1-copy for range 3–70 of REXTAL. D: Alignment of 19p with assembled scaffolds of 19p 1-copy region extracted from genome-wide assembly. E: Alignment of 10p with assembled scaffolds of 10p 1-copy for range 3–70 of REXTAL. F: Alignment of 10p with assembled scaffolds of 10p 1-copy region extracted from genome-wide assembly. G: Alignment of 5p with assembled scaffolds of 5p 1-copy regions for range 3–70 of REXTAL. H: Alignment of 5p with assembled scaffolds of 5p 1-copy regions extracted from genome-wide assembly.
Figure 8.
A: Alignment of 19p segmental duplication region with assembled scaffolds of 19p 1-copy for range 3–70 of REXTAL. B: Alignment of 19p segmental duplication region with assembled scaffolds of 19p 1-copy region extracted from genome-wide assembly. C: Alignment of 10p segmental duplication region with assembled scaffolds of 10p 1-copy for range 3–70 of REXTAL. D: Alignment of 10p segmental duplication region with assembled scaffolds of 10p 1-copy region extracted from genome-wide assembly.
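
The Figure 5 caption describes treating partially overlapping local alignment regions as sequence contigs, with gaps in query coverage between them. A minimal sketch of that bookkeeping is given below, assuming alignments are supplied as 1-based inclusive `(start, end)` coordinates on the query. The function name `contigs_and_gaps`, the coordinate convention, and the choice to also merge directly adjacent regions are assumptions made for illustration; the sketch does not compute the LAF measure of Figure 6.

```python
def contigs_and_gaps(alignments, query_length):
    """Merge overlapping alignment regions into contigs and list coverage gaps.

    alignments   -- list of (start, end) pairs, 1-based inclusive, on the query
    query_length -- total length of the query sequence
    Returns (contigs, gaps), each a list of (start, end) tuples.
    """
    contigs = []
    for start, end in sorted(alignments):
        # Overlapping or directly adjacent regions extend the current contig.
        if contigs and start <= contigs[-1][1] + 1:
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            contigs.append([start, end])

    gaps = []
    prev_end = 0
    for start, end in contigs:
        # Any uncovered stretch before this contig is a gap.
        if start > prev_end + 1:
            gaps.append((prev_end + 1, start - 1))
        prev_end = end
    # Uncovered tail of the query, if any.
    if prev_end < query_length:
        gaps.append((prev_end + 1, query_length))

    return [tuple(c) for c in contigs], gaps


# Toy example: three local alignments on a 500-base query.
contigs, gaps = contigs_and_gaps([(1, 100), (80, 200), (301, 400)], 500)
```

In the toy example the first two alignments overlap and merge into one contig covering 1–200, the third forms a second contig at 301–400, and the uncovered stretches 201–300 and 401–500 are reported as gaps.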

References

    1. Alkan C, Sajjadian S, Eichler EE (2011). Limitations of next-generation genome sequence assembly. Nature methods, 8, 61.
    2. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research, 25, 3389–3402.
    3. Benson G (1999). Tandem repeats finder: a program to analyze DNA sequences. Nucleic acids research, 27, 573.
    4. Gurevich A, Saveliev V, Vyahhi N, Tesler G (2013). QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29, 1072–1075.
    5. Kent WJ (2002). BLAT—the BLAST-like alignment tool. Genome research, 12, 656–664.
