Genome Res. 2017 May;27(5):697-708.
doi: 10.1101/gr.215095.116. Epub 2017 Mar 30.

Combination of short-read, long-read, and optical mapping assemblies reveals large-scale tandem repeat arrays with population genetic implications

Matthias H Weissensteiner et al. Genome Res. 2017 May.

Abstract

Accurate and contiguous genome assembly is key to a comprehensive understanding of the processes shaping genomic diversity and evolution. Yet, it is frequently constrained by constitutive heterochromatin, usually characterized by highly repetitive DNA. As a key feature of genome architecture associated with centromeric and subtelomeric regions, it locally influences meiotic recombination. In this study, we assess the impact of large tandem repeat arrays on the recombination rate landscape in an avian speciation model, the Eurasian crow. We assembled two high-quality genome references using single-molecule real-time sequencing (long-read assembly [LR]) and single-molecule optical maps (optical map assembly [OM]). A three-way comparison including the published short-read assembly (SR) constructed for the same individual allowed assessing assembly properties and pinpointing misassemblies. By combining information from all three assemblies, we characterized 36 previously unidentified large repetitive regions in the proximity of sequence assembly breakpoints, the majority of which contained complex arrays of a 14-kb satellite repeat or its 1.2-kb subunit. Using whole-genome population resequencing data, we estimated the population-scaled recombination rate (ρ) and found it to be significantly reduced in these regions. These findings are consistent with an effect of low recombination in regions adjacent to centromeric or subtelomeric heterochromatin and add to our understanding of the processes generating widespread heterogeneity in genetic diversity and differentiation along the genome. By combining three different technologies, our results highlight the importance of adding a layer of information on genome structure that is inaccessible to each approach independently.
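
The figure captions report ρ as a weighted mean of ρ/bp in 50-kb windows. The following is a minimal sketch of one way such a window summary could be computed, not the authors' pipeline. It assumes the input is a list of per-interval estimates `(start, end, rho_per_bp)` in half-open base-pair coordinates and that each interval is weighted by the number of bases it contributes to a window; the function name, the input layout, and the weighting scheme are all illustrative assumptions.

```python
from collections import defaultdict

WINDOW_BP = 50_000  # window size stated in the figure captions


def windowed_weighted_mean(intervals, window_bp=WINDOW_BP):
    """Return {window_start: weighted mean rho/bp}.

    intervals: iterable of (start, end, rho_per_bp), half-open, in bp.
    Weighting by overlapping bases is an assumption made for this sketch.
    """
    weighted_sum = defaultdict(float)
    covered_bp = defaultdict(int)
    for start, end, rho in intervals:
        pos = start
        while pos < end:
            # Split the interval at window boundaries.
            win_start = (pos // window_bp) * window_bp
            chunk_end = min(end, win_start + window_bp)
            length = chunk_end - pos
            weighted_sum[win_start] += rho * length
            covered_bp[win_start] += length
            pos = chunk_end
    return {w: weighted_sum[w] / covered_bp[w] for w in sorted(covered_bp)}


if __name__ == "__main__":
    # Hypothetical estimates; the second interval spans two windows.
    example = [(0, 10_000, 0.002), (10_000, 60_000, 0.001)]
    for window_start, rho in windowed_weighted_mean(example).items():
        print(window_start, rho)
```

Windows with no covered bases are simply absent from the result, so regions without estimates are not reported as zero recombination.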

Figures

Figure 1.
Assembly comparisons. Schematic colored and numbered boxes with arrows correspond to arbitrarily sized homologous regions aligned between the different sequence assemblies based on short reads (SR), long reads (LR), and hybrid scaffolding via optical mapping (SR + OM and LR + OM). Note that boxes 1 and 7 are not present on the SR scaffolds, because they align to another scaffold not shown.
Figure 2.
Identification of putatively heterochromatic tandem repeat arrays. (A) Shown are the alignments of independent OM assemblies from a carrion and hooded crow individual (light blue) to the SR (dark green) and LR (light green) of the same hooded crow individual. Vertical bars in boxes correspond to nickase motifs of the enzyme Nt.BspQI, and gray vertical bars between boxes indicate orthologous nicks. The nickase motif pattern in both OM contigs matched the end of the SR scaffold or LR contig, and the part beyond is characterized by dense occurrence of nickase motifs every ∼3 kb, indicating a tandem repeat array. We termed such OM contigs “repetitive anchored maps” (RAMs). (B–D) Sequence similarity plots of the 14-kb crowSat1 consensus sequence aligned against assembled contigs/scaffolds of the SR (B) and LR (C) assembly (the same region as shown in A), and self-alignment of the crowSat1 consensus sequence (D). The latter suggests that crowSat1 is a >14-kb tandem repeat with an internal palindrome (blue) of tandemly repeated subunits (red). The most contiguous assembly of crowSat1 units is at the end of contig_000233F of the LR assembly (C) (but see also contig_000396, which consists entirely of crowSat1) (Supplemental Fig. S4), containing the palindrome and 13 tandem repeat units. This region is orthologous to the end of scaffold_100 of the SR assembly, where it exhibits fewer assembled crowSat1 units (B). Note that the flank of the crowSat1-bearing RAM is highly enriched for RepeatMasker-annotated repeats (green; mostly TEs) and many short remnants of crowSat1 (red and blue dots).
Figure 3.
Chromosome-level distribution of population-scaled recombination rate ρ and structural genome features, shown for example chromosomes of varying size: (black dots) the weighted mean of ρ/bp in 50-kb windows estimated from a Swedish hooded crow population; (gray lines) SR scaffold ends; (red squares) repetitive anchored maps (RAMs) with the possible co-occurrence of the crowSat1 satellite. Data are shown for representative synteny- and collinearity-based chromosomes (for the remaining chromosomes, see Supplemental Fig. S5).
Figure 4.
Population-scaled recombination rate ρ as a function of RAMs and crowSat1 satellites. Box plots show log_e(ρ) in units of 4N_e r/bp as estimated in 50-kb windows for Swedish hooded crow and German carrion crow populations. Values are broken down by category of windows representing the genome (red), windows adjacent to scaffold ends (blue), windows adjacent to RAMs (green), and windows including crowSat1 (violet). Straight horizontal lines depict the median, box margins indicate the interquartile range between 25% and 75% quantiles, and whiskers extend to 1.5-times the interquartile range with values beyond shown as points. Asterisks denote the significance level based on t-tests corrected for multiple comparisons.
Figure 5.
Structural genome features and population genetic summary statistics surrounding a peak of extreme genetic differentiation between hooded and carrion crows on Chromosome 18. Comparison of population genetic summary statistics ρ/bp, θW, and FST in 50-kb windows: (horizontal green bars) SR assembly with crowSat1 locations in dark red; (horizontal blue bars) OM contigs with RAMs schematically shown with densely spaced nickase motifs; (vertical gray bars) SR scaffold ends.
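
The box-plot conventions described in the Figure 4 caption (median, 25% and 75% quantiles, whiskers at 1.5 times the interquartile range, points beyond) can be illustrated with a short sketch. This is not the authors' plotting code. It assumes the common Tukey convention in which each whisker ends at the most extreme observation still inside the 1.5 × IQR fence, and it uses the "inclusive" quantile method from Python's standard library; other quantile definitions would give slightly different box margins. The function name and input values are hypothetical.

```python
import math
import statistics


def box_plot_summary(rho_values):
    """Summarise log_e(rho) per window in box-plot terms.

    Non-positive values are skipped because their logarithm is undefined.
    Requires at least two positive values.
    """
    logs = sorted(math.log(v) for v in rho_values if v > 0)
    q1, median, q3 = statistics.quantiles(logs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in logs if low_fence <= x <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        # Whiskers stop at the last data point inside each fence.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        # Values beyond the fences would be drawn as individual points.
        "outliers": [x for x in logs if x < low_fence or x > high_fence],
    }


if __name__ == "__main__":
    # Hypothetical per-window rho values for one window category.
    print(box_plot_summary([0.0004, 0.0011, 0.0009, 0.0020, 0.0015, 0.25]))
```

Computing the summary on the log scale, rather than taking the logarithm of a summary of raw values, matches the caption's statement that the box plots show log_e(ρ).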
