Genome Res. 2017 May;27(5):709-721.
doi: 10.1101/gr.213512.116. Epub 2017 Apr 3.

Single-molecule sequencing resolves the detailed structure of complex satellite DNA loci in Drosophila melanogaster


Daniel E Khost et al. Genome Res. 2017 May.

Abstract

Highly repetitive satellite DNA (satDNA) repeats are found in most eukaryotic genomes. SatDNAs are rapidly evolving and have roles in genome stability and chromosome segregation. Their repetitive nature poses a challenge for genome assembly and has made detailed study of satDNA structure difficult. Here, we use long reads from Pacific Biosciences (PacBio) single-molecule sequencing to determine the detailed structure of all major autosomal complex satDNA loci in Drosophila melanogaster, with a particular focus on the 260-bp and Responder satellites. We determine the optimal de novo assembly methods and parameter combinations required to produce a high-quality assembly of these previously unassembled satDNA loci, and we validate this assembly using molecular and computational approaches. We determined that the computationally intensive PBcR-BLASR assembly pipeline yielded better assemblies than the faster and more efficient pipelines based on the MHAP hashing algorithm, and that it is essential to validate assemblies of repetitive loci. The assemblies reveal that satDNA repeats are organized into large arrays interrupted by transposable elements. The repeats in the center of the array tend to be homogenized in sequence, suggesting that gene conversion and unequal crossovers lead to repeat homogenization through concerted evolution, although the degree of unequal crossing over may differ among complex satellite loci. We find evidence for higher-order structure within satDNA arrays that suggests recent structural rearrangements. These assemblies provide a platform for the evolutionary and functional genomics of satDNAs in pericentric heterochromatin.


Figures

Figure 1.
FISH image of D. melanogaster mitotic chromosomes showing Rsp and 1.688 satellites. DNA is stained with DAPI (blue). Rsp is in red (closed arrowheads) and 1.688 family satellites are in green. The 260-bp array is located on Chromosome 2L (arrows), and 353/356-bp arrays are located on Chromosome 3L (open arrowheads).
Figure 2.
(A) Southern blot of a 15-kb PCR amplicon (primer pair 3) (Fig. 3, see below) from the distal region of the Rsp locus digested with EagI, HindIII, SstI, and XmaI. We detected bands for all predicted fragments (predictions in boxes). The location of the 15-kb PCR amplicon and the predicted restriction sites are shown in Supplemental Figure S2. (B) The left side shows a genomic Southern blot used to determine which assembly has the correct Rsp locus organization. Only fragments <10 kb in size were resolved. The right side shows a schematic representation of the results. Fragment sizes consistent with the PBcR-BLASR and BLASR-corr Cel8.3 assemblies are indicated with red and blue bars, respectively. Thick bars indicate double bands, and dashed bars indicate fragments with few predicted Rsp repeats and thus a comparatively weak signal. Empty boxes represent detected bands from fragments proximal to the assembled Rsp array and/or partially digested DNA. Boxes with an asterisk represent predicted fragments from Rsp repeats on Chromosome 3L. The actual banding pattern is consistent with the PBcR-BLASR assembly (red). Results from the pulsed-field gel confirming the overall size of the locus are in Supplemental Figure S2. (C) PCR results confirming the presence of two G5 clusters flanking the major Rsp array (primer pairs 1, 2, and 4). Primer pairs 1 and 2 yield a product for a genomic DNA template but not for the 15-kb amplicon; primer pair 4 yields a product for both template types, as expected. (*) A product from elsewhere in the genome. The size (in kb) of the predicted band is shown below each lane; (–) no predicted product.
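The bands in panels A and B are compared against fragment sizes predicted from the assembled sequence. As a minimal sketch, assuming a complete digest of a linear molecule, such predictions can be made by locating each enzyme's recognition site and measuring the distances between cut points. The function name and the `ENZYMES` table are hypothetical illustrations, not the authors' pipeline; the recognition sites and top-strand cut offsets are the standard ones for these enzymes (SstI is an isoschizomer of SacI).

```python
import re

# Hypothetical lookup table: recognition site and cut offset on the top strand.
# EagI C^GGCCG, HindIII A^AGCTT, SstI GAGCT^C, XmaI C^CCGGG.
ENZYMES = {
    "EagI": ("CGGCCG", 1),
    "HindIII": ("AAGCTT", 1),
    "SstI": ("GAGCTC", 5),
    "XmaI": ("CCCGGG", 1),
}


def predicted_fragments(seq: str, enzyme: str) -> list[int]:
    """Fragment lengths (bp) from a complete digest of a linear sequence.

    Assumes every site is cut; all four sites are palindromic, so scanning
    the top strand alone finds every site.
    """
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # A zero-width lookahead lets overlapping sites all be reported.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

For a genomic Southern such as panel B, one would additionally keep only fragments that overlap probe-matching repeats, since only those hybridize.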
Figure 3.
Maps of complex satDNA contigs. Counts for each repetitive element family in our custom Repbase library were plotted in 3-kb windows across each contig. (A) Rsp locus on Chromosome 2R. Blue bars correspond to Rsp Left, Rsp Right, variant, or truncated repeats, whereas other colors correspond to various TE families as indicated to the right of each contig. Rsp spans ∼170 kb of the 300-kb contig (thick blue line below the x-axis). Above the plot is a schematic showing the orientation of two G5 clusters flanking the Rsp locus and a separate contig containing Rsp and the Jockey element G2, which is directly adjacent to AAGAG satellite repeats. The colors of the chevron outlines indicate the G5 elements with the highest degree of similarity to one another. Solid and dashed lines within the insertions show the approximate locations of shared insertions or deletions, respectively. Several configurations of indels are unique, such as the two in G5_5 or the deletion in G5_1, which allow verification of the cluster. The G2 contig may contain the most centromere-proximal repeats (black circle; see text). (B) Minor Rsp locus on Chromosome 2R. The inset shows the detailed orientation of the two clusters (five Rsp repeats per cluster, ∼100 kb apart); the direction of the arrows indicates the relative orientation of the elements. The Rsp repeats (blue chevrons) are nested within Doc5 (orange chevrons) insertions, which are in turn nested within insertions of a transposon known as ProtoP (purple chevrons). The clusters of Rsp+Doc5+ProtoP share ∼96% sequence identity with one another and are in an inverted orientation. (C) 260-bp locus on Chromosome 2L. Only the area surrounding the 260-bp array is shown (300 kb of the ∼1.1-Mb contig). The 260-bp locus spans ∼70 kb of the 1.1-Mb contig (green below the x-axis) and is interrupted by Copia transposable elements.
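The per-window counts plotted in each panel amount to assigning every annotated repeat to a 3-kb bin along the contig and tallying families per bin. The sketch below assumes a hypothetical input of `(family, start)` pairs already parsed from a repeat annotation of the contig; assigning each element by its start coordinate, rather than by overlap, is a simplifying assumption and may differ from how the figure was produced.

```python
from collections import Counter


def window_counts(annotations, contig_length, window=3000):
    """Count annotated repeats per family in fixed, non-overlapping windows.

    annotations: iterable of (family, start) pairs with 0-based start
    coordinates on the contig (hypothetical input format).
    Returns one Counter per window, ordered along the contig.
    """
    n_windows = (contig_length + window - 1) // window  # ceiling division
    counts = [Counter() for _ in range(n_windows)]
    for family, start in annotations:
        if 0 <= start < contig_length:  # ignore out-of-range annotations
            counts[start // window][family] += 1
    return counts
```

Plotting one stacked bar per window from these Counters would give a map in the style of panels A to C.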
Figure 4.
Neighbor-joining tree of complex satDNA monomers. (A) Rsp repeats in the Chromosome 2R locus. Repeats were divided into six bins, each containing one-sixth of the locus (about 180 repeats per bin). Tip color corresponds to position in the array (red is most centromere-proximal; blue is most distal). The tip symbol indicates whether the repeat is Rsp Left (square), Rsp Right (triangle), or variant/truncated (circle). (*) Repeats from the G2 contig suspected of being centromere-proximal are indicated in pink. Note that these repeats cluster with the repeats on the proximal end of the Rsp contig (red), although it is possible that they are actually distal to the locus. (B) 260-bp repeats in the Chromosome 2L locus. Repeats were divided into four bins, each containing one-fourth of the locus (about 57 repeats per bin). Tip color corresponds to position in the array (green is most centromere-proximal; red is most centromere-distal). (C) Aligned consensus Rsp Left and Rsp Right repeat sequences with PCR primers (arrows) used to amplify across Left/Right Rsp dimers. (D) Consensus 260-bp repeat sequence.
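The tip colors in panels A and B encode which positional bin a repeat falls in. A minimal sketch, assuming repeats are already listed in array order from the centromere-proximal end, assigns each repeat to one of `n_bins` contiguous, near-equal bins; the function name is hypothetical.

```python
def bin_labels(n_repeats, n_bins):
    """Bin index (0 = most proximal) for each repeat in array order.

    Uses integer arithmetic so bins are contiguous and their sizes differ
    by at most one repeat when n_repeats is not divisible by n_bins.
    """
    if n_bins <= 0 or n_repeats < 0:
        raise ValueError("n_bins must be positive and n_repeats non-negative")
    return [i * n_bins // n_repeats for i in range(n_repeats)]
```

With six bins (panel A) or four bins (panel B), each label can then be mapped to a color along the proximal-to-distal gradient.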
Figure 5.
Distribution of satDNA sequence variants across loci. Each row corresponds to a unique monomer, and the x-axis shows the position of that monomer sequence in the array. The color of each point indicates the copy number of that monomer in the array. (A) The Rsp locus on Chromosome 2R. Several high-copy-number Rsp variants dominate the center of the array (purple and blue), with low-frequency and unique sequences found toward the proximal and distal ends (gray and green). One cluster of repeats is duplicated on either side of the array (boxed). (B) The 260-bp locus on Chromosome 2L. The majority of repeats occur only once, although a few variants have intermediate copy number.
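The plot combines three quantities per monomer: its position in the array, which unique sequence it is (the row), and how many copies of that sequence the array contains (the color). A minimal sketch, assuming monomers are given as strings in array order and that only exact sequence identity defines a variant (an assumption), could build those rows as follows; the function name is hypothetical.

```python
from collections import Counter


def variant_table(monomers):
    """Return (position, variant_id, copy_number) for each monomer in array order.

    Variants are numbered by first appearance, so each unique sequence maps to
    one row; copy_number counts exact matches across the whole array.
    """
    copy_number = Counter(monomers)
    variant_id = {}
    for seq in monomers:
        variant_id.setdefault(seq, len(variant_id))
    return [(pos, variant_id[seq], copy_number[seq]) for pos, seq in enumerate(monomers)]
```

Scattering position against variant_id and coloring by copy_number reproduces the layout described for panels A and B.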
Figure 6.
Neighbor-joining tree of D. melanogaster 1.688 family satellites (the 353/356-bp satellites on Chromosome 3L, the 260-bp locus on Chromosome 2L, and a single 359-bp consensus repeat from the X Chromosome) and a related 360-bp repeat from D. simulans. Tips are colored according to the contig from which they originate. The tip symbol indicates the monomer repeat type: 353-bp, 356-bp, or 260-bp. The inset shows a schematic of the Chromosome 3L 1.688 loci and the organization of their respective contigs (the two clusters are ∼2 Mb apart). Contig utg 564 does not align to the reference genome, but we infer its location (*) based on its repeats clustering with those in utg 47 and on a gap in the reference at this genomic location. Contig utg 565 is unmapped.
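Figures 4 and 6 are neighbor-joining trees. As an illustration of the textbook Saitou and Nei algorithm (not necessarily the software, distance model, or settings used for these figures), the sketch below repeatedly joins the pair minimizing Q(i, j) = (n − 2)·d(i, j) − r_i − r_j, where r_i is the row sum of distances, and emits an unrooted Newick string. Using uncorrected p-distances between aligned monomers as input is an assumption; both function names are hypothetical.

```python
def p_distance(s1, s2):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be aligned, non-empty, and equal length")
    return sum(x != y for x, y in zip(s1, s2)) / len(s1)


def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining on a symmetric distance matrix (list of lists).

    Returns an unrooted tree in Newick format with a trifurcating root.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Choose the pair with the smallest Q value (first pair wins ties).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        parent = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        # Distances from the new parent to every remaining node.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [parent]
    # Resolve the last three nodes around a single central node.
    x, y, z = nodes
    lx = (d[0][1] + d[0][2] - d[1][2]) / 2
    ly = (d[0][1] + d[1][2] - d[0][2]) / 2
    lz = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"
```

On additive distances such as those of the tree ((a:1,b:2):1,(c:3,d:4)), the sketch recovers the original topology and branch lengths, which is a convenient sanity check before applying it to real monomer alignments.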

