PLoS Biol. 2019 May 14;17(5):e3000241.
doi: 10.1371/journal.pbio.3000241. eCollection 2019 May.

Islands of retroelements are major components of Drosophila centromeres

Ching-Ho Chang et al. PLoS Biol. 2019.

Abstract

Centromeres are essential chromosomal regions that mediate kinetochore assembly and spindle attachments during cell division. Despite their functional conservation, centromeres are among the most rapidly evolving genomic regions and can shape karyotype evolution and speciation across taxa. Although significant progress has been made in identifying centromere-associated proteins, the highly repetitive centromeres of metazoans have been refractory to DNA sequencing and assembly, leaving large gaps in our understanding of their functional organization and evolution. Here, we identify the sequence composition and organization of the centromeres of Drosophila melanogaster by combining long-read sequencing, chromatin immunoprecipitation for the centromeric histone CENP-A, and high-resolution chromatin fiber imaging. Contrary to previous models that heralded satellite repeats as the major functional components, we demonstrate that functional centromeres form on islands of complex DNA sequences enriched in retroelements that are flanked by large arrays of satellite repeats. Each centromere displays distinct size and arrangement of its DNA elements but is similar in composition overall. We discover that a specific retroelement, G2/Jockey-3, is the most highly enriched sequence in CENP-A chromatin and is the only element shared among all centromeres. G2/Jockey-3 is also associated with CENP-A in the sister species D. simulans, revealing an unexpected conservation despite the reported turnover of centromeric satellite DNA. Our work reveals the DNA sequence identity of the active centromeres of a premier model organism and implicates retroelements as conserved features of centromeric DNA.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. CENP-A binding association with satellites and transposable elements.
(A) Schematic of the strategy used to identify the DNA sequence of D. melanogaster centromeres. The Illumina reads are 2 × 150 bp. (B) Kseek plot showing the relative abundance of simple repeat sequences in CENP-A ChIP compared to the input. Plotted on the x-axis is the median of CENP-A ChIP reads normalized over total mapped CENP-A ChIP reads across four ChIP replicates. Plotted on the y-axis is the median of input reads normalized over total mapped input reads across four replicates. The top 7 kmers in the ChIP read abundance are labeled. The line represents the enrichment of CENP-A ChIP/input for AATAC, a noncentromeric simple repeat. Repeats to the right of the line are putatively enriched in CENP-A. See also S1 Fig and S1 Table. (C) Plot of the normalized CENP-A/input reads on a log scale for each replicate, sorted by median (red lines) for complex repeat families. Shown are only the complex repeats in the top 20% across all four CENP-A ChIP replicates. See also S2 Fig and S2 Table. CENP-A, centromere protein A; ChIP, chromatin immunoprecipitation; ChIP-seq, ChIP sequencing; IF-FISH, immunofluorescence–fluorescence in situ hybridization; IGS3cen, intergenic spacer of the ribosomal genes on the third centromere; Prodsat, Prod satellite; qPCR, quantitative PCR; S2, Schneider 2; TART, Telomere-associated retrotransposon.
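The normalization described for panel (B) can be illustrated with a minimal Python sketch. It assumes per-replicate read counts for a kmer and per-replicate totals of mapped reads; the function names and the example numbers are hypothetical and are not taken from the study or from any published pipeline.

```python
from statistics import median

def normalized_abundance(kmer_counts, total_mapped):
    """Median across replicates of (kmer reads / total mapped reads)."""
    return median(c / t for c, t in zip(kmer_counts, total_mapped))

def chip_over_input(chip_counts, chip_totals, input_counts, input_totals):
    """Ratio of normalized ChIP abundance (x-axis) to normalized input abundance (y-axis)."""
    chip = normalized_abundance(chip_counts, chip_totals)
    inp = normalized_abundance(input_counts, input_totals)
    return chip / inp

def is_putatively_enriched(kmer_ratio, reference_ratio):
    """A kmer lies to the right of the reference line when its ChIP/input
    ratio exceeds that of the noncentromeric reference repeat (AATAC in the caption)."""
    return kmer_ratio > reference_ratio

# Hypothetical counts for four replicates (illustrative only, not study data).
totals = [1_000_000] * 4
candidate = chip_over_input([200, 220, 180, 210], totals, [50, 50, 50, 50], totals)
reference = chip_over_input([100, 100, 100, 100], totals, [100, 100, 100, 100], totals)
print(is_putatively_enriched(candidate, reference))
```

Using the median across replicates keeps a single outlying replicate from dominating the estimate, which matches the caption's description of both axes.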
Fig 2. CENP-A occupies DNA sequences within putative centromere contigs.
Organization of each CENP-A-enriched island corresponding to centromere candidates: (A) CenX, (B) Cen4; (C) CenY; (D) Cen3; (E) Cen2. Different repeat families are color coded (see legend; note that Jockey elements are shown in one color even though they are distinct elements). Shown are the normalized CENP-A enrichment over input (plotted on a log scale) from one replicate (replicate 2, other replicates are in S4 Fig) colored in gray for simple repeats and black for complex island sequences. Although the mapping quality scores are high in simple repeat regions, we do not use these data to make inferences about CENP-A distribution (see text for details). The coordinates of the significantly CENP-A-enriched ChIPtigs mapped to these contigs (black), and the predicted ChIP peaks (orange) are shown below each plot. See also S4 Fig and S3 and S4 Tables. Cen2, centromere 2; Cen3, centromere 3; Cen4, centromere 4; CENP-A, centromere protein A; CenX, X centromere; CenY, Y centromere; ChIP, chromatin immunoprecipitation; IGS, intergenic spacer of the ribosomal genes; LTR, long terminal repeat; Prodsat, Prod satellite.
Fig 3. Centromeres are enriched in non-LTR retroelements in the Jockey family.
(A) Density of all repetitive elements on each candidate centromere contig and the entire genome (minus the centromeres) grouped by type: non-LTR retroelements, LTR retroelements, rDNA-related sequences, and simple satellites. G2/Jockey-3 is present on all centromeres. An * indicates annotations based on similarity to retroelements in other Drosophila species: Gypsy-2 is from D. simulans, Gypsy-24 and Gypsy-27 are from D. yakuba, and Gypsy-7 is from D. sechellia. For annotation, see Dryad repository file 9: https://doi.org/10.5061/dryad.rb1bt3j [37]. The underlying data can be found in S1 Data. (B) Maximum-likelihood phylogenetic tree based on the entire sequence of all G2/Jockey-3 copies in D. melanogaster inside (squares) and outside (circles) of centromeric contigs and on the consensus repeat in its sister species D. sechellia and D. simulans and a more distantly related species (D. yakuba). The tree shows that centromeric G2/Jockey-3 elements do not have a single origin (see Dryad repository files 13 and 15: https://doi.org/10.5061/dryad.rb1bt3j [37]). Cen2, centromere 2; Cen3, centromere 3; Cen4, centromere 4; CenX, X centromere; CenY, Y centromere; ETS, external transcribed spacer; IGS, intergenic spacer of the ribosomal genes; ITS, internal transcribed spacer; LTR, long terminal repeat; Prodsat, Prod satellite; TART, Telomere-associated retrotransposon.
Fig 4. Genomic distribution of G2/Jockey-3 elements in the D. melanogaster genome.
Location of G2/Jockey-3 elements across chromosome (“Ch”) 2 (A), 3 (B), 4 (C), X (D), and Y (E). Contigs from each chromosome were concatenated in order with an arbitrary insertion of 100 kb of “N.” Distances along the x-axis are approximate. The order and orientation of the Y chromosome contigs is based on gene order (see [19]). Each triangle corresponds to one TE, for which filled shapes indicate full-length TEs and open shapes indicate truncated TEs. The vertical gray bars represent the arbitrary 100-kb window inserted between contigs, indicating where there are gaps in our assembly. The centromere (“CEN”) positions are set to 0 for each chromosome. The insets zoom in to show the distribution of G2/Jockey-3 elements on the centromere contigs. Chromosomes are not drawn to scale (chromosome 4 and Y are enlarged). TE, transposable element.
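The coordinate scheme in this caption (contigs concatenated in order, separated by an arbitrary 100 kb of "N", with the centromere set to 0) can be sketched as follows. This is only an illustration: the contig lengths are hypothetical, and placing the origin at the start of the centromere contig is an assumption, since the caption does not say which point within the contig is used.

```python
GAP = 100_000  # arbitrary 100-kb run of "N" between contigs, as stated in the caption

def contig_offsets(lengths, gap=GAP):
    """Start position of each contig after concatenation with gaps."""
    offsets, pos = [], 0
    for length in lengths:
        offsets.append(pos)
        pos += length + gap
    return offsets

def to_plot_coord(contig_index, pos_in_contig, lengths, cen_index, cen_pos=0):
    """Position of a TE on the concatenated chromosome, shifted so that the
    centromere (assumed here: cen_pos within contig cen_index) is at 0."""
    offsets = contig_offsets(lengths)
    origin = offsets[cen_index] + cen_pos
    return offsets[contig_index] + pos_in_contig - origin

# Hypothetical contig lengths in bp (illustrative only).
lengths = [500_000, 200_000, 300_000]
print(contig_offsets(lengths))
print(to_plot_coord(2, 50_000, lengths, cen_index=1))
```

Because the gap size is arbitrary, distances computed across contigs are approximate, as the caption notes for the x-axis.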
Fig 5. Islands of complex DNA are major components of centromeres.
(A-D) Top, mitotic chromosomes from male larval brains showing IF with anti-CENP-C antibodies (green, inset) and FISH with chromosome-specific Oligopaints (magenta). Bar 1 μm. Middle, schematic of centromere contigs (see key) and location of Oligopaint probes (magenta). Bottom, IF-FISH on extended chromatin fibers from female larval brains. Anti-CENP-A antibodies (green), Oligopaints FISH (in panels A, B, and D; magenta), and centromere-specific satellites (cyan, and in E also in magenta). Dashed rectangles show the span of the Oligopaint probes, except for (E), where it is placed arbitrarily within the CENP-A domain where the Cen2 contig could be located. Bar 5 μm. (A) CenX; (B) Cen4; (C) CenY; (D) Cen3 (see also S20 Fig); (E) Cen2 using FISH probes AAGAG (magenta) and AATAG (cyan). The scale shown for the Cen2 diagram is approximate. (F) Scatterplot of CENP-A IF signal length for each centromere. Error bars = SD. n = 18–30 fibers for each centromere. Significant P values are shown (unpaired t test). The underlying data can be found in S1 Data. (G) Table showing the lengths of Oligopaint (“Olig.”) FISH and CENP-A IF signals on fibers (kb ± SD estimated based on 10 μm = 101 kb; S15 Fig). Percent overlap corresponds to CENP-A domain length/Oligopaint FISH length. The difference between the sizes of the CENP-A domain and the corresponding islands is significant (unpaired t test). Additional fibers are shown in S16, S17, S18, S19, S20 and S21 Figs. Cen2, centromere 2; Cen3, centromere 3; Cen4, centromere 4; CENP-A, centromere protein A; CENP-C, centromere protein C; CenX, X centromere; CenY, Y centromere; FISH, fluorescence in situ hybridization; IF, immunofluorescence; IGS, intergenic spacer of the ribosomal genes; n/a, not applicable; Prodsat, Prod satellite.
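The fiber measurements in panel (G) rest on two simple calculations: the stretching calibration (10 μm = 101 kb) and the percent overlap (CENP-A domain length / Oligopaint FISH length). A minimal sketch, with hypothetical function names and example lengths that are not values from the study:

```python
KB_PER_UM = 101 / 10  # calibration from the caption: 10 μm of fiber = 101 kb

def um_to_kb(length_um):
    """Convert a measured fiber signal length from micrometers to kilobases."""
    return length_um * KB_PER_UM

def percent_overlap(cenpa_um, oligopaint_um):
    """CENP-A domain length divided by Oligopaint FISH length, as a percentage."""
    return 100 * um_to_kb(cenpa_um) / um_to_kb(oligopaint_um)

# Hypothetical signal lengths in micrometers (illustrative only).
print(um_to_kb(10))
print(percent_overlap(5, 10))
```

Since both lengths use the same calibration, the percent overlap does not depend on the conversion factor; only the absolute kb estimates do.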
Fig 6. The association between G2/Jockey-3 and centromeres is conserved in D. simulans.
(A) Plot of the normalized CENP-A enrichment over input across the D. simulans G2/Jockey-3 consensus sequence using CENP-A ChIP-seq data from D. simulans ML82-19a cells [16] showing that G2/Jockey-3 is enriched in CENP-A in D. simulans. The labels “15m” and “5m” indicate minutes of MNase digestion, and IP and IP2 are technical replicates. Note that the first 487 bp of D. simulans G2/Jockey-3 consensus sequence, which are homologous to the D. simulans 500-bp satellite, are not included in this figure; the 500-bp satellite was previously reported as enriched in CENP-A in D. simulans [16]. (B) Plot of the normalized CENP-A enrichment over input across the D. melanogaster G2/Jockey-3 consensus sequence using our CENP-A ChIP-seq replicates (R1–R4) and ChIP-seq from CENP-A–GFP transgenic flies from Talbert and colleagues [16]. The underlying data for (A-B) can be found in S1 Data. IF-FISH on (C) D. simulans (w501) and (D) D. melanogaster (iso-1) mitotic chromosomes from male larval brains using an antibody for CENP-C (magenta) and FISH with a G2/Jockey-3 DIG-labeled FISH probe (yellow). DAPI is shown in gray. Bar 5 μm. CENP-A, centromere protein A; CENP-C, centromere protein C; ChIP, chromatin immunoprecipitation; ChIP-seq, ChIP sequencing; DIG, digoxigenin; FISH, fluorescence in situ hybridization; GFP, green fluorescent protein; IF, immunofluorescence; IP, immunoprecipitation.
Fig 7. Drosophila centromere organization and widespread presence of retroelements at centromeres.
(A) Schematic showing the organization of D. melanogaster centromeres. For at least CenX, Cen4, and Cen3, the bulk of CENP-A chromatin is associated with the centromere islands, whereas the remaining CENP-A is on the flanking satellites. The sequences flanking the Y centromere are not in our assembly, so whether CENP-A is also on satellites is unknown. Although the complexity of island DNA allowed us to identify centromere contigs by long-read sequencing, the flanking satellites remain largely missing from our genome assembly because of their highly repetitive nature. The approximate satellite size estimates are based on Jagannathan and colleagues’ work [25]. (B) Phylogenetic tree showing that centromere-associated retroelements are common across highly diverged lineages: Gossypium hirsutum (cotton) [47], Zea mays mays (maize) [9, 48], Oryza sativa (rice) [–51], Triticum boeoticum (wild wheat) [52], Cryptococcus [53], Phyllostomid (bat) [54], Hoolock leuconedys (gibbon) [55], Homo sapiens (human) [56] (and a human neocentromere [57]), Macropus eugenii (tammar wallaby) [–60], Phascolarctos cinereus (koala) [61], and D. melanogaster (this study for endogenous centromeres; also in an X-derived minichromosome [14, 15]). The phylogeny was constructed using TimeTree [62]. Indicated are the retroelement type and the clade that the element belongs to with element types as follows: LTR and non-LTR. The circles indicate the experimental evidence for centromere association of retroelements: FISH, CENP-A ChIP-seq (ChIP), and genome or BAC sequencing (Seq). 
BAC, bacterial artificial chromosome; CENP-A, centromere protein A; CenX, X centromere; ChIP-seq, chromatin immunoprecipitation sequencing; CRM, centromeric retrotransposons of maize; CRR, centromeric retrotransposons of rice; CRW, centromeric retrotransposons of wheat; FISH, fluorescence in situ hybridization; LAVA, LINE-Alu-VNTR-Alu-like; LINE, long interspersed nuclear element; LTR, long terminal repeat; Mya, million years ago.

References

    1. Mendiburo MJ, Padeken J, Fulop S, Schepers A, Heun P. Drosophila CENH3 is sufficient for centromere formation. Science. 2011;334(6056):686–90. doi: 10.1126/science.1206880
    2. McKinley KL, Cheeseman IM. The molecular basis for centromere identity and function. Nat Rev Mol Cell Biol. 2016;17(1):16–29. doi: 10.1038/nrm.2015.5
    3. Allshire RC, Karpen GH. Epigenetic regulation of centromeric chromatin: old dogs, new tricks? Nat Rev Genet. 2008;9(12):923–37. doi: 10.1038/nrg2466
    4. Pidoux AL, Allshire RC. Kinetochore and heterochromatin domains of the fission yeast centromere. Chromosome Res. 2004;12(6):521–34. doi: 10.1023/B:CHRO.0000036586.81775.8b
    5. Ohzeki J, Bergmann JH, Kouprina N, Noskov VN, Nakano M, Kimura H, et al. Breaking the HAC Barrier: histone H3K9 acetyl/methyl balance regulates CENP-A assembly. EMBO J. 2012;31(10):2391–402. doi: 10.1038/emboj.2012.82
