Recombination Suppression Drives Expansion of the Drosophila Dot Chromosome

Timothy J Stanek et al. Mol Biol Evol.

Abstract

Genome size varies widely, even among closely related species, yet much less is known about chromosome size variation. Here we use the fourth chromosome of Drosophila, also known as the "Muller F element" or "dot chromosome", as a model to investigate chromosome-specific size expansion. The F element of most Drosophila species is small (∼1.3 Mb) and almost entirely heterochromatic, yet harbors approximately 80 protein-coding genes. Here, we study D. kikkawai, D. takahashii, D. ananassae, and D. bipectinata, whose F elements are 2- to 15-fold larger in size compared to D. melanogaster. Through manual gene curation and comparative genomic analysis, we find that their F elements have expanded primarily via accumulation of transposable elements (TEs) in introns and intergenic regions. Natural selection appears less efficient on these expanded F elements: they have smaller effective population sizes and their genes exhibit reduced usage of optimal codons, compared to D. melanogaster. We propose that F element size variation is driven by differences in F element recombination rates. The ultra-long (∼20 Mb) F elements of D. ananassae and D. bipectinata display high rates of rearrangement and sequence evolution and exhibit independent TE-driven expansions. Our results suggest that F elements of most Drosophila species likely recombine enough to prevent size expansion, while F element recombination in D. ananassae and D. bipectinata is either absent or rare enough to allow TEs and other deleterious mutations to accumulate via Muller's ratchet; thus, these chromosomes evolve more like a Y chromosome than a typical Drosophila F element.

Keywords: Drosophila; genome size; heterochromatin.

Figures

Fig. 1.
Drosophila species with expanded F elements. a) Karyotypes of the four expanded F element species. Muller elements A to F correspond to D. melanogaster chromosomes X, 2L, 2R, 3L, 3R, and 4, respectively. The X chromosomes (Muller A) and F elements of D. ananassae and D. bipectinata are both metacentric. Both telocentric (dark red arm only) and metacentric (both dark and light red arms) F elements have been reported in D. kikkawai (Baimai and Chumchong 1980). Previous work suggests that the strain used here has the acrocentric F element (Leung et al. 2023). Panel adapted from Leung et al. (2023). b) Phylogenetic relationships among the species studied here, along with the F element scaffold length from their respective genome assemblies (Leung et al. 2023). Divergence times (in millions of years) are shown for each node of the tree (Suvorov et al. 2022).
Fig. 2.
F element expansion occurs within both introns and intergenic regions. Genes on the Muller F and the reference region of Muller D for each species were annotated by members of the GEP. a) Total coding span of Muller D and Muller F genes from each species. b) Sum of intron lengths per gene. c) Sum of coding exon lengths per gene. d) Total exon count per gene. e) Intergenic lengths. f) Total intron count per gene. For each violin, the dot demarcates the median, the box represents the interquartile range, and the whiskers represent 1.5 × the interquartile range. For each panel, an asterisk after the plot header identifies statistically significant interactions between the independent variables chromosome and species (two-way non-parametric analysis of variance, see Methods). For post-hoc analyses (ART-C procedure with Holm correction, see Methods), a pound symbol (#) identifies statistically significant (adjusted P < 0.05) comparisons of Muller F to Muller D within a species, while the dollar symbol ($) identifies statistically significant (adjusted P < 0.05) comparisons of the same Muller element between each expanded F species and D. melanogaster. All tests and their associated statistical values can be found in Table S1. Values for gene coding span length, CDS length, intron length, CDS count, intron count, and intergenic length are available in Table S2.
Fig. 3.
Repetitive elements drive F element expansions. a) Repetitive and non-repetitive sequence composition of Muller F, by species. Each species with an asterisk has a significantly larger proportion of repetitive DNA on its F element compared to D. melanogaster (Fisher's exact test adjusted P < 0.05). Diagonal patterning identifies repeat classes in each species that comprise a significantly larger proportion of the F element compared to D. melanogaster (Fisher's exact test adjusted P < 0.05). Note: Manual inspection of the most abundant elements from the “Unknown” category in D. takahashii and D. kikkawai suggests that they are primarily DNA transposons. b) Log2-fold change in size (bp) relative to the respective D. melanogaster category. Abbreviations: CDS, coding sequence; DNA, DNA transposon; LINE, long interspersed nuclear element retrotransposon; LTR, long terminal repeat retrotransposon; RC/Helitron, rolling circle/Helitron transposon; Unknown, unknown/unclassified repeat, as reported by Earl Grey (Baril et al. 2024). The simple repeat category is composed of noncoding simple repeats (excluding transposon sequences). The noncoding category is composed of unmasked noncoding sequence (see Methods). The TE category is composed of DNA, LINE, LTR, and RC/Helitron transposons (excluding unknown/unclassified). All tests and their associated statistical values can be found in Table S1.
Fig. 4.
F element genes show less biased codon usage. Normalized CAI were determined for all genes on the F element and D element reference region for each species (see Methods). a) Normalized CAI for Muller F genes and Muller D reference genes. For each violin, the dot demarcates the median, the box represents the interquartile range, and the whiskers represent 1.5 × the interquartile range. b) Normalized CAI values versus distance from centromere. Orange points represent wanderer genes as annotated in Table 2. Correlation coefficients and P-values were calculated using Spearman's rank correlation coefficient, excluding wanderer genes. c) Scatterplots comparing normalized CAI values between F element orthologs in the expanded F species against D. melanogaster. Correlation coefficients and P-values were calculated using Spearman's rank correlation coefficient. d) Scatterplot comparing normalized CAI values between F element orthologs of D. ananassae against D. bipectinata. Note that a third gene, CG33941, shows strong codon usage bias in both D. ananassae and D. bipectinata but not in the other species. Wanderer genes were excluded from panels (c) and (d). For panel (a), the asterisk after the plot header identifies statistically significant interactions between the independent variables chromosome and species (two-way non-parametric analysis of variance). For post-hoc analyses (ART-C procedure with Holm correction), a pound symbol (#) identifies statistically significant (adjusted P < 0.05) comparisons of Muller F to Muller D within a species, while the dollar symbol ($) identifies statistically significant (adjusted P < 0.05) comparisons of the same Muller element between each expanded F species and D. melanogaster. All tests and their associated statistical values can be found in Table S1.
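The Spearman rank correlation named in panels (b) to (d) of Fig. 4 can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the gene records, the field names `name`, `distance`, and `cai`, and the helper `rho_excluding` are hypothetical, and only the coefficient is computed (no P-value).

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rho_excluding(genes, excluded_names):
    """Correlate distance with CAI after dropping excluded genes
    (hypothetical helper mirroring the exclusion of wanderer genes)."""
    kept = [g for g in genes if g["name"] not in excluded_names]
    return spearman_rho([g["distance"] for g in kept],
                        [g["cai"] for g in kept])


# Made-up illustrative records, not data from the paper.
genes = [
    {"name": "a", "distance": 1, "cai": 0.1},
    {"name": "b", "distance": 2, "cai": 0.2},
    {"name": "c", "distance": 3, "cai": 0.3},
    {"name": "w", "distance": 4, "cai": 0.0},
]
rho = rho_excluding(genes, {"w"})
```

Ranking before correlating makes the statistic sensitive only to monotonic trends, which is why a single outlying gene can shift it noticeably in a small gene set.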
Fig. 5.
F element synteny comparison. a) Multi-species visualization of F element synteny. Each link (light blue) connects a pair of orthologous genes between species. Divergence times between species pairs are shown on the left. Note: the F elements of each species are not drawn to scale; their lengths are shown on the right-hand side of the plot. b) The minimum number of chromosomal rearrangements between the species pairs shown in (a), as calculated by GRIMM (Tesler 2002). c) Synteny blocks were identified from whole genome alignments between D. melanogaster and D. takahashii, and between D. bipectinata and D. ananassae. Mean synteny block size for each Muller element is shown for pairwise comparisons between ananassae/bipectinata versus melanogaster/takahashii. d) Synteny coverage (ie aligned fraction of chromosome) is shown for the same comparison as in (c).
Fig. 6.
F element sequence conservation. a) Whole genome alignments were constructed using genome assemblies from the 14 species shown in the phylogenetic tree. Red text indicates the ananassae species group, the members of which all have a highly expanded F element. Bold text indicates the focal species used in the conservation analyses shown in (b). Node labels indicate divergence times (in millions of years, where available) from Suvorov et al. (2022). Note that branch lengths are not drawn to scale. b) For each bold focal species shown in (a), pairwise comparisons were made to each of the other species in the tree. For each comparison, the fraction of aligned bases between the two species was calculated for 20 kb windows across either the Muller D reference region (d) or the entire F element (f). Note: D. pseudoananassae is abbreviated D.pse.ananassae. c) Pairwise alignment rates (D. melanogaster versus D. takahashii and D. ananassae versus D. bipectinata) for coding and noncoding regions of the Muller D reference region and the Muller F element.
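The windowed measure in panel (b) of Fig. 6, the fraction of aligned bases per 20 kb window, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' method: aligned blocks are taken to be 0-based half-open `(start, end)` intervals on the focal species, overlapping blocks are merged so no base is counted twice, and the function names are hypothetical.

```python
def merge_intervals(blocks):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def windowed_aligned_fraction(aligned_blocks, region_length, window=20_000):
    """Return the aligned fraction of each consecutive window.

    The final window may be shorter than `window`; its fraction is
    computed against its actual length.
    """
    n_windows = -(-region_length // window)  # ceiling division
    covered = [0] * n_windows
    for start, end in merge_intervals(aligned_blocks):
        pos = max(0, start)
        end = min(end, region_length)
        while pos < end:
            w = pos // window
            # Count bases up to the window boundary or the block end.
            stop = min((w + 1) * window, end)
            covered[w] += stop - pos
            pos = stop
    fractions = []
    for w in range(n_windows):
        w_len = min((w + 1) * window, region_length) - w * window
        fractions.append(covered[w] / w_len)
    return fractions


# Made-up example: two aligned blocks on a 50 kb region.
fractions = windowed_aligned_fraction([(0, 10_000), (15_000, 25_000)], 50_000)
# fractions == [0.75, 0.25, 0.0]
```

Splitting each block at window boundaries keeps the per-window totals exact even when a block spans several windows.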
Fig. 7.
Distinct repeat content of ananassae group F elements. a) Reciprocal repeat masking results for eleven ananassae group species plus D. melanogaster, D. takahashii, and D. kikkawai (see Methods). The F elements of each ananassae group species complex (ie ananassae, bipectinata, and ercepeae) carry a subset of repeats that are not found in the other complexes. The heatmap cells are colored based on the percentage of the F element that is masked. Differences in color along the diagonal are due to differences in F element repeat density among species. b) Visualization of repeat content in the introns of the zfh2 gene across members of the ananassae species group. Black rectangles represent zfh2 coding exons. Colored rectangles represent matches to individual TE families identified in either D. ananassae (left panel) or D. bipectinata (right panel) TEs. TE family names and coordinates within zfh2 for each species are provided in Table S4.
Fig. 8.
Effective population size and linkage disequilibrium. a) Estimates of effective population size (Ne) for both the autosomes and F elements. b) The mean physical distance at which linkage disequilibrium (LD) decays to ¾ of its maximum value. c) The F element versus autosomal (F/A) ratios of LD decay.
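One way to read the quantities in panels (b) and (c) of Fig. 8 is sketched below. The caption does not say how the decay distance was estimated, so everything here is an assumption: LD is represented as binned mean r² values sorted by distance, the maximum is the highest binned value, and the crossing point is found by linear interpolation. The function name and the curves are hypothetical.

```python
def ld_decay_distance(curve, fraction=0.75):
    """Distance at which LD first falls to `fraction` of its peak.

    curve: list of (distance_bp, mean_r2) tuples sorted by distance.
    Returns None if the curve never reaches the threshold.
    """
    peak_index = max(range(len(curve)), key=lambda i: curve[i][1])
    peak = curve[peak_index][1]
    if peak <= 0:
        return None
    threshold = fraction * peak
    tail = curve[peak_index:]
    for (d0, r0), (d1, r1) in zip(tail, tail[1:]):
        if r1 <= threshold:
            # Linear interpolation between the two flanking bins.
            return d0 + (r0 - threshold) * (d1 - d0) / (r0 - r1)
    return None


# Made-up curves, not data from the paper.
f_curve = [(100, 0.8), (200, 0.7), (400, 0.5), (800, 0.3)]
a_curve = [(100, 0.8), (200, 0.4)]

f_decay = ld_decay_distance(f_curve)   # about 300 bp
a_decay = ld_decay_distance(a_curve)   # about 150 bp
fa_ratio = f_decay / a_decay           # F/A ratio of LD decay, about 2
```

Under this reading, an F/A ratio above 1 means LD persists over longer physical distances on the F element than on the autosomes, which is the pattern expected when recombination is reduced.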

References

    Arguello JR et al. Recombination yet inefficient selection along the Drosophila melanogaster subgroup's fourth chromosome. Mol Biol Evol. 2010:27:848–861. 10.1093/molbev/msp291.
    Armstrong J et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature. 2020:587:246–251. 10.1038/s41586-020-2871-y.
    Badet T, Tralamazza SM, Feurtey A, Croll D. Recent reactivation of a pathogenicity-associated transposable element is associated with major chromosomal rearrangements in a fungal wheat pathogen. Nucleic Acids Res. 2024:52:1226–1242. 10.1093/nar/gkad1214.
    Baimai V, Chumchong C. Karyotype variation and geographic distribution of the three sibling species of the Drosophila kikkawai complex. Genetica. 1980:54:113–120. 10.1007/BF00055979.
    Balachandran P et al. Transposable element-mediated rearrangements are prevalent in human genomes. Nat Commun. 2022:13:7115. 10.1038/s41467-022-34810-8.
