2016 Sep 2;8(8):2597-612.
doi: 10.1093/gbe/evw181.

Predictive Models of Recombination Rate Variation across the Drosophila melanogaster Genome

Andrew B Adrian et al. Genome Biol Evol.

Abstract

In all eukaryotic species examined, meiotic recombination, and crossovers in particular, occur non-randomly along chromosomes. The cause of this non-random distribution remains poorly understood, but some specific DNA sequence motifs have been shown to be enriched near crossover hotspots in a number of species. We present analyses using machine learning algorithms to investigate whether DNA motif distribution across the genome can be used to predict crossover variation in Drosophila melanogaster, a species without hotspots. Our study exposes a combinatorial non-linear influence of motif presence able to account for a significant fraction of the genome-wide variation in crossover rates at all genomic scales investigated, from 20% at 5-kb to almost 70% at 2,500-kb scale. The models are particularly predictive for regions with the highest and lowest crossover rates and remain highly informative after removing sub-telomeric and -centromeric regions known to have strongly reduced crossover rates. Transcriptional activity during early meiosis and differences in motif use between autosomes and the X chromosome add to the predictive power of the models. Moreover, we show that population-specific differences in crossover rates can be partly explained by differences in motif presence. Our results suggest that crossover distribution in Drosophila is influenced by both meiosis-specific chromatin dynamics and very local constitutive open chromatin associated with DNA motifs that prevent nucleosome stabilization. These findings provide new information on the genetic factors influencing variation in recombination rates and a baseline to study epigenetic mechanisms responsible for plastic recombination as a response to different biotic and abiotic conditions and stresses.

Keywords: DNA motif analysis; double strand break; machine-learning algorithms; recombination.

Figures

Fig. 1.
Genomic landscape of motifs. Number of motifs per 100-kb for motif 3 (M3 in blue; [A]n) and motif 4 (M4 in red; [CA]n) across autosomal arms 2L, 2R, 3L, 3R and the X chromosome (see supplementary fig. S1 for motif sequence information). Presence shown after applying a 1% FDR (see “Methods” section).
Fig. 2.
Probability heatmap of correlation between presence of individual motifs and crossover rates ρLD. P-values of non-parametric correlation (Spearman’s rs) of motif presence and LD-based crossover rates ρLD calculated genome-wide and for each chromosome arm. Only motifs showing correlations with P < 0.01 in genome-wide analyses are shown. Motifs are ordered based on Spearman’s correlation across the genome, with M5 (Spearman’s rs² = 0.237, P = 1 × 10⁻⁷¹) showing the strongest correlation.
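The rank correlation named in this caption can be illustrated with a minimal sketch. It assumes two equal-length lists, one of motif counts per window and one of crossover rates per window; the function names are hypothetical and this is not the authors' code. Spearman's rs is computed here as the Pearson correlation of the ranks, with tied values sharing the mean of their ranks.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rs: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rs is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5
```

Because only ranks enter the calculation, any monotonic relationship between motif presence and crossover rate gives rs = 1 or rs = -1, whether or not it is linear.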
Fig. 3.
Random Forest (RF) models. (A) Accuracy (true positive rate) is given for 10 crossover classes, from class A (regions with lowest 10% crossover rate ρLD) to class J (regions with highest 10% crossover rate ρLD). Random accuracy (uninformative model) per class is 10% (horizontal dashed line). The model tested utilizes all 12 motifs to predict crossover classes (see “Methods” section for details). (B) Accuracy when the model is trained with data from one autosomal arm and applied to either other autosomal arms or to the X chromosome (left) as testing set, or trained with data from the X chromosome and applied to either autosomal arms or to the X chromosome (right).
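The class scheme and the per-class accuracy described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: regions are split into ten equal-sized classes by the rank of their crossover rate (ties are broken by input order), and accuracy for a class is the fraction of regions truly in that class that a classifier assigns to it. The function names are hypothetical, and the classifier itself is not shown.

```python
from typing import Dict, List, Sequence

CLASS_LABELS = "ABCDEFGHIJ"  # A = lowest 10% of rates, J = highest 10%


def decile_classes(rates: Sequence[float]) -> List[str]:
    """Assign each region a class A..J according to the rank of its rate."""
    n = len(rates)
    order = sorted(range(n), key=lambda i: rates[i])
    labels = [""] * n
    for rank, idx in enumerate(order):
        # rank * 10 // n maps ranks 0..n-1 onto class indices 0..9.
        labels[idx] = CLASS_LABELS[min(rank * 10 // n, 9)]
    return labels


def per_class_accuracy(
    true_labels: Sequence[str], predicted_labels: Sequence[str]
) -> Dict[str, float]:
    """True positive rate per class: correct calls / regions truly in the class."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for truth, pred in zip(true_labels, predicted_labels):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == pred:
            hits[truth] = hits.get(truth, 0) + 1
    return {c: hits.get(c, 0) / totals[c] for c in sorted(totals)}
```

With ten equal-sized classes, a model that guesses uniformly at random is expected to score about 10% in every class, which is the baseline the caption refers to.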
Fig. 4.
MARS predictive models of crossover rates. (A) Estimates of the predictive power of MARS models (R²GCV) based on the presence of motifs across the genome, transcription data during early meiosis, and/or chromosome arms as predictive variables. Results shown for genome-wide analyses (red) and for trimmed (after removing sub-telomeric and -centromeric regions) genome (blue). (B) Relationship between crossover rates obtained from population genetic analyses of linkage disequilibrium (x-axis) and those predicted based on a MARS model (y-axis) including motif, transcription and chromosome arm data. The unit of crossover rate is ρLD (ρLD = 2 Ne r), where Ne is the effective population size and r is the rate of crossover per bp and generation in females (Chan et al. 2012). (C) Examples of the linear and non-linear influence of motif presence on crossover rates. All results are shown for analyses of the RG population at 100-kb scale.
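MARS (multivariate adaptive regression splines) builds predictions from hinge functions of the form max(0, x - c) and max(0, c - x), which is how a single predictor can have a piecewise-linear, and therefore non-linear, effect. The sketch below evaluates such a model for one predictor. The function names, coefficients, and knots are hypothetical and chosen only for illustration; they are not values from the study, and fitted MARS models can also contain products of hinge terms (interactions), which are omitted here.

```python
from typing import Sequence, Tuple

# One model term: (coefficient, knot, direction), with direction +1 or -1.
Term = Tuple[float, float, int]


def hinge(x: float, knot: float, direction: int = 1) -> float:
    """Hinge basis: max(0, x - knot) for +1, max(0, knot - x) for -1."""
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    return max(0.0, direction * (x - knot))


def mars_predict(x: float, intercept: float, terms: Sequence[Term]) -> float:
    """Evaluate intercept + sum of coefficient * hinge(x) over all terms."""
    return intercept + sum(coef * hinge(x, knot, d) for coef, knot, d in terms)


# Hypothetical example: the predicted rate is flat below 5 motifs per window,
# rises with slope 0.002 up to 20 motifs, then rises with slope 0.001.
EXAMPLE_TERMS = [(0.002, 5.0, 1), (-0.001, 20.0, 1)]
```

In this toy model the slope changes at each knot, so the same motif count increment has a different effect on the predicted rate depending on where in the range it falls.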
Fig. 5.
Influence of genomic scale on the correlation between motif presence and crossover rates ρLD. Spearman’s non-parametric correlation (rs) between motif presence and LD-based crossover rates ρLD is shown for intervals of 5-, 10-, 25-, 50-, 100-, 250-, 500-, 1,000-, and 2,500-kb. For each motif, the nine adjacent vertical bars indicate the different scales, from the finest scale (5-kb; left) to the broadest scale (2,500-kb; right). The color of each bar indicates the probability (P) of the correlation, with more significant correlations (lower probabilities) in darker red. Vertical bars with dashed borders indicate correlations with probabilities greater than 0.001. See supplementary figure S1 for motif sequence information.
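To compare scales, motif presence has to be summarized per interval at each interval size. A minimal sketch follows, assuming 0-based motif start positions on one chromosome arm and consecutive non-overlapping windows; the windowing actually used in the study is described in its Methods and may differ, and the function name is hypothetical.

```python
from typing import List, Sequence

# Interval sizes listed in the caption, in kb.
SCALES_KB = [5, 10, 25, 50, 100, 250, 500, 1000, 2500]


def motif_counts_per_window(
    positions: Sequence[int], arm_length: int, window_bp: int
) -> List[int]:
    """Count motif occurrences in consecutive non-overlapping windows."""
    if window_bp <= 0 or arm_length <= 0:
        raise ValueError("window_bp and arm_length must be positive")
    n_windows = -(-arm_length // window_bp)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if not 0 <= pos < arm_length:
            raise ValueError(f"position {pos} lies outside the arm")
        counts[pos // window_bp] += 1
    return counts
```

Calling the function once per entry in `SCALES_KB` (with `window_bp = scale * 1000`) yields one count vector per scale; the last window is shorter than the others whenever the arm length is not a multiple of the window size.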
Fig. 6.
Influence of genomic scale on RF and MARS analyses. Predictive power (accuracy and R²CV in RF and MARS models, respectively) for models using motif presence to predict crossover rates ρLD at nine different genomic scales across the genome.
Fig. 7.
Influence of intra-specific origin of motif data and genomic scale on MARS analyses. Predictive power (R²CV) for MARS models of crossover rate ρLD across sequences of the RG population using motif distribution estimated from sequences of the same RG population, the ZI population or the D. melanogaster reference genome.
Fig. 8.
Boxplots of local crossover rate (ρLD) at motifs. Crossover rates estimated by LDhelmet (Chan et al. 2012) between SNPs surrounding the motif location in the RG population. The average distance between SNPs surrounding motifs in the RG population is 41-bp. Median crossover rates are identified by the horizontal line inside each box and the length of the box and whiskers indicate 50% and 90% CI, respectively. The horizontal dashed line indicates the genome-wide median ρLD (ρLD = 0.0225). Asterisks below boxes indicate the probability of having crossover rates compatible with genome-wide estimates. For analyses of local crossover rates at accessible chromatin regions, ρLD was estimated at the center of the region. The study of all accessible chromatin regions (see “Methods” section) reveals median ρLD of 0.0234 at these regions (indistinguishable from genome-wide rates, P > 0.50). Asterisks above boxes of accessible chromatin regions containing specific motifs ([CA]n, short poly-A and/or [TA]n) indicate the probability of having crossover rates compatible with all accessible chromatin regions.

