Genetics. 2014 Feb;196(2):523-38.
doi: 10.1534/genetics.113.158147. Epub 2013 Dec 13.

An age-of-allele test of neutrality for transposable element insertions

Justin P Blumenstiel et al. Genetics. 2014 Feb.

Abstract

How natural selection acts to limit the proliferation of transposable elements (TEs) in genomes has been of interest to evolutionary biologists for many years. To describe TE dynamics in populations, previous studies have used models of transposition-selection equilibrium that assume a constant rate of transposition. However, since TE invasions are known to happen in bursts through time, this assumption may not be reasonable. Here we propose a test of neutrality for TE insertions that does not rely on the assumption of a constant transposition rate. We consider the case of TE insertions that have been ascertained from a single haploid reference genome sequence. By conditioning on the age of an individual TE insertion allele (inferred by the number of unique substitutions that have occurred within the particular TE sequence since insertion), we determine the probability distribution of the insertion allele frequency in a population sample under neutrality. Taking models of varying population size into account, we then evaluate predictions of our model against allele frequency data from 190 retrotransposon insertions sampled from North American and African populations of Drosophila melanogaster. Using this nonequilibrium neutral model, we are able to explain ~80% of the variance in TE insertion allele frequencies based on age alone. Controlling for both nonequilibrium dynamics of transposition and host demography, we provide evidence for negative selection acting against most TEs as well as for positive selection acting on a small subset of TEs. Our work establishes a new framework for the analysis of the evolutionary forces governing large insertion mutations like TEs, gene duplications, or other copy number variants.

Keywords: Drosophila melanogaster; genome evolution; population genomics; test of neutrality; transposable elements (TEs).

Figures

Figure 1
Method for estimating TE insertion age based on unique substitution counts from insertions gathered from a single reference. (A) (i) Schematic of evolutionary dynamics for two active sublineages of the same TE family, depicting recent transposition events (arrows) leading to new TE insertions (rectangles) and postinsertion mutation events (solid tick marks inside rectangles). Each horizontal line represents a single chromosomal segment in a population sample. Dashed lines indicate where segments lack a TE sequence relative to the reference genome. TEs located above segments are insertions not present in the reference genome. In this example, TE insertion a has recently integrated, is at low frequency in the population sample, and has accumulated no unique mutations. In contrast, TE insertions b, c, and d represent older insertions that are at higher frequency in the population and have accumulated unique mutations. (ii) Schematic depicting the procedure used to estimate the age of TE insertions identified in the reference genome. A multiple alignment of all paralogous copies of the TE family from the reference is generated. Variant sites are identified and classified as being shared or unique, with only the number of substitutions unique to each reference insertion, s, being used to estimate the time since insertion. Shared substitutions are inferred to arise on active lineages and excluded from the estimate of allele age. Our model contrasts age based on s with TE insertion allele frequency in the population, i. Older reference insertions with higher s are expected to have a greater frequency i under neutrality. (B) Schematic of coalescent process for a TE insertion that is ascertained from a reference genome sequence. Frequency in the sample is a function of the number of descendants from a single ancestor that received the insertion at time t and gave rise to the reference insertion allele. In this example, insertion c from A inserted at the time at which the n = 7 sample alleles have j = 3 ancestors. All descendants from the insertion contain the insertion allele (i = 3). Since the time of insertion, s = 2 unique substitutions have accumulated on the reference insertion. It is only these unique substitutions leading to the reference allele that are used to estimate the age of the TE insertion. Other mutations arise independently on nonreference insertion alleles, which could in principle be used to estimate the time to the most recent common ancestor (TMRCA) of the insertion allele, but are not used here.
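The shared-versus-unique classification described in panel A (ii) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `unique_substitution_counts`, the toy alignment, and the rule used here (a variant counts as unique when exactly one copy carries a non-consensus base in that column) are assumptions made for illustration only. Gaps and ambiguous characters are ignored, and the rule is only meaningful with three or more aligned copies.

```python
from collections import Counter


def unique_substitution_counts(alignment):
    """Count substitutions unique to each aligned TE copy.

    alignment: dict mapping a copy name to its aligned sequence
    (all sequences must have the same length). Returns a dict
    mapping each name to s, its number of unique substitutions.
    Hypothetical helper, not taken from the paper.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have equal length")

    counts = {name: 0 for name in names}
    for col in range(length):
        column = [alignment[name][col].upper() for name in names]
        # Only unambiguous bases are considered; gaps are skipped.
        bases = Counter(b for b in column if b in "ACGT")
        if len(bases) < 2:
            continue  # invariant column
        consensus = bases.most_common(1)[0][0]
        for name, base in zip(names, column):
            # Unique: a non-consensus base carried by exactly one copy.
            # Variants carried by two or more copies are treated as shared.
            if base in "ACGT" and base != consensus and bases[base] == 1:
                counts[name] += 1
    return counts


# Toy alignment: copies c and d share one variant, b and d each
# carry one variant that no other copy has.
toy = {
    "a": "ACGTACGT",
    "b": "ACGTACGA",
    "c": "ACCTACGT",
    "d": "ACCTTCGT",
}
s = unique_substitution_counts(toy)
```

Dividing each count by the aligned length gives a per-base-pair value in the same units (subs/bp) that the Figure 4 caption uses for s.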
Figure 2
(A–D) Probability for i, number of insertion copies in the sample, under model predictions and simulations. t indicates known time since insertion. Selection was simulated only for the case where t = 0.1 (A) because deleterious elements become quickly eliminated from the population at later times.
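A simulated distribution of i for a known insertion time t could be produced along the following lines. This is a rough Monte Carlo sketch under strong assumptions that are not taken from the paper: neutrality, constant population size, time measured in coalescent units (pairwise coalescence rate 1), and the reference genome treated as one extra lineage alongside the n sampled chromosomes. The function names are hypothetical, and the sketch does not reproduce the authors' analytical model or their forward simulations with selection.

```python
import random
from collections import Counter


def simulate_copy_number(n, t, rng):
    """Simulate i, the number of sampled chromosomes carrying the insertion.

    Lineage 0 is the reference; lineages 1..n are the sample. The
    genealogy is traced back to time t, and every sampled chromosome
    that shares an ancestor with the reference at that time is
    counted as carrying the insertion.
    """
    # Each block holds the present-day chromosomes descending from
    # one ancestral lineage.
    blocks = [{k} for k in range(n + 1)]
    elapsed = 0.0
    while len(blocks) > 1:
        k = len(blocks)
        rate = k * (k - 1) / 2.0  # total coalescence rate with k lineages
        elapsed += rng.expovariate(rate)
        if elapsed > t:
            break  # the next coalescence is older than the insertion
        a, b = rng.sample(range(k), 2)
        blocks[a] |= blocks[b]
        del blocks[b]
    ref_block = next(block for block in blocks if 0 in block)
    return len(ref_block) - 1  # exclude the reference itself


def copy_number_distribution(n, t, reps=20000, seed=1):
    """Estimate P(i | n, t) for i = 0..n by repeated simulation."""
    rng = random.Random(seed)
    counts = Counter(simulate_copy_number(n, t, rng) for _ in range(reps))
    return [counts[i] / reps for i in range(n + 1)]


dist = copy_number_distribution(n=7, t=0.5, reps=5000)
```

Under these assumptions, very recent insertions put nearly all probability on i = 0, and very old ones on i = n, which matches the qualitative expectation that older insertions are found at higher frequency under neutrality.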
Figure 3
Distribution of P-values for observing as many or fewer insertion alleles, for 190 simulated insertion alleles, where ages of each TE are estimated using the model from a Poisson-simulated number of substitutions. Median P-value is indicated with a thick line, upper and lower quartiles are indicated with a box, range is shown with whiskers, and outliers are shown with circles. (A) Effects of time since insertion, t, on model-based inference. A constant population size of Ne = 1000 was simulated with varying time of insertion = t. Inference under the model used constant Ne. (B) Effects of varying Ne on model-based inference. After a transposition burst, a population of 100 was simulated for 20 generations (t = 0.2) followed by expansion to 1000 individuals for 100 generations (t = 0.1) for a total t = 0.3. Inference under the model was performed in two ways. Under the varying model, the probability of observing as many or fewer alleles was estimated, conditional on the same demographic scenario that was simulated. Under the constant model, the probability of observing as many or fewer alleles was estimated, conditional on a constant (postexpansion) population size of 1000.
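The arithmetic in panel B can be checked with a short sketch. The numbers in the caption are consistent with time being measured as generations divided by population size (20/100 = 0.2 and 100/1000 = 0.1), and that reading is assumed here. The helper names `scaled_time` and `lower_tail_p` are hypothetical, and the probability vector passed to `lower_tail_p` is made up for the example rather than taken from the paper.

```python
def scaled_time(epochs):
    """Total time in units of population size.

    epochs: iterable of (generations, population_size) pairs, one
    per demographic epoch. Assumes each epoch contributes
    generations / population_size to the scaled time t.
    """
    return sum(generations / size for generations, size in epochs)


def lower_tail_p(distribution, observed):
    """Probability of observing as many or fewer copies.

    distribution: list where distribution[i] is P(i copies in the
    sample) under the null model. observed: observed copy count.
    """
    if not 0 <= observed < len(distribution):
        raise ValueError("observed count outside the distribution's support")
    return sum(distribution[: observed + 1])


# Panel B scenario: 20 generations at N = 100, then 100 generations at N = 1000.
t_total = scaled_time([(20, 100), (100, 1000)])

# Made-up null distribution over i = 0, 1, 2 copies, for illustration only.
p_value = lower_tail_p([0.5, 0.3, 0.2], observed=1)
```

A small lower-tail probability indicates fewer copies than the null model expects for an insertion of that age.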
Figure 4
Distribution of ages [in s, unique substitutions per base pair (subs/bp)] of the 190 TEs used for this analysis.
Figure 5
(A and B) Observed and expected allele counts under models of varying population size for North American and African populations of D. melanogaster. Alleles are ranked by age and the analysis accounts for age uncertainty and ascertainment bias. (A) Observed and expected allele counts in the North American sample assuming the demographic scenario of a bottleneck from Africa to Europe followed by a bottleneck from Europe to North America. (B) Observed and expected allele counts for the African demographic scenario of an ancient population expansion. See Materials and Methods for details of demographic scenarios. Between A and B, TEs from low-recombination-rate regions and non-LTR families are indicated.
Figure 6
(A) Observed and expected allele counts assuming a constant population size for a North American population of D. melanogaster. In A and B, alleles are ranked by age, the analysis accounts for age uncertainty and ascertainment bias, and observed counts are also adjusted for admixture. (B) Probability of observing as many or fewer copies in the sample for each TE.
