Genome Res. 2010 Oct;20(10):1335-43.
doi: 10.1101/gr.108795.110. Epub 2010 Aug 6.

Massive turnover of functional sequence in human and other mammalian genomes

Stephen Meader et al. Genome Res. 2010 Oct.

Abstract

Despite the availability of dozens of animal genome sequences, two key questions remain unanswered: First, what fraction of any species' genome confers biological function, and second, are apparent differences in organismal complexity reflected in an objective measure of genomic complexity? Here, we address both questions by applying, across the mammalian phylogeny, an evolutionary model that estimates the amount of functional DNA that is shared between two species' genomes. Our main findings are, first, that as the divergence between mammalian species increases, the predicted amount of pairwise shared functional sequence drops off dramatically. We show by simulations that this is not an artifact of the method, but rather indicates that functional (and mostly noncoding) sequence is turning over at a very high rate. We estimate that between 200 and 300 Mb (∼6.5%–10%) of the human genome is under functional constraint, which includes five to eight times as many constrained noncoding bases as bases that code for protein. In contrast, in D. melanogaster we estimate only 56–66 Mb to be constrained, implying a ratio of noncoding to coding constrained bases of about 2. This suggests that, rather than genome size or protein-coding gene complement, it is the number of functional bases that might best mirror our naïve preconceptions of organismal complexity.
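The arithmetic behind these figures can be checked with a short sketch. The genome size used here (about 3,080 Mb) is an assumption that the abstract does not state, and the helper names are hypothetical. The split into coding and noncoding bases further assumes that all constrained sequence falls into one of those two classes.

```python
# Assumed human genome size in Mb; NOT given in the abstract.
ASSUMED_HUMAN_GENOME_MB = 3080.0


def constrained_fraction(constrained_mb, genome_mb=ASSUMED_HUMAN_GENOME_MB):
    """Fraction of the genome represented by a constrained quantity in Mb."""
    return constrained_mb / genome_mb


def split_by_ratio(total_mb, noncoding_to_coding):
    """Split a constrained total into (coding, noncoding) Mb for a given
    noncoding:coding ratio, assuming the two classes account for the total."""
    coding = total_mb / (1.0 + noncoding_to_coding)
    return coding, total_mb - coding


# Lower and upper estimates quoted in the abstract (200 and 300 Mb).
low_fraction = constrained_fraction(200.0)
high_fraction = constrained_fraction(300.0)

# Illustrative split of the upper estimate at a 5:1 noncoding:coding ratio.
coding_mb, noncoding_mb = split_by_ratio(300.0, 5.0)
```

Under the assumed genome size, the two fractions come out close to the quoted ∼6.5% and 10%, and a 5:1 ratio applied to 300 Mb gives 50 Mb coding and 250 Mb noncoding. These splits are illustrative only and are not values reported by the authors.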

Figures

Figure 1.
Representative genomic distributions of IGS lengths in mouse–rat alignments. Frequencies of IGS (blue) are shown on a log10 scale for AR regions (A) and whole-genome sequences (B) with G+C contents of 0.415–0.425. The red line represents the prediction of the neutral indel model, a geometric distribution of IGS lengths calibrated over IGS ∼15–80 bp in length. For mouse–rat AR sequence, the observed data accurately fit the predictions of the neutral indel model, with no deviation from the model apparent within this interval (inset shows residuals, and 95% confidence bounds in black based on a Bernoulli model). For whole-genome alignments, the data fit accurately for IGS 10–100 bp in length. Beyond 100 bp, there is an excess of longer IGS (green), representing sequence which contains fewer indels than would be predicted under the neutral indel model. The underrepresentation of short IGS (<10) is due to “gap attraction,” an artifact of the alignment process (Lunter et al. 2008). Histograms for the 19 remaining G+C bands are provided as Supplemental Figure 1.
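The idea in this caption can be sketched as follows: a geometric distribution is a straight line on a log scale, so it can be calibrated on the 15–80 bp window and extrapolated to longer IGS, where any surplus over the prediction is counted. This is a minimal illustration on synthetic data, not the authors' implementation. The function names, the least-squares fit, and the 100 bp threshold are assumptions made for the example.

```python
import math


def fit_geometric(counts, lo=15, hi=80):
    """Least-squares line through log(count) vs. IGS length over [lo, hi].

    Returns (intercept, slope); a geometric distribution is linear in log space.
    """
    pts = [(length, math.log(c)) for length, c in counts.items()
           if lo <= length <= hi and c > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def excess_bases(counts, intercept, slope, min_len=100):
    """Bases in IGS longer than min_len beyond what the fitted line predicts."""
    total = 0.0
    for length, observed in counts.items():
        if length > min_len:
            expected = math.exp(intercept + slope * length)
            total += max(observed - expected, 0.0) * length
    return total


# Synthetic histogram: a pure geometric decay, plus 5 extra segments at
# every length above 150 bp to mimic an excess of long IGS.
counts = {length: 1e6 * 0.95 ** length for length in range(1, 301)}
for length in range(151, 301):
    counts[length] += 5

intercept, slope = fit_geometric(counts)
excess = excess_bases(counts, intercept, slope)
```

On this synthetic input the fitted slope recovers log(0.95), and the excess equals the bases in the added segments. Real alignments would also need the per-G+C-band treatment and the short-IGS "gap attraction" effect that the caption describes, which this sketch ignores.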
Figure 2.
Quantities of constrained sequence (gsel) estimated across a range of diverse metazoan species' pairs. Estimates of constrained sequence in eutherian mammalian (red), avian (dark blue), teleost fish (brown), and fruit fly (light blue) species' pairs. For mammalian estimates, a dramatic drop-off in estimates of conservation is associated with increasing divergence between species' pairs, which is not seen in simulations (Fig. 3). The indicative sweep (shaded) suggests that the true quantity of functional material in mammalian genomes may be around 300 Mb (10% of the human genome). The range for human and macaque represents several estimates with varying parameters for the calibration of the neutral model. Consequently, these values may underestimate the true level of constraint. Our highest estimate of conserved sequence in mammals is between mouse and rat, for which we estimate 189.0–258.4 Mb of functional sequence.
Figure 3.
Estimates of constrained sequence (black bars) in simulated genomes. Simulated genomes contained 5% of constrained sequence (broken line). Constrained sequence rejects 90% of indel events. The neutral indel model consistently underestimates the true quantity of conserved sequence for genome pairs with more than one substitution per neutral base. Only at a divergence of 0.1 does the upper-bound estimate approach the true quantity of constrained sequence. Over divergences of 0.15 to 0.65, the reduction in estimates of constraint is minimal. This is in contrast to observations in alignments of real mammalian genome assemblies, for which there is a 2.2-fold difference over the same evolutionary range (Fig. 2).
Figure 4.
Representative genomic distribution of IGS lengths in D. melanogaster and D. simulans alignments. Frequencies of IGS (blue) lengths shown on a log10 scale for AR regions (A) and whole-genome sequences (B) with a G+C content of 0.495–0.445. The predictions of the neutral indel model are shown in red. In contrast to AR sequence for mouse and rat (Fig. 1), a relatively large proportion (20%–23%) of the small number of ancient fruit fly transposons appear to be under constraint, although the absolute quantity of sequence remains low (0.29–0.32 Mb). Similarly, for whole-genome sequence, we estimate that 55.5–66.2 Mb (46%–55%) of the genome is subject to constraint regarding indels. The difference in the predictions of the neutral indel model for whole-genome and AR sequence indicates that functional sequence may contribute to Drosophila short IGS.
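As a consistency check on the whole-genome figures in this caption, the quoted quantities and percentages can each be inverted to give the total amount of sequence they imply. The helper name is hypothetical, and the caption's percentages are rounded, so the result is approximate.

```python
def implied_total_mb(constrained_mb, fraction):
    """Total sequence (Mb) implied by a constrained quantity and its share."""
    return constrained_mb / fraction


# Lower bound: 55.5 Mb quoted as 46%; upper bound: 66.2 Mb quoted as 55%.
total_from_low = implied_total_mb(55.5, 0.46)
total_from_high = implied_total_mb(66.2, 0.55)
```

Both bounds imply a total of roughly 120 Mb, so the two ends of the quoted range are mutually consistent.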
