Comparative Study
PLoS Comput Biol. 2014 Nov 6;10(11):e1003925.
doi: 10.1371/journal.pcbi.1003925. eCollection 2014 Nov.

A comparative study and a phylogenetic exploration of the compositional architectures of mammalian nuclear genomes

Eran Elhaik et al. PLoS Comput Biol. 2014.

Abstract

For the past four decades, the compositional organization of the mammalian genome has posed a formidable challenge to molecular evolutionists attempting to explain it from an evolutionary perspective. Unfortunately, most of the explanations adhered to the "isochore theory," which has long been rebutted. Recently, an alternative compositional domain model was proposed, depicting the human and cow genomes as composed mostly of short compositionally homogeneous and nonhomogeneous domains and a few long ones. We test the validity of this model through a rigorous sequence-based analysis of eleven completely sequenced mammalian and avian genomes. Seven attributes of compositional domains are used in the analyses: (1) the number of compositional domains, (2) compositional domain-length distribution, (3) density of compositional domains, (4) genome coverage by the different domain types, (5) degree of fit to a power-law distribution, (6) compositional domain GC content, and (7) the joint distribution of GC content and length of the different domain types. We discuss the evolution of these attributes in light of two competing phylogenetic hypotheses that differ from each other in the validity of clade Euarchontoglires. If the clade is valid, the murid genome compositional organization would be a derived state and exhibit a high similarity to that of other mammals. If it is invalid, the murid genome compositional organization would be closer to an ancestral state. We demonstrate that the compositional organization of the murid genome differs from those of primates and laurasiatherians, a phenomenon previously termed the "murid shift," and in many ways resembles that of the opossum genome. We find no support for the "isochore theory." Instead, our findings depict the mammalian genome as a tapestry of mostly short homogeneous and nonhomogeneous domains and a few long ones, thus providing strong evidence in favor of the compositional domain model, and they seem to invalidate clade Euarchontoglires.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Phylogenetic trees illustrating two competing hypotheses concerning the relative kinships of murids, laurasiatherians, and primates to one another.
Figure 2. Pairwise comparisons of domain-length distributions for five taxa.
Homogeneous-domain lengths are shown above the diagonal; nonhomogeneous-domain lengths are below, where the distribution curves of the species on the X-axis are solid and those on the Y-axis are dashed. On the diagonal we compare homogeneous and nonhomogeneous domain-length distributions within a taxon. The first value in each plot is the p-value of the Kolmogorov-Smirnov goodness-of-fit test, and the colors represent the p-value after correcting for multiple testing using the FDR method (black > 0.05 and pink < 0.05). The second and third values are effect sizes, calculated as the nonoverlapping percentage of the two distributions and as Cohen's d using the Hedges' g estimator, respectively.
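The two kinds of quantity reported in each panel can be illustrated with a minimal sketch, which is not the authors' code. It computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical distribution functions) and Hedges' g (Cohen's d multiplied by a small-sample correction). The function names are hypothetical, the correction factor is the common approximation 1 - 3/(4(n1 + n2) - 9), and the p-value and FDR steps are left out.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The gap can only change at observed values, so check each one.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def hedges_g(a, b):
    """Cohen's d with the approximate Hedges small-sample correction."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    # Assumption: the widely used approximation to the exact correction.
    return d * (1 - 3 / (4 * (n1 + n2) - 9))
```

A KS statistic near 0 means the two length distributions nearly coincide; a value near 1 means they barely overlap. With very large samples even small differences yield small p-values, which is presumably why the captions also report effect sizes.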
Figure 3. The cumulative distribution of homogeneous domain lengths in log scale.
For simplicity, the mean distributions of primates, murids, and laurasiatherians are shown. The inset shows the majority of the domains, which are of medium-short length.
Figure 4. Compositional domain densities of all chromosomes.
Box plots summarize medians, quartiles, and range.
Figure 5. Genomic coverage of four compositional domain types.
Homogeneous domains are in blue shades; nonhomogeneous domains are in green shades. Domains longer than 300 kb are in dark shades; domains shorter than 300 kb are in light shades. Compositionally homogeneous domains longer than 300 kb (i.e., isochoric domains) are in dark blue.
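The four-way split in this caption can be sketched as follows, assuming each domain is available as a (length in bp, homogeneity flag) pair. That input format, the function name, and the handling of a domain of exactly 300 kb (counted as short here) are assumptions, not details from the paper.

```python
from collections import defaultdict

LONG_THRESHOLD_BP = 300_000  # 300 kb cutoff taken from the caption


def coverage_by_type(domains, genome_length):
    """Fraction of the genome covered by each (homogeneity, size) class.

    domains: iterable of (length_bp, is_homogeneous) pairs (hypothetical format).
    """
    totals = defaultdict(int)
    for length_bp, is_homogeneous in domains:
        kind = "homogeneous" if is_homogeneous else "nonhomogeneous"
        # Assumption: a domain of exactly 300 kb is counted as short.
        size = "long" if length_bp > LONG_THRESHOLD_BP else "short"
        totals[(kind, size)] += length_bp
    return {key: total / genome_length for key, total in totals.items()}
```

In this scheme, the ("homogeneous", "long") class corresponds to what the caption calls isochoric domains.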
Figure 6. The cumulative density function P(x) of compositionally homogeneous domain lengths (x) (points) plotted on a log-log scale.
The dashed lines represent the maximum likelihood power-law fits to the data.
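A maximum likelihood power-law fit of this kind can be sketched with the standard continuous estimator, alpha = 1 + n / sum(ln(x_i / x_min)). This is a sketch under stated assumptions, not the authors' procedure: the lower cutoff x_min is supplied by the caller, P(x) is taken to be the fraction of domains of length at least x, and the function names are hypothetical.

```python
import math


def power_law_alpha(lengths, x_min):
    """Continuous maximum likelihood estimate of the power-law exponent.

    Only values at or above x_min (assumed known) enter the fit.
    """
    tail = [x for x in lengths if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all observations equal x_min; exponent undefined")
    return 1.0 + len(tail) / log_sum


def power_law_ccdf(x, alpha, x_min):
    """Fitted P(X >= x) for x >= x_min; a straight line on log-log axes."""
    return (x / x_min) ** (1.0 - alpha)
```

On a log-log plot the fitted curve has slope 1 - alpha, so departures of the empirical points from a straight line indicate a poorer fit to the power law.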
Figure 7. Pairwise comparisons of domain GC content distributions for five taxa.
Homogeneous-domain GC contents are shown above the diagonal; nonhomogeneous-domain GC contents are below, where the distribution curves of the species on the X-axis are solid and those on the Y-axis are dashed. On the diagonal we compare homogeneous and nonhomogeneous domain GC content distributions within a taxon. The first value in each plot is the p-value of the Kolmogorov-Smirnov goodness-of-fit test, and the colors represent the p-value after correcting for multiple testing using the FDR method (black > 0.05 and pink < 0.05). The second and third values are effect sizes, calculated as the nonoverlapping percentage of the two distributions and as Cohen's d using the Hedges' g estimator, respectively.
Figure 8. A two-dimensional joint distribution of homogeneous domain GC content and its standard deviation (GCσ).
Each domain's GC content and GCσ are represented by a point on the map. The frequency of different points is represented by colors ranging from red (highest frequency) to blue (lowest frequency). The mean GC content of the mammalian genome is marked by a horizontal line.
Figure 9. A two-dimensional joint distribution of homogeneous domain GC content and length in a log scale.
Each domain's GC content and length are represented by a point on the map. The frequency of different points is represented by colors ranging from red (highest frequency) to blue (lowest frequency). The mean GC content of the mammalian genome is marked by a horizontal line.
