Philos Trans R Soc Lond B Biol Sci. 2009 Aug 12;364(1527):2197-207.
doi: 10.1098/rstb.2009.0034.

The primary divisions of life: a phylogenomic approach employing composition-heterogeneous methods

Peter G Foster et al. Philos Trans R Soc Lond B Biol Sci. 2009.

Abstract

The three-domains tree, which depicts eukaryotes and archaebacteria as monophyletic sister groups, is the dominant model for early eukaryotic evolution. By contrast, the 'eocyte hypothesis', where eukaryotes are proposed to have originated from within the archaebacteria as sister to the Crenarchaeota (also called the eocytes), has been largely neglected in the literature. We have investigated support for these two competing hypotheses from molecular sequence data using methods that attempt to accommodate the across-site compositional heterogeneity and across-tree compositional and rate matrix heterogeneity that are manifest features of these data. When ribosomal RNA genes were analysed using standard methods that do not adequately model these kinds of heterogeneity, the three-domains tree was supported. However, this support was eroded or lost when composition-heterogeneous models were used, with concomitant increase in support for the eocyte tree for eukaryotic origins. Analysis of combined amino acid sequences from 41 protein-coding genes supported the eocyte tree, whether or not composition-heterogeneous models were used. The possible effects of substitutional saturation of our data were examined using simulation; these results suggested that saturation is delayed by among-site rate variation in the sequences, and that phylogenetic signal for ancient relationships is plausibly present in these data.

Figures

Figure 1.
Two views of the tree of life. The root of the tree is often considered to be on the branch leading to the eubacteria (e.g. Baldauf et al. 1996) or within the eubacteria (Cavalier-Smith 2006; Skophammer et al. 2007). Under any of those rootings, the three-domains tree has a monophyletic archaebacteria, where Euryarchaeota group with the Crenarchaeota/eocytes. By contrast, the eocyte hypothesis groups the eukaryotes with the eocytes, making the archaebacteria paraphyletic.
Figure 2.
Bayesian analysis of combined SSU and LSU rRNA genes. In panel (a) the model is GTR + Γ, separate in the two data partitions, with free partition rate. This is the analysis in table 2, row D. Panel (b) shows an analysis with the NDCH(4,4) NDRH(2,2) tree heterogeneous model from table 2, row J.
Figure 3.
Assessment of model composition fit by posterior predictive simulation for the rRNA analysis. The test quantity was X² (sensu Sokal & Rohlf 1981), the statistic used in χ² tests. Data sets were simulated based on samples from the posterior distribution and for each the X² was calculated. Black bars show the distribution for a tree-homogeneous model, and white bars show the distribution for a tree-heterogeneous NDCH model with two composition vectors on each data partition. Panel (a) shows distributions for the SSU partition and panel (b) shows distributions for the LSU partition. Arrows show the X² for the original data, showing that by this test two composition vectors for each data partition are needed to adequately model the data.
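The composition test quantity named in the Figure 3 caption can be illustrated with a minimal sketch. It assumes the usual contingency-table form of the statistic (observed versus expected character counts per sequence) and is not the authors' code; the function names `composition_x2` and `tail_area` are hypothetical, and the simulated values would have to come from whatever software draws data sets from the posterior.

```python
from collections import Counter


def composition_x2(sequences, alphabet="ACGT"):
    """X^2 statistic for compositional homogeneity across aligned sequences.

    Builds a sequences-by-characters count table and sums
    (observed - expected)^2 / expected over all cells, where the expected
    count is row_total * column_total / grand_total.
    Characters outside `alphabet` (gaps, ambiguity codes) are ignored.
    """
    counts = [Counter(ch for ch in seq.upper() if ch in alphabet)
              for seq in sequences]
    row_totals = [sum(c[a] for a in alphabet) for c in counts]
    col_totals = {a: sum(c[a] for c in counts) for a in alphabet}
    grand_total = sum(row_totals)

    x2 = 0.0
    for c, row_total in zip(counts, row_totals):
        for a in alphabet:
            expected = row_total * col_totals[a] / grand_total
            if expected > 0:  # skip characters absent from the whole alignment
                x2 += (c[a] - expected) ** 2 / expected
    return x2


def tail_area(observed_x2, simulated_x2):
    """Fraction of simulated X^2 values at least as large as the observed one."""
    hits = sum(1 for s in simulated_x2 if s >= observed_x2)
    return hits / len(simulated_x2)
```

In a posterior predictive check of this kind, an observed X² far in the upper tail of the simulated distribution (a tail area near zero) indicates that the model does not reproduce the compositional heterogeneity of the data.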
Figure 4.
Bayesian phylogenetic analyses of concatenated amino acid data. The analysis in panel (a) used Dayhoff-recoded data with a GTR + Γ + NDCH(14) tree-heterogeneous substitution model in p4. This is the analysis summarized in table 3, row I. The analysis shown in panel (b) used standard amino acid-coded data with a CAT-Poisson + Γ substitution model in PhyloBayes. This is the analysis summarized in table 3, row F.
Figure 5.
Saturation plots from simulated data. Simulation distances are simulation branch lengths, measured in average mutations per site. Panel (a) is from DNA simulated under the Jukes-Cantor model. Panel (b) is from protein simulated under the WAG model. The simulations in panels (a) and (b) were performed without among-site rate variation. Panel (c) shows protein simulations under the WAG + Γ model, i.e. including among-site rate variation. Panel (d) shows DNA simulations based on samples from the posterior distribution of the analysis shown in row I of table 2. Lines were fit from the second half of the points.
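The saturation behaviour in panel (a) of Figure 5 can be sketched from the textbook Jukes-Cantor formulas. This is an illustration, not the simulation procedure used for the figure; the function names are hypothetical, and the gamma variant assumes rates drawn from a gamma distribution with mean 1 and shape `alpha`.

```python
import math


def jc_expected_p(d):
    """Expected proportion of differing sites after d substitutions per site
    under Jukes-Cantor with equal rates at all sites."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc_gamma_expected_p(d, alpha):
    """Same quantity when site rates follow a gamma distribution
    with mean 1 and shape alpha (among-site rate variation)."""
    return 0.75 * (1.0 - (1.0 + 4.0 * d / (3.0 * alpha)) ** (-alpha))


def jc_distance(p):
    """Jukes-Cantor corrected distance from an observed proportion p.
    Returns infinity once p reaches the saturation plateau of 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Compare how quickly the two curves approach the 0.75 plateau.
    for d in (0.1, 0.5, 1.0, 2.0, 5.0):
        print(d, round(jc_expected_p(d), 3), round(jc_gamma_expected_p(d, 0.5), 3))
```

With equal rates the observed difference approaches the 0.75 plateau quickly, whereas with among-site rate variation the curve rises more slowly at the same true distance, which is consistent with the abstract's statement that saturation is delayed by among-site rate variation.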
Figure 6.
Saturation plots of empirical data. Inferred distances are patristic distances between taxa pairs following the tree path. Panels (a)–(d) show all points; panels (e)–(h) isolate pairs where one member of the pair is a eukaryote sequence and the other is an archaebacterial sequence. Panels (a) and (e): rRNA data with the tree-heterogeneous model shown in row I in table 2. Panels (b) and (f): protein sequences analysed with the WAG + Γ model. Panels (c) and (g): protein sequences analysed with the CAT model. Panels (d) and (h): protein sequences recoded into the six Dayhoff groups and analysed with a GTR + Γ-like model.
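The recoding of protein sequences into the six Dayhoff groups mentioned for panels (d) and (h) can be sketched as below. The grouping is the standard six-category Dayhoff scheme; the numeric labels and the function name `recode_dayhoff` are hypothetical choices for illustration, not the encoding used by the authors' software.

```python
# Standard six Dayhoff categories; the labels "1".."6" are arbitrary.
DAYHOFF_GROUPS = {
    "1": "C",      # cysteine
    "2": "AGPST",  # small
    "3": "DENQ",   # acidic and amides
    "4": "HKR",    # basic
    "5": "ILMV",   # hydrophobic
    "6": "FWY",    # aromatic
}

# Invert the table: amino acid letter -> group label.
AA_TO_GROUP = {aa: label
               for label, members in DAYHOFF_GROUPS.items()
               for aa in members}


def recode_dayhoff(seq):
    """Replace each of the 20 standard amino acids by its Dayhoff group label.
    Gaps and ambiguity codes are passed through unchanged."""
    return "".join(AA_TO_GROUP.get(ch, ch) for ch in seq.upper())


if __name__ == "__main__":
    print(recode_dayhoff("MKC-"))
```

Reducing 20 states to 6 discards substitutions within a group, which are the most frequent ones, so the recoded data are less prone to saturation and compositional bias at the cost of some signal.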

References

    Baldauf S. L., Palmer J. D., Doolittle W. F. 1996 The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny. Proc. Natl Acad. Sci. USA 93, 7749–7754 (doi:10.1073/pnas.93.15.7749)
    Barns S. M., Delwiche C. F., Palmer J. D., Pace N. R. 1996 Perspectives on archaeal diversity, thermophily and monophyly from environmental rRNA sequences. Proc. Natl Acad. Sci. USA 93, 9188–9193 (doi:10.1073/pnas.93.17.9188)
    Bollback J. P. 2002 Bayesian model adequacy and choice in phylogenetics. Mol. Biol. Evol. 19, 1171–1180
    Brown J. R., Douady C. J., Italia M. J., Marshall W. E., Stanhope M. J. 2001 Universal trees based on large combined protein sequence data sets. Nat. Genet. 28, 281–285 (doi:10.1038/90129)
    Cavalier-Smith T. 2002 The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int. J. Syst. Evol. Microbiol. 52, 297–354
