Mol Biol Evol. 2009 Sep;26(9):2087-95.
doi: 10.1093/molbev/msp123. Epub 2009 Jun 30.

Streamlining and large ancestral genomes in Archaea inferred with a phylogenetic birth-and-death model

Miklós Csurös et al. Mol Biol Evol. 2009 Sep.

Abstract

Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire.
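To make the notion of a phylogenetic profile concrete, the following is a minimal sketch of how family sizes across genomes could be assembled into profiles. It is not the authors' likelihood algorithm; the genome names, family labels, and the helper `phylogenetic_profiles` are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical input: for each genome, the family label of every gene it contains.
genes_by_genome = {
    "genomeA": ["fam1", "fam1", "fam2"],
    "genomeB": ["fam1", "fam3", "fam3", "fam3"],
    "genomeC": ["fam2"],
}


def phylogenetic_profiles(genes_by_genome):
    """Return (genome order, {family: [family size in each genome]})."""
    genomes = sorted(genes_by_genome)
    # Count how many members of each family occur in each genome.
    counts = {g: Counter(genes_by_genome[g]) for g in genomes}
    families = sorted({fam for c in counts.values() for fam in c})
    # A Counter yields 0 for absent families, so each profile has one entry per genome.
    profiles = {fam: [counts[g][fam] for g in genomes] for fam in families}
    return genomes, profiles


genomes, profiles = phylogenetic_profiles(genes_by_genome)
for fam, sizes in profiles.items():
    print(fam, dict(zip(genomes, sizes)))
```

Each profile is a vector of non-negative integers, one per genome. Under the framework described in the abstract, such vectors would be the data on which a birth-and-death likelihood is evaluated along the phylogeny.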

Figures

FIG. 1. Consensus evolutionary tree of Archaea in the study. The consensus is based on maximum likelihood trees for concatenated alignments of ribosomal and unique conserved proteins. Branch lengths are set by maximum likelihood for the r-proteins. Recognized archaeal orders are highlighted. The boxed triples on the left show the percentage of bootstrap samples supporting the particular edges in three data sets (from 500 replicates for each set): r-proteins, uc-proteins, and uc-proteins without C. symbiosum (Censy) and N. equitans (Naneq). All other edges have >97% bootstrap support in all data sets. Numbers next to the terminal taxa denote genome size in million base pairs.
FIG. 2. Branch-specific loss rates compared with expected numbers of substitutions (or edge length) for each branch e. Pairs of sibling terminal taxa are connected by lines.
FIG. 3. A digest of gene content evolution in Archaea. The bar graphs plot posterior means for number of families. The chart on the left shows the number of families with at least one homolog; the fatter part of the bar is proportional to the number of multigene families. The chart in the middle plots the families acquired and lost on the branch leading to the indicated node. The net change is highlighted by the solid part of the bars. The chart on the right shows how many families underwent a contraction from multigene to single-gene composition, or expanded from a single homolog to multiple paralogs. For instance, the common ancestor of Methanococcales is inferred to have had 1723 gene families, out of which 156 were gained after the split with Methanobacteriales. During the same time, 586 families present at the common ancestor M1 were lost, and the solid bar indicates the net loss of 430 (= 586 - 156) families. Among multimember families retained from M1, 68 contracted to a single homolog, and 13 single-member families expanded. Note that scaling is the same on the left-hand side and in the middle, but different on the right-hand side.
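The net-change arithmetic in the Methanococcales example can be checked with a short sketch. The function name `net_family_change` is hypothetical; the only inputs are the gained and lost counts quoted in the caption, and the sketch simply takes their difference, as the caption does.

```python
def net_family_change(gained, lost):
    """Return the signed net change in family count on a branch (gains minus losses)."""
    return gained - lost


# Values quoted in the caption for the branch leading to the Methanococcales ancestor.
gained = 156
lost = 586

net = net_family_change(gained, lost)
direction = "loss" if net < 0 else "gain"
print(f"net {direction} of {abs(net)} families")
```

A negative value corresponds to the solid bar indicating a net loss in the middle chart.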
