Nucleic Acids Res. 2006;34(16):4342-53.
doi: 10.1093/nar/gkl440. Epub 2006 Aug 26.

Data mining for proteins characteristic of clades

Marshall Bern et al. Nucleic Acids Res. 2006.

Abstract

A synapomorphy is a phylogenetic character that provides evidence of shared descent. Ideally, a synapomorphy is ubiquitous within the clade of related organisms and nonexistent outside the clade, implying that it arose after divergence from other extant species and before the last common ancestor of the clade. With the recent proliferation of genetic sequence data, molecular synapomorphies have assumed great importance, yet there is no convenient means to search for them over entire genomes. We have developed a new program called Conserv, which can rapidly assemble orthologous sequences and rank them by various metrics, such as degree of conservation or divergence from out-group orthologs. We have used Conserv to conduct a large-scale search for molecular synapomorphies for bacterial clades. The search discovered sequences unique to clades, such as Actinobacteria, Firmicutes and gamma-Proteobacteria, and shed light on several open questions, such as whether Symbiobacterium thermophilum belongs with Actinobacteria or Firmicutes. We conclude that Conserv can quickly marshal evidence relevant to evolutionary questions that would be much harder to assemble with other tools.
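
The definition above suggests a simple presence/absence score: the fraction of in-clade genomes carrying a feature minus the fraction of out-group genomes carrying it, so that an ideal synapomorphy scores 1. The Python sketch below is a hypothetical illustration under that assumption only. The function names `synapomorphy_score` and `rank_features` and the toy genomes are invented, and this is not the ranking metric Conserv actually uses, which the abstract does not specify.

```python
from typing import Dict, List, Set, Tuple


def synapomorphy_score(carriers: Set[str], clade: Set[str], outgroup: Set[str]) -> float:
    """Prevalence inside the clade minus prevalence in the out-group (hypothetical metric).

    1.0 means the feature is in every clade member and in no out-group member.
    """
    inside = len(carriers & clade) / len(clade)
    outside = len(carriers & outgroup) / len(outgroup)
    return inside - outside


def rank_features(features: Dict[str, Set[str]], clade: Set[str],
                  outgroup: Set[str]) -> List[Tuple[str, float]]:
    """Sort features (e.g. orthologous groups) from best to worst candidate synapomorphy."""
    scored = [(name, synapomorphy_score(carriers, clade, outgroup))
              for name, carriers in features.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    clade = {"A", "B", "C"}           # hypothetical in-group genomes
    outgroup = {"D", "E"}             # hypothetical out-group genomes
    features = {
        "geneX": {"A", "B", "C"},            # ubiquitous inside, absent outside
        "geneY": {"A", "B", "D"},            # patchy on both sides
        "geneZ": {"A", "B", "C", "D", "E"},  # universal, so uninformative
    }
    for name, score in rank_features(features, clade, outgroup):
        print(f"{name}\t{score:.2f}")
```

A score near 1 flags a candidate; a universal gene such as `geneZ` scores 0 because it says nothing about where the clade boundary lies.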

Figures

Figure 1
A short, ancient, internal branch such as the one marked * is not easy to resolve with clock-like sequence evolution, but a rare event such as a signature gene may resolve the clade. The depicted phylogenetic tree is a consensus of those given in three recent studies (36,37,42). The enigmatic organisms and the placements considered here are shown with wiggly branches, with the solid wiggly lines indicating the placements best supported by our synapomorphy search.
Figure 2
This MUSCLE (21) alignment of a 125-aa subsequence of succinate dehydrogenase, flavoprotein subunit (COG1053), provides evidence that Pirellula and Chlamydiales, the first five rows, form a clade. The first five rows share overall sequence similarity, along with the qvr insertion in columns 9–11 and the lr insertion in columns 114–115. The alignment also uncovers an apparent horizontal transfer from Chlorobi/Bacteroides to Geobacter, evidenced by great sequence similarity and the insertion at columns 114–119. The top and bottom rows give the consensus sequence of Chlamydiales and Pirellula. A dot indicates agreement with a consensus residue, and a blank indicates agreement with a consensus gap. The columns with indels are marked by *.
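
The dot/blank notation in this caption can be reproduced from an aligned block. A minimal sketch, assuming rows of equal length with '-' for gaps and a majority-rule consensus computed over the in-group rows only; the caption does not say how the consensus was derived, and the helpers `consensus` and `render` are hypothetical.

```python
from collections import Counter
from typing import List


def consensus(rows: List[str]) -> str:
    """Majority-rule consensus; '-' (gap) is counted like any other symbol."""
    # Counter.most_common breaks ties by first occurrence in the column
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*rows))


def render(rows: List[str], cons: str) -> List[str]:
    """Show '.' where a row matches a consensus residue and ' ' where it matches a consensus gap."""
    rendered = []
    for row in rows:
        chars = []
        for residue, ref in zip(row, cons):
            if residue == ref:
                chars.append(" " if ref == "-" else ".")
            else:
                chars.append(residue)  # disagreements are printed explicitly
        rendered.append("".join(chars))
    return rendered


if __name__ == "__main__":
    # Toy alignment: the first three rows are the in-group, the last is an out-group.
    rows = ["MKV-LA", "MKVQLA", "MKV-LA", "MRVEIA"]
    cons = consensus(rows[:3])
    print(cons)
    for line in render(rows, cons):
        print(line)
```

Insertions relative to the in-group consensus, such as the Q in the second row, stand out as letters in an otherwise dotted row.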
Figure 3
This MUSCLE alignment of a 129-aa subsequence of IMP dehydrogenase (COG0516) provides evidence that Chloroflexi (represented by Dehalococcoides) and Cyanobacteria, the first eight rows, form a clade. The first eight rows share overall sequence similarity, along with an insertion in columns 60–65 and a deletion in columns 111–120. The top and bottom rows give the consensus sequence of the first eight rows. In this figure, | indicates a column with complete conservation over all rows, and : indicates nearly complete conservation.
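
The | and : marks can be generated column by column. In this hypothetical sketch, a column counts as "nearly complete" when its most common residue occurs in at least 80% of rows. That threshold is an assumption, since the caption does not define one, and gap-dominated columns are left blank.

```python
from collections import Counter
from typing import List


def conservation_line(rows: List[str], near: float = 0.8) -> str:
    """Return '|' for fully conserved columns, ':' for nearly conserved ones, ' ' otherwise.

    `near` is an assumed threshold for "nearly complete" conservation.
    """
    n = len(rows)
    marks = []
    for column in zip(*rows):
        residue, count = Counter(column).most_common(1)[0]
        if residue == "-":
            marks.append(" ")  # do not mark gap-dominated columns
        elif count == n:
            marks.append("|")
        elif count / n >= near:
            marks.append(":")
        else:
            marks.append(" ")
    return "".join(marks)


if __name__ == "__main__":
    rows = ["MKVLA", "MKVLA", "MKILA", "MKVLS", "MRVLT"]
    print(repr(conservation_line(rows)))
```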
Figure 4
This MUSCLE alignment of a highly conserved 60-aa sequence in ATP synthase, subunit C, gives a motif synapomorphy arguing for the inclusion of the endosymbionts Buchnera and Wigglesworthia with Enterobacteria. Notice the complete conservation in Enterobacteria of the columns marked *. The first nine organisms are Enterobacteria, the next two (Haemophilus and Pasteurella) are Pasteurellales and the last nine are other γ-Proteobacteria, roughly in order of distance from Enterobacteria.
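
A motif synapomorphy of the kind marked * here can be screened for by looking for columns that are invariant across the in-group but variable in the out-group. The sketch assumes exactly that criterion (all in-group rows share one non-gap residue, and at least one out-group row carries something else); the paper may apply a stricter or different rule, and `synapomorphy_columns` is a hypothetical name.

```python
from typing import List


def synapomorphy_columns(ingroup: List[str], outgroup: List[str]) -> str:
    """Mark with '*' columns invariant (non-gap) in the in-group but variable in the out-group."""
    marks = []
    for col_in, col_out in zip(zip(*ingroup), zip(*outgroup)):
        shared = set(col_in)
        if len(shared) == 1 and "-" not in shared:
            residue = col_in[0]
            # at least one out-group row must carry a different character
            marks.append("*" if any(c != residue for c in col_out) else " ")
        else:
            marks.append(" ")
    return "".join(marks)


if __name__ == "__main__":
    ingroup = ["GAVLK", "GAVLR", "GAILK"]   # toy stand-ins for the in-group rows
    outgroup = ["GSVLK", "GTVIK"]           # toy stand-ins for other lineages
    print(repr(synapomorphy_columns(ingroup, outgroup)))
```

Columns conserved everywhere (the first G in the toy data) are not marked, because a residue shared with the out-group says nothing about the clade.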
Figure 5
The plot shows conservation level, measured by quartile similarity score, as a function of window size for the seven most conserved proteins over a sample of 28 diverse eubacteria representing 11 phyla. The similarity score per residue generally decreases as the window lengthens, but it sometimes increases when a less-conserved stretch is flanked by two well-conserved subsequences. Because EF-Tu ranks first at a variety of window lengths, it is the most natural candidate for the most conserved protein across all Bacteria.
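
The caption does not define the quartile similarity score, so the following is only one plausible reading, labeled as an assumption: within a window, each sequence is scored by its fraction of positions matching a consensus residue, and the window's score is the lower quartile of those per-sequence values (taken here as the element a quarter of the way up the sorted list), which keeps a few divergent organisms from dominating. Sweeping the window length and keeping the best-scoring window yields a curve of the kind plotted. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def column_consensus(rows: List[str]) -> str:
    """Majority-rule consensus, one symbol per alignment column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def lower_quartile(values: List[float]) -> float:
    """Simple lower quartile: the element a quarter of the way up the sorted list."""
    ordered = sorted(values)
    return ordered[len(ordered) // 4]


def window_score(rows: List[str], cons: str, start: int, width: int) -> float:
    """Lower quartile, over sequences, of the fraction of window positions matching a consensus residue."""
    ref = cons[start:start + width]
    per_sequence = []
    for row in rows:
        matches = sum(1 for r, c in zip(row[start:start + width], ref) if r == c and c != "-")
        per_sequence.append(matches / width)
    return lower_quartile(per_sequence)


def best_window(rows: List[str], width: int) -> Tuple[int, float]:
    """Start position and score of the highest-scoring window of the given width."""
    cons = column_consensus(rows)
    starts = range(len(cons) - width + 1)
    best = max(starts, key=lambda s: window_score(rows, cons, s, width))
    return best, window_score(rows, cons, best, width)


if __name__ == "__main__":
    rows = ["MKVLAG", "MKVLSG", "MKVLTA", "MRVLQC"]
    for width in (2, 3, 6):
        start, score = best_window(rows, width)
        print(f"width={width} start={start} score={score:.3f}")
```

In this toy alignment the best score stays at 1.0 for short windows and drops once the window must include the variable tail, in line with the general downward trend the caption describes.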

References

    1. Rokas A., Holland P.W.H. Rare genomic changes as a tool for phylogenetics. Trends Ecol. Evol. 2000;15:454–459.
    2. Müller W.E., Schröder H.C., Skorokhod A., Bünz C., Müller I.M., Grebenjuk V.A. Contribution of sponge genes to unravel the genome of the hypothetical ancestor of Metazoa (Urmetazoa). Gene. 2001;276:161–173.
    3. Pasquinelli A.E., McCoy A., Jimenez E., Salo E., Ruvkun G., Martindale M.Q., Baguna J. Expression of the 22 nucleotide let-7 heterochronic RNA throughout the Metazoa: a role in life history evolution? Evol. Dev. 2003;5:372–378.
    4. Bruce A.E., Shankland M. Expression of the head gene Lox22-Otx in the leech Helobdella and the origin of the bilaterian body plan. Dev. Biol. 1998;201:101–112.
    5. Mirkin B.G., Fenner T.I., Galperin M.Y., Koonin E.V. Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol. Biol. 2003;3:2.
