Review
Microbiol Mol Biol Rev. 1998 Dec;62(4):1435-91.
doi: 10.1128/MMBR.62.4.1435-1491.1998.

Protein phylogenies and signature sequences: A reappraisal of evolutionary relationships among archaebacteria, eubacteria, and eukaryotes

R S Gupta. Microbiol Mol Biol Rev. 1998 Dec.

Abstract

The presence of shared conserved insertions or deletions (indels) in protein sequences is a special type of signature sequence that shows considerable promise for phylogenetic inference. An alternative model of microbial evolution based on the use of indels of conserved proteins and the morphological features of prokaryotic organisms is proposed. In this model, extant archaebacteria and gram-positive bacteria, which have a simple, single-layered cell wall structure, are termed monoderm prokaryotes. They are believed to be descended from the most primitive organisms. Evidence from indels supports the view that the archaebacteria probably evolved from gram-positive bacteria, and I suggest that this evolution occurred in response to antibiotic selection pressures. Evidence is presented that diderm prokaryotes (i.e., gram-negative bacteria), which have a bilayered cell wall, are derived from monoderm prokaryotes. Signature sequences in different proteins provide a means to define a number of different taxa within prokaryotes (namely, low G+C and high G+C gram-positive, Deinococcus-Thermus, cyanobacteria, chlamydia-cytophaga related, and two different groups of Proteobacteria) and to indicate how they evolved from a common ancestor. Based on phylogenetic information from indels in different protein sequences, it is hypothesized that all eukaryotes, including amitochondriate and aplastidic organisms, received major gene contributions from both an archaebacterium and a gram-negative eubacterium. In this model, the ancestral eukaryotic cell is a chimera that resulted from a unique fusion event between the two separate groups of prokaryotes followed by integration of their genomes.
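
The idea of a shared indel acting as a signature can be illustrated with a small sketch. This is not the procedure used in the review; it is a minimal Python illustration that assumes an alignment is already available as equal-length strings with `-` marking gaps. The function names, the flank-conservation threshold, and the toy sequences (invented strings, not real Hsp70 data) are all hypothetical. It splits taxa by whether a chosen column range holds residues or only gaps, and checks that the flanking columns are conserved.

```python
from collections import Counter
from typing import Dict, Set, Tuple

GAP = "-"  # assumed gap character in the aligned sequences


def indel_partition(
    alignment: Dict[str, str], start: int, end: int
) -> Tuple[Set[str], Set[str]]:
    """Split taxa by whether columns [start, end) hold residues or only gaps."""
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    with_insert: Set[str] = set()
    without_insert: Set[str] = set()
    for taxon, seq in alignment.items():
        region = seq[start:end]
        if all(ch == GAP for ch in region):
            without_insert.add(taxon)
        elif GAP not in region:
            with_insert.add(taxon)
        else:
            # A partly gapped region means the indel boundaries are ambiguous.
            raise ValueError(f"{taxon}: partial gap in columns {start}-{end}")
    return with_insert, without_insert


def flanks_conserved(
    alignment: Dict[str, str],
    start: int,
    end: int,
    flank: int = 3,
    min_identity: float = 0.8,
) -> bool:
    """Return True if each flanking column is dominated by one residue.

    The flank width and identity threshold are illustrative assumptions.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    columns = list(range(max(0, start - flank), start))
    columns += list(range(end, min(length, end + flank)))
    for col in columns:
        residues = [seq[col] for seq in seqs]
        top_residue, top_count = Counter(residues).most_common(1)[0]
        if top_residue == GAP or top_count / len(residues) < min_identity:
            return False
    return True


# Toy alignment with made-up sequences; columns 5-7 play the role of an insert.
toy_alignment = {
    "archaebacterium_1": "MKVIG---DLGT",
    "gram_positive_1": "MKVIG---DLGT",
    "gram_negative_1": "MKVIGAETDLGT",
    "gram_negative_2": "MKVIGSETDLGT",
}

with_insert, without_insert = indel_partition(toy_alignment, 5, 8)
print(sorted(with_insert))
print(sorted(without_insert))
print(flanks_conserved(toy_alignment, 5, 8))
```

In this toy case the insert separates the two hypothetical gram-negative taxa from the two monoderm taxa, which mirrors the kind of presence/absence pattern the figure captions describe as a boxed signature. Deciding which state is ancestral needs outside evidence, such as an outgroup or paralog, and is not addressed by this sketch.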

Figures

FIG. 1
Evolutionary relationships among living organisms in the three-domain model of Woese et al. (258) (a) and as suggested here based on protein sequence data and structural characteristics of organisms (b). In panel b, the solid arrows identify taxa that evolved from each other in the directions shown by accumulation of mutations and the dotted lines denote symbiotic events that led to the acquisition of mitochondria and plastids. These latter events, which are common in both models, are not shown in panel a. In panel b, the double-headed arrow between archaebacteria and gram-positive bacteria indicates the polyphyletic relationship between these groups for several genes. The terms “monoderm” and “diderm” refer to prokaryotic cells that are bounded by only one membrane or two different (cytoplasmic and outer) membranes, respectively. The dashed lines indicate the first fusion between an archaebacterium and a gram-negative bacterium that is postulated to have given rise to the ancestral eukaryotic cell (102, 105). Abbreviations: CM, cytoplasmic membrane; CW, cell wall; OM, outer membrane; PE, periplasm.
FIG. 2
Alignment of representative Hsp70 sequences from archaebacteria (A), gram-positive bacteria (G+), gram-negative bacteria (G), eukaryotic-organellar (O), and eukaryotic nuclear-cytosolic (E) homologs. Small regions from the N- and C-terminal ends, which are not properly aligned in the global alignment of sequences and hence are not included in phylogenetic analyses, are not shown. The dashes indicate identity to the residue in the top line. The accession numbers of the sequences are shown. The boxed region shows the large insert in the N-terminal region present in all gram-negative bacteria and eukaryotic homologs. The solid lines above the sequence alignment identify several highly conserved regions that have proven useful for designing degenerate primers for cloning purposes (57, 76, 102, 103, 107). The numbers at the beginning and at the end of the alignment denote the positions of the first and last amino acids included in individual protein sequences. The sequences were aligned by using the CLUSTAL program from the PC Gene software package (IntelliGenetics), and minor changes were made to correct any visible misalignments. The abbreviations (m) and (chl) identify mitochondria and chloroplasts.
FIG. 3
Signature sequence in Hsp70 proteins showing a specific relationship between archaebacteria (A) and gram-positive bacteria (G+) (both monoderm prokaryotes) and the distinctness of gram-negative bacteria (G) (diderm prokaryotes). The large indel common to all gram-negative bacteria (referred to as the diderm insert) but absent in all monoderm prokaryotes is boxed. In the top diagram, 〉 denotes the root of the prokaryotic tree as inferred in the text. The thick arrow indicates the probable stage where this signature was introduced. The dashes in all sequence alignments show identity to the amino acids in the top line.
FIG. 4
A rooted neighbor-joining tree of prokaryotic organisms based on EF-1α/Tu sequences. The tree was rooted by using aligned EF-2/G sequences, which are derived from an ancient gene duplication in the common ancestor of prokaryotes (126). The tree shown is a neighbor-joining consensus tree obtained after 100 bootstraps. The bootstrap scores for various nodes are shown. The tree reveals that the root of the prokaryotes lies in between two groups of monoderm prokaryotes. A, G+, and G refer to archaebacteria, gram-positive bacteria, and gram-negative bacteria, respectively.
FIG. 5
Alignment of Hsp70 and MreB sequences from different groups of species showing the absence of the diderm insert in the MreB sequences. The absence of the insert in all MreB proteins, as well as Hsp70 homologs from archaebacteria and gram-positive bacteria (boxed region), provides evidence that the homologs lacking the insert are ancestral (104, 107). The numbers at the beginning and at the end indicate the position of the sequence in individual proteins.
FIG. 6
Time line showing some of the main events in the history of this planet based on geological and fossil evidence (132, 141, 208, 209).
FIG. 7
Excerpts from EF-1α/Tu (a), ribosomal protein L5 (b), ribosomal protein S5 (c), and ribosomal protein L14 (d) alignments identifying signature sequences that show the distinctness of archaebacteria (A) from eubacteria (G+ and G). The common indels that distinguish archaebacteria from eubacteria are boxed. (The relationship of prokaryotes to the eukaryotes [E] is considered in later figures.) 〉 denotes the root of the prokaryotic tree as inferred in the text, and the thick arrow indicates the probable stage where these signatures were introduced.
FIG. 8
Consensus neighbor-joining tree for prokaryotic organisms based on Hsp70 protein sequences. The tree, which was bootstrapped 100 times, is based on 362 aligned positions for which sequence information from all species is known. Other trees based on larger numbers of aligned characters also show similar results (see Fig. 27) (57, 103, 108). The archaebacterial species (marked with asterisks) show a polyphyletic branching within gram-positive bacteria (both monoderm prokaryotes), which is statistically strongly supported (95, 108). The gram-negative bacteria (diderm prokaryotes) form a distinct clade in 99% of the bootstraps, which is highly significant. The relationships and branching orders of some of the main divisions within eubacteria are indicated.
FIG. 9
Signature sequence (boxed insert) in GS I (a) and glutamate-1-semialdehyde 2,1-aminomutase (b), showing the relatedness of archaebacterial (A) homologs to gram-positive (G+) bacteria and the distinctness of gram-negative (G) bacteria. The top diagram indicates the suggested interpretation that these signatures, as well as the large diderm insert in Hsp70 protein (Fig. 3), were introduced into a common ancestor of G bacteria. (a) G+ (II) identifies sequences from some of the GS II family of proteins (22, 205). E, eukaryotes.
FIG. 10
Excerpt from the GroEL (or Hsp60) protein sequence alignment showing a 1-aa insertion (boxed) that is shared by most divisions of G bacteria but absent from all G+ bacteria. The absence of this insert in Thermus aquaticus and Deinococcus proteolyticus, which are diderm prokaryotes that contain thick cell walls, indicates that this insert was introduced into an ancestral gram-negative lineage after the branching of the Deinococcus-Thermus group (thick arrow in the top diagram).
FIG. 11
Evolutionary relationships between eubacterial species and groups based on the GroEL (Hsp60) sequences. The tree shown is a consensus neighbor-joining distance tree obtained after 100 bootstraps. The distinct branching of low-G+C and high-G+C gram-positive bacteria and their close relationship to the cyanobacteria should be noted. The branching order of other prokaryotic groups in the GroEL tree is very similar to that observed for Hsp70 sequences (Fig. 8). Similar results with GroEL/Hsp60 sequences have been reported in other studies (96, 98). Although the tree shown here is unrooted, in other studies (96, 98) where the Hsp60 tree was rooted with the distantly related TCP-1 protein from archaebacteria (243), the low-G+C gram-positive bacteria were found to be the deepest-branching group within eubacteria (marked with ★). Reproduced from reference with permission of the publisher.
FIG. 12
Evolutionary relationships within prokaryotes as indicated by the monoderm-diderm model (top) versus the currently popular archaebacterial model (bottom). It should be noted that the latter model does not recognize diderm prokaryotes as a distinct taxon and that in phylogenetic trees based on 16S rRNA the gram-positive (monoderms) and gram-negative (diderms) bacteria show polyphyletic branching within each other (183, 250, 251, 258). Abbreviations: CM, cytoplasmic membrane; OM, outer membrane; CW, cell wall. Reproduced from reference with permission of the publisher.
FIG. 13
Signature sequence in ribosomal S12 protein (a) and dihydroorotate dehydrogenase (b), distinguishing archaebacteria (A) and the low-G+C gram-positive bacteria from the high-G+C gram-positive group (G+) and gram-negative bacteria (G). These signatures (boxed) provide evidence that the gram-negative bacteria are specifically related to the high-G+C gram-positive group. The asterisks in this and all subsequent alignments identify sequences retrieved from the National Center for Biotechnology Information unfinished microbial genomes database.
FIG. 14
Signature sequence (boxed) in pyruvate kinase which appears specific for the low-G+C gram-positive group. This signature was probably introduced in the branch leading to this particular group.
FIG. 15
Signature sequence (boxed) in the DNA gyrase A subunit which is specific for the high-G+C gram-positive group. As indicated in the top diagram, this signature was probably introduced in the branch leading to this group.
FIG. 16
Signature sequences (boxed) in acetolactate synthase (a) and asparaginyl-tRNA synthetase (b) showing a grouping of the Deinococcus-Thermus species with archaebacteria and gram-positive bacteria. Similar to the Hsp60 protein (Fig. 10), these signatures were introduced in an ancestral gram-negative lineage after the branching of the Deinococcus-Thermus group.
FIG. 17
Sequence signatures in FtsZ (a) and glutamate dehydrogenase (b) showing the relatedness of cyanobacteria (and chloroplast homologs) to gram-positive bacteria and archaebacteria. As shown in the diagram above, these signatures (boxed) were probably introduced in a common ancestor of other gram-negative bacteria after the branching of cyanobacteria.
FIG. 18
Signature sequences in DnaJ (a), EF-Ts protein (b), EF-Tu protein (c), and DNA polymerase I (d) that are unique to the Deinococcus-Thermus group and cyanobacteria. To explain the presence of these signatures (boxed), as well as those in Fig. 10 and 16, it was suggested that these signatures were introduced initially into the branch leading to cyanobacteria (thick arrow) and then laterally transferred to the Deinococcus-Thermus group (thin dashed arrow). Alternatively, these signatures may have been introduced first into the branch leading to Deinococcus-Thermus and then transferred to cyanobacteria. Panels b through d reproduced from reference with permission of the publisher.
FIG. 19
Signature sequences (boxed) in Hsp70 (a) and alanyl-tRNA synthetase (b), defining and distinguishing the proteobacterial group from all other divisions of prokaryotes.
FIG. 20
Signature sequences in Hsp70 (a) and DNA gyrase B (b) which appear specific for the beta and gamma subdivisions of proteobacteria. These signature sequences (boxed), in combination with those in Fig. 19, could be used to define and distinguish between proteobacterial subdivisions alpha, delta, and epsilon (proteobacteria-1) and subdivisions beta and gamma (proteobacteria-2).
FIG. 21
Excerpts from EF-1α/Tu protein sequences showing a conserved insert (originally identified by Rivera and Lake [198]) that is present in various Crenarchaeota archaebacteria (eocytes), as well as eukaryotic homologs, but absent in Euryarchaeota archaebacteria and eubacteria. This insert indicates that, of the two archaebacterial groups, the Euryarchaeota are ancestral.
FIG. 22
Signature sequence in dihydroorotate dehydrogenase showing the relatedness of halophilic archaebacteria to the high-G+C gram-positive bacteria and of the methanogenic and thermoacidophilic archaebacteria to the low-G+C group. The thick arrow indicates that, similar to other protein sequences (Fig. 13), this signature provides evidence that gram-negative bacteria are specifically related to the high-G+C gram-positive group. To explain the results with archaebacterial homologs, it is necessary to postulate either that there was a lateral gene transfer from high-G+C gram-positive bacteria to the halophilic archaebacteria or that the two groups of archaebacteria bear specific relationships to the two divisions of gram-positive bacteria (thin solid arrows). The question mark indicates that these results raise questions about the evolutionary relationship between archaebacteria and gram-positive bacteria.
FIG. 23
Possible scenarios to explain the evolutionary relationship between archaebacteria and gram-positive bacteria. Scenario I assumes the archaebacteria to be monophyletic; to explain various other gene phylogenies where archaebacteria show polyphyletic branching within gram-positive bacteria (e.g., Hsp70, GS I, GDH, dihydroorotate dehydrogenase), lateral transfer of genes from different groups of gram-positive bacteria to the archaebacteria (as indicated by thin dashed arrows) is postulated. Scenario II, on the other hand, suggests that the ancestral archaebacterial phenotype may have evolved from gram-positive bacteria (solid arrows) in response to antibiotic selection pressure and that the genes involved in antibiotic resistance (which may include many genes involved in the information transfer processes) were subsequently acquired laterally by other gram-positive bacteria to create additional monoderm prokaryotes with an archaebacterium-like genotype.
FIG. 24
Evolutionary relationships within prokaryotes as deduced from signature sequences in various proteins. Although, for ease of presentation, this figure depicts archaebacteria as distinct from other prokaryotes, the alternate view where archaebacteria are derived from gram-positive bacteria (Fig. 23) is favored based on the available evidence. Beginning with the universal ancestor (〉), the order of evolution of different prokaryotic groups as deduced from signature sequences in different proteins is as shown in this diagram. The asterisks on certain proteins indicate that the timing when these signature sequences were introduced may change with sequence information from additional bacterial phyla. The branching order of various eubacterial groups is consistent with the detailed phylogenies based on Hsp70 and GroEL sequences (Fig. 8 and 11).
FIG. 25
The eocyte version of the archaebacterial tree based on the signature sequence in EF-1α/Tu protein sequences, as suggested by Rivera and Lake (198). This tree indicates that the ancestral eukaryotic cell descended directly from within the archaebacterial lineage, with eocyte archaebacteria as its closest relatives.
FIG. 26
Excerpt from the Hsp70 sequence alignment showing some of the important sequence signatures (boxed regions) distinguishing eukaryotic nuclear-cytosolic homologs from prokaryotic and organellar homologs. G and G+ refer to gram-negative bacteria and gram-positive bacteria, respectively. The boxed region marked ① shows the diderm insert in the N-terminal quadrant common to all eukaryotic homologs and gram-negative bacteria. The signatures marked ② and ③ identify two indels that distinguish eukaryotic nuclear-cytosolic homologs from all organellar and prokaryotic homologs. Other prokaryotic homologs not included in this alignment (e.g., some shown in Fig. 3) also contained the indicated signature sequences. Not all signature sequences of the above kinds are shown. The notation (e) in parentheses identifies ER Hsp70 homologs. Mitochond. and hydrogeno. refer to mitochondria and hydrogenosome homologs.
FIG. 27
A consensus neighbor-joining tree based on Hsp70 sequences (bootstrapped 100 times) showing the relationship between prokaryotic and various eukaryotic homologs. The tree is based on 531 aligned amino acid positions. The main points to be noted are as follows: mitochondrial and chloroplast homologs show a specific relationship to the α proteobacteria and cyanobacteria, respectively; the hydrogenosome homolog from Trichomonas branches with the mitochondrial clade; the eukaryotic nuclear-cytosolic homologs form a distinct clade within gram-negative bacteria unrelated to the organellar homologs; the ER and cytosolic homologs form paralogous gene families; and archaebacterial homologs (marked with asterisks) show polyphyletic branching within gram-positive bacteria. In the tree shown, only a small number of divergent eukaryotic homologs are included. However, inclusion of additional eukaryotic homologs does not alter the phylogenetic relationship shown here (unpublished results).
FIG. 28
Signature sequences (boxed and shaded) in the Hsp70 protein showing the relationship of eukaryotic cytosolic homologs to the proteobacteria-1 group (alpha, delta, and epsilon subdivisions as well as Thermomicrobium roseum). The homologs from various prokaryotic phyla as well as different eukaryotic homologs are identified. The notations (m), (c), (e), and (h) denote mitochondrial, chloroplast, ER, and hydrogenosome homologs, respectively. The signatures P1 and P2 identify sequences that distinguish between proteobacteria-1 and -2. The presence in all nuclear-cytosolic homologs of the 2-aa proteobacteria-1 signature but not the 4-aa proteobacteria-2 signature provides evidence that these homologs are derived from a member of the proteobacteria-1 group. The signatures marked E1 are also common to proteobacteria-1 and proteobacteria-2 as well as eukaryotic cytosolic Hsp70s, supporting the above inference. The signatures E2 identify two substitutions that are present in all members of the alpha proteobacteria as well as mitochondrial and hydrogenosome homologs but absent in other groups of proteobacteria and eukaryotic cytosolic homologs. These signatures suggest that the eukaryotic nuclear-cytosolic homologs have originated independently of mitochondria and hydrogenosomes.
FIG. 29
Signature sequences (boxed) in the Hsp90 (a), IMP dehydrogenase (b), and adenylosuccinate synthetase (c) proteins showing the relatedness of the eukaryotic cytosolic homologs (E) to eubacteria (G+ and G) rather than archaebacteria (A). For Hsp90, no archaebacterial homolog has been identified in the three genomes that have been completely sequenced (26, 138, 215).
FIG. 30
Neighbor-joining distance tree based on Hsp90 sequences indicating that the cytosolic and ER resident forms of these proteins form paralogous gene families, which resulted from a gene duplication event very early in the history of eukaryotic cells. The bootstrap scores out of 1,000 replicates are shown. Reproduced from reference with permission of the publisher.
FIG. 31
Signature sequences (boxed) in Hsp90 proteins showing the distinctness of eukaryotic homologs from prokaryotic homologs ① and the distinction between ER homologs and the cytosolic homologs ②.
FIG. 32
Origin of the eukaryotic cell nucleus and endomembrane system as per the chimeric model. The key event in the origin of the eukaryotic cell is postulated to be a symbiotic association between a gram-negative eubacterium (from the proteobacteria-1 group) and likely an “eocyte” archaebacterium. This association led to the loss of the outer membrane from the gram-negative bacterium (not shown). As the membrane of the gram-negative bacterium surrounded the eocyte species, the membrane of the latter species, containing ether-linked lipids (wavy line), became redundant and was lost. Eventual separation of the membrane infolds led to the formation of the nuclear envelope and ER. The formation of these new compartments was preceded or accompanied by duplication of the genes for the chaperone proteins (Hsp70, Hsp90, DnaJ, etc.), which are necessary for protein transport and communication within the compartments. The transfer of the genome from the gram-negative eubacterium to the newly formed nucleus and an assortment and integration of genes from the two partners led to the formation of the ancestral eukaryotic cell. Modified and reproduced from reference with permission of the publisher.
FIG. 33
Signature sequence in glucose-fructose-6-phosphate transaminase, showing the presence of a unique signature (boxed) in eukaryotic homologs. The eukaryotic homologs for Hsp70 (Fig. 26) and Hsp90 (Fig. 31) also contain several unique sequence signatures not found in any prokaryotic homologs. These signatures provide evidence that all of the eukaryotes are derived from a single ancestor and that the postulated fusion event was unique.

References

    1. Adam R D. The biology of Giardia spp. Microbiol Rev. 1991;55:706–732.
    2. Ahmad S, Ahuja R, Venner T J, Gupta R S. Identification of a protein altered in mutants resistant to microtubule inhibitors as a member of the major heat shock protein (hsp70) family. Mol Cell Biol. 1990;10:5160–5165.
    3. Alberts B, Bray D, Lewis J, Raff M, Roberts K, Watson J D. Molecular biology of the cell. New York, N.Y: Garland Publishing, Inc.; 1994.
    4. Allsopp A. Phylogenetic relationships of the procaryota and the origin of the eucaryotic cell. New Phytol. 1969;68:591–612.
    5. Altschul S F, Gish W, Miller W, Myers E W, Lipman D J. Basic local alignment search tool. J Mol Biol. 1990;215:403–410.