Comparative Study
Proc Natl Acad Sci U S A. 1997 Dec 9;94(25):13749-53.
doi: 10.1073/pnas.94.25.13749.

Did homeodomain proteins duplicate before the origin of angiosperms, fungi, and metazoa?

G Bharathan et al. Proc Natl Acad Sci U S A.

Abstract

Homeodomain proteins are transcription factors that play a critical role in early development in eukaryotes. These proteins previously have been classified into numerous subgroups whose phylogenetic relationships are unclear. Our phylogenetic analysis of representative eukaryotic sequences suggests that there are two major groups of homeodomain proteins, each containing sequences from angiosperms, metazoa, and fungi. This result, based on parsimony and neighbor-joining analyses of primary amino acid sequences, was supported by two additional features of the proteins. The two protein groups are distinguished by an insertion/deletion in the homeodomain, between helices I and II. In addition, an amphipathic alpha-helical secondary structure in the region N terminal of the homeodomain is shared by angiosperm and metazoan sequences in one group. These results support the hypothesis that there was at least one duplication of homeobox genes before the origin of angiosperms, fungi, and metazoa. This duplication, in turn, suggests that these proteins had diverse functions early in the evolution of eukaryotes. The shared secondary structure in angiosperm and metazoan sequences points to an ancient conserved functional domain.

Figures

Figure 1
(A) Phylogenetic relationships among eukaryotic homeodomain protein sequences indicate an ancient duplication that occurred before the origin of angiosperms, metazoa, and fungi. Homeodomain proteins are divided into two groups, a and b, each containing well-supported subgroups from all three kingdoms: angiospermae (green), fungi (red), and metazoa (blue). This tree is a consensus of results from different phylogenetic analyses of a dataset of 60 sequences from which a 3-aa insertion/deletion site was removed. The strict consensus of 59 trees was obtained after removing 14 sequences, including subgroups ZM-HOX and SIX2. These 14 sequences occupy variable positions on the tree in all analyses. Results are presented as unrooted trees, because no outgroup sequence is known. Similar results were obtained from neighbor-joining analyses of larger datasets. All sequences in group a have a 3-aa insertion (arrow) in the homeodomain. Several sequences in group a share an amphipathic helical secondary structure in the region N terminal to the homeodomain (•).
(B) The distributions of two protein characteristics are consistent with the phylogenetic tree based on primary sequence data. This tree was obtained from neighbor-joining analyses of pairwise p-distances. Strongly supported angiosperm protein subgroups (green) are associated with fungal (red) and metazoan (blue) subgroups. Branches are drawn proportional to p-distance. The scale represents p-distance. Numbers along each branch indicate bootstrap values over 50%. Most internal branches have low statistical support. Branch 1 derives support from evidence external to primary sequence data. Presence of three amino acids in the insertion/deletion (thick branches) marks most of the sequences in group a. The SIX2 subgroup is assumed to have lost three amino acids on this tree, but not in other trees where its phylogenetic position is outside of group a. The phylogenetic distribution of the amphipathic helix in the N terminal region (•), its absence (○), and a short N terminal region (□) indicates that the N terminal structure characterizes sequences in group a.
Sequence names are indicated as follows: the first two letters represent the Latin name and are followed by the name of the gene. Angiospermae: AT, Arabidopsis thaliana; DC, Daucus carota; LE, Lycopersicon esculentum; LP, Lycopersicon peruvianum; OS, Oryza sativa; PS, Phalaenopsis sp.; PC, Petroselinum crispum; ZM, Zea mays. Metazoa: CE, Caenorhabditis elegans; DM, Drosophila melanogaster; EG, Echinococcus granulosus; HS, Homo sapiens; LS, Lineus sanguineus; MM, Mus musculus; XL, Xenopus laevis. Fungi: SC, Saccharomyces cerevisiae; SCH, Schizophyllum commune; UM, Ustilago maydis.
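As an illustration of the p-distance that underlies the neighbor-joining tree in (B), the sketch below computes the proportion of differing sites between aligned sequences. The function names and the toy sequences are hypothetical, and skipping gapped columns (pairwise deletion) is an assumption; the caption does not say how gaps were treated in the original analysis.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of compared alignment columns at which two sequences differ.

    Assumption: columns with a gap in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a != res_b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def pairwise_p_distances(alignment):
    """Map each unordered pair of sequence names to its p-distance."""
    return {
        (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Made-up toy alignment, not taken from the paper's dataset.
toy_alignment = {
    "seq1": "RRKRT-AYT",
    "seq2": "RRKRTQAYS",
    "seq3": "KRKRTEAFT",
}
distances = pairwise_p_distances(toy_alignment)
```

A distance matrix of this kind is the input a neighbor-joining routine would cluster; the tree-building step itself is not shown here.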
Figure 2
Secondary structure in the region immediately N terminal to the homeodomain is conserved across some angiosperm and metazoan proteins. This alignment of N terminal regions for some group a proteins shows N terminal helical regions (shaded amino acids), the nonhelical linker region, and the homeodomain. Helical regions were predicted for the N terminal regions by using PHDsec (http://www.embl-heidelberg.de/predictprotein/predictprotein.html/). We identified two alpha-helical regions in the angiosperm KN and metazoan EXD subgroups. The first is immediately adjacent to the homeodomain, contains two short helices (ELK domain), and was not detected in any other sequences. The second region lies further N terminal and consists of a long amphipathic helix. This helix, if found in other protein subgroups, was either short or not amphipathic and not alignable. By using a helical wheel representation, it was possible to align the sequences such that conserved amino acids (boxed and numbered) were positioned on one (hydrophobic) face of the helix. Gaps correspond to one or two turns of the helix and thus maintain the conserved face of the helix.
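The helical wheel idea can be sketched numerically: in an ideal alpha helix each residue is rotated about 100 degrees from the previous one, so residues that share a face have similar wheel angles. The code below is a minimal sketch, not the procedure used for the figure. The function names are hypothetical, and the use of the Kyte-Doolittle hydropathy scale and of a mean hydrophobic moment as an amphipathicity score are assumptions introduced for illustration.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def helical_wheel(sequence, degrees_per_residue=100.0):
    """Return (residue, wheel angle in degrees) for each position."""
    return [
        (residue, (index * degrees_per_residue) % 360.0)
        for index, residue in enumerate(sequence)
    ]


def mean_hydrophobic_moment(sequence, scale=KYTE_DOOLITTLE,
                            degrees_per_residue=100.0):
    """Length of the summed hydropathy vectors on the wheel, per residue.

    A large value means hydrophobic residues cluster on one face.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    sin_sum = 0.0
    cos_sum = 0.0
    for index, residue in enumerate(sequence):
        angle = math.radians(index * degrees_per_residue)
        sin_sum += scale[residue] * math.sin(angle)
        cos_sum += scale[residue] * math.cos(angle)
    return math.hypot(sin_sum, cos_sum) / len(sequence)


# Artificial 18-residue peptides, not sequences from the paper:
# leucine on one face and lysine on the other, versus all leucine.
two_faced = "".join(
    "L" if math.cos(math.radians(angle)) > 0 else "K"
    for _, angle in helical_wheel("A" * 18)
)
uniform = "L" * 18
```

With 100 degrees per residue, 18 residues span exactly five turns, so the all-leucine peptide has a moment close to zero while the two-faced peptide scores high.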
Figure 3
Reconciled tree (A) showing seven gene duplication events (circles) postulated to have occurred in the common ancestor of angiosperms, metazoa, and fungi before their diversification. The reconciled tree (A) is obtained from the gene tree (B) and organism tree (C) by adding leaves (hatched) such that the gene tree and the organismal association of the sequences can be explained by shared common history alone. The reconciled tree suggests that nine sequences are missing (?), because of either lack of sampling or gene loss, or because they arose in the common ancestor of fungi and animals and are therefore absent from angiosperms. Reconciled trees based on alternative gene trees gave estimates of 7–11 duplications. These numbers are merely illustrative, and a precise estimate can be made only with the acquisition of a wider sample of sequences.
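Counting duplications from a gene tree and an organism tree can be illustrated with the standard least-common-ancestor (LCA) reconciliation rule: a gene-tree node is scored as a duplication when it maps to the same organism-tree node as one of its children. This is a minimal sketch of that general technique and may differ from the software and settings used for the figure. The nested-tuple tree encoding, the function names, and the toy trees are hypothetical, and the gene tree is assumed to be rooted.

```python
def species_clades(tree):
    """Return (leaf set of this node, list of leaf sets of all nodes below)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    members = frozenset()
    found = []
    for child in tree:
        child_members, child_clades = species_clades(child)
        members |= child_members
        found.extend(child_clades)
    found.append(members)
    return members, found


def lca(clade_list, species):
    """Smallest organism-tree clade that contains every given species."""
    return min((clade for clade in clade_list if species <= clade), key=len)


def count_duplications(gene_tree, clade_list):
    """Return (clade the node maps to, number of inferred duplications).

    Gene-tree leaves are labeled with the organism they come from.
    """
    if isinstance(gene_tree, str):
        return lca(clade_list, frozenset([gene_tree])), 0
    child_maps = []
    duplications = 0
    for child in gene_tree:
        mapped, below = count_duplications(child, clade_list)
        child_maps.append(mapped)
        duplications += below
    here = lca(clade_list, frozenset().union(*child_maps))
    if here in child_maps:
        # Parent and child map to the same organism-tree node.
        duplications += 1
    return here, duplications


# Toy organism tree and a toy gene tree with two groups, each holding
# one sequence per kingdom; neither is the tree from the figure.
species_tree = (("fungi", "metazoa"), "angiosperms")
_, clades = species_clades(species_tree)
two_group_gene_tree = (
    (("fungi", "metazoa"), "angiosperms"),
    (("fungi", "metazoa"), "angiosperms"),
)
_, n_duplications = count_duplications(two_group_gene_tree, clades)
```

In this toy case both groups span all three kingdoms, so the root of the gene tree is the only node scored as a duplication, which mirrors the single ancient duplication argued for in the abstract.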
