Mol Biol Evol. 2009 Dec;26(12):2775-94.
doi: 10.1093/molbev/msp201. Epub 2009 Sep 4.

A comprehensive classification and evolutionary analysis of plant homeobox genes

Krishanu Mukherjee et al. Mol Biol Evol. 2009 Dec.

Abstract

The full complement of homeobox transcription factor sequences, including genes and pseudogenes, was determined from the analysis of 10 complete genomes from flowering plants, moss, Selaginella, unicellular green algae, and red algae. Our exhaustive genome-wide searches resulted in the discovery in each class of a greater number of homeobox genes than previously reported. All homeobox genes can be unambiguously classified by sequence evolutionary analysis into 14 distinct classes also characterized by conserved intron-exon structure and by unique codomain architectures. We identified many new genes belonging to previously defined classes (HD-ZIP I to IV, BEL, KNOX, PLINC, WOX). Other newly identified genes allowed us to characterize PHD, DDT, NDX, and LD genes as members of four new evolutionary classes and to define two additional classes, which we named SAWADEE and PINTOX. Our comprehensive analysis allowed us to identify several newly characterized conserved motifs, including novel zinc finger motifs in SAWADEE and DDT. Members of the BEL and KNOX classes were found in Chlorobionta (green plants) and in Rhodophyta. We found representatives of the DDT, WOX, and PINTOX classes only in green plants, including unicellular green algae, moss, and vascular plants. All 14 homeobox gene classes were represented in flowering plants, Selaginella, and moss, suggesting that they had already differentiated in the last common ancestor of moss and vascular plants.

Figures

FIG. 1.—
Multiple sequence alignment of homeodomain sequence representatives from Arabidopsis thaliana (At), rice Oryza sativa (Os), moss Physcomitrella patens (Pp), and spikemoss Selaginella moellendorffii (Sm). Canonical homeodomain sequence numbering (excluding loops), the TALE class three-residue insertion (abc), and the position of the three homeodomain helices are indicated. The alignments presented in this and other figures were obtained using MUSCLE (Edgar 2004) and conserved amino acids of different physicochemical properties are highlighted in different shades of gray using the Clustal-Qt (Larkin et al. 2007) alignment-drawing software.
FIG. 2.—
Circular representation of the evolutionary tree of plant homeodomain sequences (branch lengths not drawn to scale), based on all homeodomain sequences identified in Arabidopsis, Selaginella, moss, unicellular green algae, and red algae supplemented by selected sequences from other flowering plant species wherever only one protein from Arabidopsis was found (see supplementary table 1 [Supplementary Material online] for a complete list of species). The tree, which should be considered unrooted, was obtained with the ML procedure implemented in PHYML with the JTT substitution model using an alignment of homeodomain sequences as shown in supplementary fig. 1 (Supplementary Material online). Bootstrap support was based on 1,000 replicates and is indicated for relevant branches. Bootstrap support obtained after excluding all sequences from unicellular green algae and red algae is shown in parentheses. Clades supported by robust bootstrap values (70% or more) are shown with thicker lines. Bootstrap values for intraclass branches are omitted. Bootstrap values less than 5% for branches connecting different classes are not shown. All homeodomain classes are identified as separate clades in this analysis and are consistently supported by conserved, class-specific domain architecture (see fig. 3) and unique splice junctions (see fig. 4). Red-colored branches indicate sequences from red algae, blue-colored branches refer to the unicellular green alga Chlamydomonas reinhardtii, the light-blue color identifies the unicellular green algae Ostreococcus lucimarinus and Ostreococcus tauri, gold-colored branches indicate moss proteins, and a light-green color is used to represent sequences from Selaginella.
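The bootstrap support mentioned in this caption rests on resampling alignment columns with replacement and rebuilding the tree for each pseudo-replicate. The following is a minimal Python sketch of the resampling step only, under stated assumptions: the function name `bootstrap_replicate` and the toy sequences are hypothetical, and the tree inference itself (performed with PHYML in the paper) is not reproduced.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Return one pseudo-replicate built from columns drawn with replacement."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    picks = [rng.randrange(length) for _ in range(length)]
    # Apply the same column picks to every sequence so columns stay intact.
    return ["".join(seq[i] for i in picks) for seq in alignment]

# Toy alignment (hypothetical residues, not taken from the paper).
toy = ["RKRHT", "RKKLT", "KRRHS"]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_replicate(toy, rng) for _ in range(1000)]
```

Each replicate would then be passed to the tree-building program, and the support for a branch is the percentage of replicate trees that contain it.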
FIG. 3.—
Schematic overview of the domain architecture of all 14 classes of plant homeodomain proteins. The following domains and motifs are indicated: HD, leucine-zipper (LZ), CPSCE motif, CESV motif, START domain, homeodomain-START associated domain (HD-SAD), MEKHLA domain, PLINC zinc finger, BEL domain (A & B), KNOX domain (A & B), ELK motif, DDT domain, WSD motif, D-TOX ZF, PEX-PHD, PHD, LUMI, conserved motifs in LD homeodomain proteins (LD1, LD2, LD4, and LD5). Among DDT proteins, only D-TOX A is indicated with its full symbol; D-TOX B, D-TOX C, D-TOX D, D-TOX E, D-TOX F, and D-TOX G are indicated as B, C, D, E, F, or G, respectively.
FIG. 4.—
Intron positions in the homeobox sequences of Arabidopsis thaliana. A consensus (majority) 60-residue homeodomain sequence is shown above a ruler where individual codon base positions are separated by tick marks. Intron positions are indicated by vertical lines labeled by the class names where each intron was found. The intron positions are generally conserved within each class. Exceptions are ATHB1, which has one intron, whereas other HD-ZIP I genes are single-exon genes; and At4g12750, a DDT class member, which has an extra intron at the beginning of the homeodomain. Gene classes not shown (HD-ZIP I, WOX, PLINC) do not have introns in the homeodomain. One of the splice sites in the NDX genes lies in the loop between helix 2 and helix 3 where it cannot be placed accurately on the consensus sequence. Its approximate position is marked by a crossbar at the end of the line.
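The ruler described in this caption places each intron relative to individual codon bases. As an illustration only, the Python sketch below maps an intron to a codon number and phase from the count of coding bases that precede it, using the standard phase convention. The function name `intron_location` and the example offset are hypothetical and are not positions reported in the paper.

```python
def intron_location(bases_before, n_codons=60):
    """Map an intron to (codon, phase) from the coding bases preceding it.

    phase 0: the intron lies between codons, immediately before `codon`
    phase 1: the intron splits `codon` after its first base
    phase 2: the intron splits `codon` after its second base
    """
    if not 0 <= bases_before <= 3 * n_codons:
        raise ValueError("offset falls outside the homeodomain coding region")
    codon, phase = divmod(bases_before, 3)
    return codon + 1, phase  # codons are numbered from 1

# Hypothetical offset chosen only to show the arithmetic.
codon, phase = intron_location(134)
print(f"intron in codon {codon}, phase {phase}")
```

The default of 60 codons matches the consensus homeodomain length given in the caption; a longer domain can be handled by passing a different `n_codons`.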
FIG. 5.—
ML tree obtained using the homeodomain and codomain sequence of KNOX class proteins. The KNOX class can be subdivided into two families: KNOX I and KNOX II. Each of these can be further subdivided into two subfamilies having members conserved in both monocots (light-green boxes) and eudicots (light-blue boxes). Selaginella (Sm) proteins are shown in yellowish green–colored boxes, whereas moss (Pp) proteins are shown in gold-colored boxes. Each subfamily harbors distinct signature motifs, schematically represented next to the subfamily name. The tree has been rooted with BEL class protein representatives.
FIG. 6.—
Multiple sequence alignment of the ELK motif from KNOX proteins and of the C-terminal part of the PBC-B domain from PBC proteins showing the sequence similarity and likely homology of the two regions. Helices predicted from the alignments of PBC or KNOX sequences are shown above the alignment. At each position conserved amino acid types with similar physicochemical properties are highlighted in different shades of gray.
FIG. 7.—
Multiple sequence alignment of (a) ZIBEL motif sequences identified at the N-terminus and C-terminus of BEL class homeodomain proteins and at the N-terminus of HD-ZIP II class proteins and (b) the BEL-A (SKY) region of plant BEL class proteins and the MEIS-A domain of animal MEIS class proteins. Secondary structure predictions of BEL-A and MEIS proteins are shown above and below the sequence alignment, respectively. At each position conserved amino acid types with similar physicochemical properties are highlighted in different shades of gray.
FIG. 8.—
Multiple sequence alignment of (a) the zinc finger motif (D-TOX ZF) uniquely found in the plant DDT class, with conserved cysteine residues highlighted as “C,” and (b) the WSD motif of the DDT class found in plant and animal sequences, with the regions of highest similarity shown as boxed residues. At each position conserved amino acid types with similar physicochemical properties are highlighted in different shades of gray.
FIG. 9.—
ML tree obtained using the homeodomain and associated signature codomains of DDT class proteins, outgrouped by PINTOX class proteins (At_At5g11270 and St_Pint1). Proteins from monocots are shown within light-green–colored boxes; from eudicots in light-blue–colored boxes; from moss in musky-green–colored boxes; and from Selaginella in yellowish green–colored boxes. The DDT class can be subdivided into three families: D-TOX1, D-TOX2, and D-TOX3. D-TOX3 is eudicot specific and has secondarily lost all signature codomains found in D-TOX1 and D-TOX2 with the exception of the D-TOX A codomain. A DDT protein from Chlamydomonas reinhardtii (Cr_C460085), three proteins from moss (Pp_sca_419, Pp_sca_15b, and Pp_sca_89), and two proteins from Selaginella (Sm_422900 and Sm_447019) cannot be classified with certainty within any of the three families.
FIG. 10.—
Multiple alignments of (a) the LUMI domain of the LD class; (b) the PINTOX domain of PINTOX class proteins; and (c) the SAWADEE domain of SAWADEE class proteins. The position of conserved cysteine and histidine residues is highlighted below the alignment. At each position conserved amino acid types with similar physicochemical properties are highlighted in different shades of gray.
FIG. 11.—
Schematic representation of the proposed evolutionary history of plant homeobox gene classes and codomains. A class or motif represented in a parental branch indicates that the same class/motif is also present in all of its child branches unless otherwise indicated. The HD-ZIP I to IV classes are represented by the single domain architecture HD-LZ-CPSC-START-HD-SAD-MEKHLA. In this representation, arrows separate motif groups whose addition defines, in order, HD-ZIP classes I, II, IV, and III. Loss of the START domain in the genomes of unicellular green algae and red algae is represented by the domain crossed in red. Acquisition of the MEKHLA domain through the cyanobacteria/chloroplast is indicated by an arrow.
FIG. 12.—
Summary representation of the trees shown in fig. 2 and supplementary figs. 2 and 3 (Supplementary Material online), where sequences from each class are represented by one branch and the low–bootstrap support (<50%) branches connecting different TALE or non-TALE classes are hidden by circles representing uncertainty of their relations.
