Nucleic Acids Res. 2001 Nov 1;29(21):4319-33.
doi: 10.1093/nar/29.21.4319.

The Arabidopsis thaliana genome contains at least 29 active genes encoding SET domain proteins that can be assigned to four evolutionarily conserved classes

L O Baumbusch et al. Nucleic Acids Res. 2001.

Abstract

SET domains are conserved amino acid motifs present in chromosomal proteins that function in epigenetic control of gene expression. These proteins can be divided into four classes as typified by their Drosophila members E(Z), TRX, ASH1 and SU(VAR)3-9. Homologs of all four classes have been identified in yeast and mammals, but not in plants. A BLASTP screening of the Arabidopsis genome identified 37 genes: three E(z) homologs, five trx homologs, four ash1 homologs and 15 genes similar to Su(var)3-9. Seven genes were assigned as trx-related and three as ash1-related. Only four genes have been described previously. Our classification is based on the characteristics of the SET domains, cysteine-rich regions and additional conserved domains, including a novel YDG domain. RT-PCR analysis, cDNA cloning and matching ESTs show that at least 29 of the genes are active in diverse tissues. The high number of SET domain genes, possibly involved in epigenetic control of gene activity during plant development, can partly be explained by extensive genome duplication in Arabidopsis. Additionally, the lack of introns in the coding region of eight SU(VAR)3-9 class genes indicates evolution of new genes by retrotransposition. The identification of putative nuclear localization signals and AT-hooks in many of the proteins supports an anticipated nuclear localization, which was demonstrated for selected proteins.
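The class counts stated in the abstract can be checked with a short tally. This is only an arithmetic sketch: the dictionary keys are illustrative labels chosen here, not identifiers from the paper, and the active-gene figure is the lower bound the abstract reports.

```python
# Gene counts per class as stated in the abstract.
# Key names are illustrative labels, not identifiers used by the authors.
class_counts = {
    "E(Z) homologs": 3,
    "TRX homologs": 5,
    "ASH1 homologs": 4,
    "SU(VAR)3-9-like": 15,
    "TRX-related": 7,
    "ASH1-related": 3,
}

total_genes = sum(class_counts.values())
active_genes = 29  # "at least 29" in the abstract, so a lower bound

print(total_genes)                          # 37
print(f"{active_genes / total_genes:.0%}")  # 78%
```

The six classes sum to the 37 genes reported, and the 29 genes with expression evidence correspond to at least about 78% of them.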

Figures

Figure 1
Structure of Arabidopsis SET domain proteins. Protein sequences obtained from annotations in the EMBL and MIPS databases, adjusted by ESTs, sequences of RT–PCR products and cDNAs, were analyzed for conserved domains (see Materials and Methods). Lengths of proteins and position of domains are shown to scale except when indicated by \\. SET, SET domain; EDII, E(Z) domain II; Cys (E(Z)), cysteine-rich region found in E(Z) class proteins; Cys (ASH), cysteine-rich region found in ASH1 class proteins; PHD, PHD finger; ePHD, extended PHD finger; N-SAC, N-terminal part of SET-associated cysteine-rich (SAC) region; PWWP, PWWP domain; YDG, YDG domain; NLS, bipartite nuclear localization signal; ZiFi, zinc finger; AT, AT-hook; one, two or three horizontal lines indicate the number of cysteines in the C-terminal SAC. The Cys-rich domain of ASHH3 is not significant according to the domain searches, but aligns well with the Cys-rich domains of the other ASHH proteins.
Figure 2
Relationship between SET domain proteins of Arabidopsis and other organisms. The tree was constructed using the ClustalX program based on alignments of SET domains by ClustalX and manual adjustment. Figures indicate bootstrap values (1000 = 100%). Values >60% are shown. E (Z), Drosophila E(Z), P42124; EZH2, human E(Z) homolog 2, Q15910; MES-2, C.elegans maternal effect sterile 2 E(Z) homolog, AAC27125.1; TRX, Drosophila TRX, P20659; HRX, human TRX homolog, Q03164; SET1, S.cerevisiae TRX homolog, NP_011987.1; ASH1, Drosophila ASH1, AAF49140.2; SET2, S.cerevisiae ASH1 homolog, YJL168c; SU VAR 3-9, Drosophila SU(VAR)3-9, P45975; SUV39H, human SU(VAR)3-9 homolog, AAF06805.1; G9a, human SET domain protein, NP_006700; CLR4, S.pombe SU(VAR)3-9 homolog, T43700.
Figure 3
Alignment of SET domains and flanking cysteine-rich regions of the four classes of SET domain proteins. The SET domains of all proteins are perfectly aligned from the GWG motif (positions 149–151), while the cysteine-rich domains N-terminal to the SET domains are aligned within each group to show class characteristics in this region. The TRX class lacks such a region. Note also the C-SAC motif from position 300, which is lacking in the E(Z) class. The degree of conservation is distinguished at four levels (100, 80 and 60% and not conserved), where 100% has the darkest shade of gray. The upper and lower case letters in the consensus line indicate 100 and 80% conservation within all groups, respectively. Numbers in the consensus line represent conserved similarity groups as defined by the Blosum 62 scoring table. Yellow, ASH1 class proteins; green, TRX class proteins; blue, E(Z) class proteins; red, SU(VAR)3-9 class proteins; orange, residues that when mutated abolish self-association and the SNR1 interaction of the TRX SET domain and cause loss of HMTase activity of SUV39H (20,47); dark blue, H residue which when changed to R increases HMTase activity of SUV39H (20). (..) indicates that short stretches of non-conserved amino acids were omitted from sequences in the SU(VAR)3-9 class, in regions marked ** below the alignment, so as to fit the figure on one page.
Figure 4
Alignment of domains found in SET domain proteins. (A) YDG domain. Note that the first six amino acids (GLVPGV) of SUVH10 are from another reading frame, followed by 11 amino acids (DVGDIFFFRGE) from the same frame as the annotated ORF (T6P5.10). HsICBP90, Homo sapiens in vitro CCAAT-binding protein 90, AAF28469.1; MmNp95, Mus musculus nuclear binding protein 95, AAK55743.1; DrCHP, Deinococcus radiodurans conserved hypothetical protein, AAC28190. (B) PHD fingers. (C) PWWP domain. WHSC1, human WHSC1 protein, DD19343. (D) ePHD fingers. (E) AT-hooks. For shadings and consensus line see Figure 3.
Figure 5
RT–PCR expression analyses. (A) Agarose gels stained with ethidium bromide showing cDNA fragments of SUVH1, SUVH2, SUVH3, SUVH4, SUVH5 and AtCyclophilin (positive control; 63) amplified by RT–PCR using gene-specific primers. RT–PCR reactions were performed on DNase I-treated total RNA isolated from seeds (E), roots (R), leaves (L), stems (S), floral buds (F), inflorescences (I) and green siliques (P). A negative (H2O) and a positive (genomic DNA, G) control reaction are shown to the right of the RT–PCR reactions. The PCR fragment sizes are given on both sides in bp. Note the intronless fragments of SUVH1 and SUVH5. The PCR primers for SUVH2 and SUVH3 were designed to amplify their 3′- and 5′-UTR, respectively, where introns are found in the genomic sequences (see text). (B) Agarose gels showing RT–PCR fragments (R) of selected ATX, ATXR, ASHH and SUVR mRNAs, amplified by gene-specific primers. RT–PCR reactions were performed on mRNA isolated from floral buds using magnetic oligo(dT) beads. Note that each genomic fragment (G) is longer than the corresponding RT–PCR fragment obtained with the same primers due to the presence of introns. Size markers are ΦX174 DNA digested with HaeIII and λ DNA digested with HindIII.
Figure 6
Nuclear localization of SET domain proteins in onion epidermis transient expression assay with the plant GFP reporter system. Histochemical localization of GFP activity is shown following bombardment of onion epidermal cell layers with DNA constructs expressing GFP alone (A), a fusion of CLF to GFP (B), a fusion of SUVH1 to GFP (C), a fusion of SUVH2 to GFP (D), a fusion of SUVH3 to GFP (E) or a fusion of Drosophila SU(VAR)3-9 to GFP (F). GFP fluorescence was revealed 2 h after bombardment utilizing Nomarski optics.
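
The four conservation levels used for shading in the Figure 3 alignment (100, 80 and 60%, and not conserved) can be illustrated with a minimal sketch. It rests on simplifying assumptions: conservation is scored here by residue identity only, whereas the figure also uses Blosum 62 similarity groups; gaps count against conservation; and the function name and toy columns are hypothetical, not taken from the paper.

```python
from collections import Counter

def conservation_level(column: str):
    """Return 100, 80 or 60 for the highest threshold the column meets, else None.

    Assumption: identity-based scoring only; the published figure also
    takes Blosum 62 similarity groups into account.
    """
    residues = [r for r in column.upper() if r != "-"]  # drop gap characters
    if not residues:
        return None
    top_count = Counter(residues).most_common(1)[0][1]
    # Integer comparison avoids floating-point rounding at the thresholds;
    # dividing by the full column length means gaps lower the score.
    for threshold in (100, 80, 60):
        if top_count * 100 >= threshold * len(column):
            return threshold
    return None

# Toy alignment columns (made-up residues, not from the paper's alignment)
print(conservation_level("GGGGG"))  # 100
print(conservation_level("GGGGA"))  # 80
print(conservation_level("GGGAA"))  # 60
print(conservation_level("GGAAC"))  # None
```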
