Nucleic Acids Res. 2001 Nov 1;29(21):4319-33.

doi: 10.1093/nar/29.21.4319.

The Arabidopsis thaliana genome contains at least 29 active genes encoding SET domain proteins that can be assigned to four evolutionarily conserved classes

L O Baumbusch¹, T Thorstensen, V Krauss, A Fischer, K Naumann, R Assalkhou, I Schulz, G Reuter, R B Aalen

PMID: 11691919
PMCID: PMC60187
DOI: 10.1093/nar/29.21.4319

L O Baumbusch et al. Nucleic Acids Res. 2001.

Affiliation

¹ Division of Molecular Biology, Department of Biology, University of Oslo, PO Box 1031 Blindern, N-0315 Norway.


Abstract

SET domains are conserved amino acid motifs present in chromosomal proteins that function in epigenetic control of gene expression. These proteins can be divided into four classes as typified by their Drosophila members E(Z), TRX, ASH1 and SU(VAR)3-9. Homologs of all four classes have been identified in yeast and mammals, but not in plants. A BLASTP screening of the Arabidopsis genome identified 37 genes: three E(z) homologs, five trx homologs, four ash1 homologs and 15 genes similar to Su(var)3-9. Seven genes were assigned as trx-related and three as ash1-related. Only four genes have been described previously. Our classification is based on the characteristics of the SET domains, cysteine-rich regions and additional conserved domains, including a novel YDG domain. RT-PCR analysis, cDNA cloning and matching ESTs show that at least 29 of the genes are active in diverse tissues. The high number of SET domain genes, possibly involved in epigenetic control of gene activity during plant development, can partly be explained by extensive genome duplication in Arabidopsis. Additionally, the lack of introns in the coding region of eight SU(VAR)3-9 class genes indicates evolution of new genes by retrotransposition. The identification of putative nuclear localization signals and AT-hooks in many of the proteins supports an anticipated nuclear localization, which was demonstrated for selected proteins.

Figures

**Figure 1**
Structure of *Arabidopsis* SET domain proteins. Protein sequences obtained from annotations in the EMBL and MIPS databases, adjusted by ESTs, sequences of RT–PCR products and cDNAs, were analyzed for conserved domains (see Materials and Methods). Lengths of proteins and position of domains are shown to scale except when indicated by \\. SET, SET domain; EDII, E(Z) domain II; Cys (E(Z)), cysteine-rich region found in E(Z) class proteins; Cys (ASH), cysteine-rich region found in ASH1 class proteins; PHD, PHD finger; ePHD, extended PHD finger; N-SAC, N-terminal part of SET-associated cysteine-rich (SAC) region; PWWP, PWWP domain; YDG, YDG domain; NLS, bipartite nuclear localization signal; ZiFi, zinc finger; AT, AT-hook; one, two or three horizontal lines indicate the number of cysteines in the C-terminal SAC. The Cys-rich domain of ASHH3 is not significant according to the domain searches, but aligns well with the Cys-rich domains of the other ASHH proteins.

**Figure 2**
Relationship between SET domain proteins of *Arabidopsis* and other organisms. The tree was constructed using the ClustalX program based on alignments of SET domains by ClustalX and manual adjustment. Figures indicate bootstrap values (1000 = 100%). Values >60% are shown. E (Z), *Drosophila* E(Z), P42124; EZH2, human E(Z) homolog 2, Q15910; MES-2, *C.elegans* maternal effect sterile 2 E(Z) homolog, AAC27125.1; TRX, *Drosophila* TRX, P20659; HRX, human TRX homolog, Q03164; SET1, *S.cerevisiae* TRX homolog, NP_011987.1; ASH1, *Drosophila* ASH1, AAF49140.2; SET2, *S.cerevisiae* ASH1 homolog, YJL168c; SU VAR 3-9, *Drosophila* SU(VAR)3-9, P45975; SUV39H, human SU(VAR)3-9 homolog, AAF06805.1; G9a, human SET domain protein, NP_006700; CLR4, *S.pombe* SU(VAR)3-9 homolog, T43700.

**Figure 3**
Alignment of SET domains and flanking cysteine-rich regions of the four classes of SET domain proteins. The SET domains of all proteins are perfectly aligned from the GWG motif (positions 149–151), while the cysteine-rich domains N-terminal to the SET domains are aligned within each group to show class characteristics in this region. The TRX class lacks such a region. Note also the C-SAC motif from position 300, which is lacking in the E(Z) class. The degree of conservation is distinguished at four levels (100, 80 and 60% and not conserved), where 100% has the darkest shade of gray. The upper and lower case letters in the consensus line indicate 100 and 80% conservation within all groups, respectively. Numbers in the consensus line represent conserved similarity groups as defined by the Blosum 62 scoring table. Yellow, ASH1 class proteins; green, TRX class proteins; blue, E(Z) class proteins; red, SU(VAR)3-9 class proteins; orange, residues that when mutated abolish self-association and the SNR1 interaction of the TRX SET domain and loss of HMTase activity of SUV39H (20,47); dark blue, H residue which when changed to R increases HMTase activity of SUV39H (20). (..) indicates that short stretches of non-conserved amino acids were omitted from sequences in the SU(VAR)3-9 class, in regions marked ** below the alignment, so as to fit the figure on one page.

**Figure 4**
Alignment of domains found in SET domain proteins. (A) YDG domain. Note that the first six amino acids (GLVPGV) of SUVH10 are from another reading frame, followed by 11 amino acids (DVGDIFFFRGE) from the same frame as the annotated ORF (T6P5.10). HsICBP90, *Homo sapiens* *in vitro* CCAAT-binding protein 90, AAF28469.1; MmNp95, *Mus musculus* nuclear binding protein 95, AAK55743.1; DrCHP, *Deinococcus radiodurans* conserved hypothetical protein, AAC28190. (B) PHD fingers. (C) PWWP domain. WHSC1, human WHSC1 protein, DD19343. (D) ePHD fingers. (E) AT-hooks. For shadings and consensus line see Figure 3.

**Figure 5**
RT–PCR expression analyses. (A) Agarose gels stained with ethidium bromide showing cDNA fragments of *SUVH1*, *SUVH2*, *SUVH3*, *SUVH4*, *SUVH5* and *AtCyclophilin* (positive control; 63) amplified by RT–PCR using gene-specific primers. RT–PCR reactions were performed on DNase I-treated total RNA isolated from seeds (E), roots (R), leaves (L), stems (S), floral buds (F), inflorescences (I) and green siliques (P). A negative (H₂O) and a positive (genomic DNA, G) control reaction are shown to the right of the RT–PCR reactions. The PCR fragment sizes are given on both sides in bp. Note the intronless fragments of *SUVH1* and *SUVH5*. The PCR primers for *SUVH2* and *SUVH3* were designed to amplify their 3′- and 5′-UTR, respectively, where introns are found in the genomic sequences (see text). (B) Agarose gels showing RT–PCR fragments (R) of selected *ATX*, *ATXR*, *ASHH* and *SUVR* mRNAs, amplified by gene-specific primers. RT–PCR reactions were performed on mRNA isolated from floral buds using magnetic oligo(dT) beads. Note that each genomic fragment (G) is longer than the corresponding RT–PCR fragment obtained with the same primers due to the presence of introns. Size markers are ΦX174 DNA digested with *Hae*III and λ DNA digested with *Hin*dIII.

**Figure 6**
Nuclear localization of SET domain proteins in onion epidermis transient expression assay with the plant GFP reporter system. Histochemical localization of GFP activity following bombardment of onion epidermal cell layers with DNA constructs expressing either GFP alone (A), a fusion of CLF to GFP (B), a fusion of SUVH1 to GFP (C), a fusion of SUVH2 to GFP (D), a fusion of SUVH3 to GFP (E) and a fusion of *Drosophila* SU(VAR)3-9 to GFP (F) is shown. GFP fluorescence was revealed 2 h after bombardment utilizing Nomarski optics.

References

1. Henikoff S. (1996) Position-effect variegation in Drosophila. In Russo,V.E.A. (ed.), Epigenetic Mechanisms of Gene Regulation. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, pp. 319–334.
2. Wallrath L. (1998) Unfolding the mysteries of heterochromatin. Curr. Opin. Genet. Dev., 8, 147–153.
3. Weiler K.S. and Wakimoto,B.T. (1995) Heterochromatin and gene expression in Drosophila. Annu. Rev. Genet., 29, 577–605.
4. Aagaard L., Laible,G., Selenko,P., Schmid,M., Dorn,R., Schotta,G., Kuhfittig,S., Wolf,A., Lebersorger,A., Singh,P.B. et al. (1999) Functional mammalian homologs of the Drosophila PEV-modifier Su(var)3-9 encode centromere-associated proteins which complex with the heterochromatin component M31. EMBO J., 18, 1923–1938.
5. van Lohuizen M. (1998) Functional analysis of mouse polycomb group genes. Cell. Mol. Life Sci., 54, 71–79.
