BMC Plant Biol. 2012 Nov 7:12:207.
doi: 10.1186/1471-2229-12-207.

ST proteins, a new family of plant tandem repeat proteins with a DUF2775 domain mainly found in Fabaceae and Asteraceae

Lucía Albornos et al. BMC Plant Biol. 2012.

Abstract

Background: Many proteins with tandem repeats in their sequence have been described and classified according to the length of the repeats: (I) Repeats of short oligopeptides (from 2 to 20 amino acids), including structural cell wall proteins and arabinogalactan proteins. (II) Repeats that range in length from 20 to 40 residues, including proteins with a well-established three-dimensional structure often involved in mediating protein-protein interactions. (III) Longer repeats on the order of 100 amino acids that constitute structurally and functionally independent units. Here we analyse ShooT specific (ST) proteins, a family of proteins with tandem repeats of unknown function that were first found in Leguminosae, and their possible similarities to other proteins with tandem repeats.

Results: ST protein sequences were only found in dicotyledonous plants, limited to several plant families, mainly the Fabaceae and the Asteraceae. ST mRNAs accumulate mainly in the roots and under biotic interactions. Most ST proteins have one or several Domain(s) of Unknown Function 2775 (DUF2775). All deduced ST proteins have a signal peptide, indicating that these proteins enter the secretory pathway, and the mature proteins have tandem repeat oligopeptides that share a hexapeptide (E/D)FEPRP followed by 4 partially conserved amino acids, which could determine a putative N-glycosylation signal, and a fully conserved tyrosine. In a phylogenetic tree, the sequences cluster according to taxonomic group. A possible involvement in symbiosis and abiotic stress as well as in plant cell elongation is suggested, although different STs could play different roles in plant development.

Conclusions: We describe a new family of proteins called ST whose presence is limited to the plant kingdom, specifically to a few families of dicotyledonous plants. They present 20 to 40 amino acid tandem repeat sequences whose characteristics (signal peptide, DUF2775 domain, conserved repeat regions) differ from those of the described group of 20 to 40 amino acid tandem repeat proteins and also from those of known cell wall proteins with repeat sequences. Several putative roles in plant physiology can be inferred from the characteristics found.
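
The repeat signature given in the Results, (E/D)FEPRP followed by four variable residues and a tyrosine, can be written as a regular expression. The following is a minimal sketch, not the authors' method: the function names and the toy sequence are hypothetical, and the sequon pattern N-X-S/T (X not P) is the general eukaryotic N-glycosylation consensus, which is assumed here to be what "putative N-glycosylation signal" refers to.

```python
import re

# Core signature from the abstract: (E/D)FEPRP, four variable residues, then Y.
# The lookahead allows overlapping hits to be reported as well.
CORE_MOTIF = re.compile(r"(?=([ED]FEPRP(.{4})Y))")

# General N-glycosylation sequon N-X-S/T with X != P (an assumption, see lead-in).
SEQUON = re.compile(r"N[^P][ST]")


def find_core_motifs(protein: str) -> list[tuple[int, str, bool]]:
    """Return (1-based start, matched text, X4-contains-sequon) for each hit."""
    hits = []
    for m in CORE_MOTIF.finditer(protein.upper()):
        x4 = m.group(2)
        hits.append((m.start() + 1, m.group(1), SEQUON.search(x4) is not None))
    return hits


# Made-up toy sequence (NOT a real ST protein): a short leader plus two
# hand-built 26-residue repeats.
toy = "MAAS" + "DFEPRPNVSAYGDDNNDDKAKKEFEK" + "EFEPRPSLSVYNDNDNDNKTKKSFAK"
hits = find_core_motifs(toy)

for start, text, has_sequon in hits:
    print(start, text, "sequon in X4" if has_sequon else "no sequon in X4")
```

The distance between consecutive hit positions gives the repeat length, which is one way to check that candidate repeats fall in the 20 to 40 residue range described above.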

Figures

Figure 1
Phylogenetic tree of species having ST sequences. The phylogenetic tree was built with the Common Tree Taxonomy tool at NCBI and edited using FigTree v1.3.1. All species with ST proteins belong to Magnoliophyta (highlighted in orange), mainly in Asteridae (39%, highlighted in purple) and Rosidae (59%, highlighted in green). Two taxonomic families, the Fabaceae (24%, highlighted in light green) and the Asteraceae (19%, highlighted in light purple), grouped 43% of the genera found. The green algae, moss and lycophyte (not highlighted) as well as monocots (highlighted in yellow) do not have ST sequences. Numbers in brackets indicate the number of ST sequences found in each species.
Figure 2
Organization of the canonical ST gene. The gene has one intron starting 39 to 51 nucleotides from the initial methionine codon, ranging from 96 to 3486 nucleotides in length.
Figure 3
Organization of the ST proteins. (A) The signal peptide, the mature N-terminal end before repeats and the tandem repeat region are shown. (B) Detail of the amino acid sequence of the N-terminal end of the mature protein. (C, D) Details of the tandem repeat oligopeptides represented using WebLogo. (C) Comparison of ST protein repeats having different sizes, showing the conserved consensus sequence DFEPRPX4Y. (D) Comparison of the most abundant ST protein repeats having 25 and 26 amino acids. The general consensus sequence for each repeat is DFEPRPX4YX6-7KXKKXFXK, which shows a less conserved K at positions 19, 21, 22 and 26 (26-amino-acid repeats) or 18, 20, 21 and 25 (25-amino-acid repeats). Apart from the X4 positions, amino acids X6-7 in the consensus pattern are phylogenetically conserved, being rich in D and N. The WebLogo consists of stacks of symbols: one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of the symbols within the stack indicates the relative frequency of each amino acid at that position. Amino acids are coloured according to their chemical properties: polar amino acids (G,S,T,Y,C,Q,N) are in green; basic (K,R,H) in blue; acidic (D,E) in red, and hydrophobic (A,V,L,I,W,F,M) amino acids are in black. Bit: measure of conservation at a particular sequence position; the maximum conservation for a given amino acid in a sequence is 4.32 bits.
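
The 4.32-bit ceiling quoted in the caption is log2(20), the information content of a position where one of the 20 amino acids is fully conserved. A minimal sketch of the per-position calculation follows, assuming the standard sequence-logo definition (log2 of the alphabet size minus the Shannon entropy of the column) and omitting the small-sample correction that logo tools may apply. The function name and the example columns are hypothetical.

```python
import math
from collections import Counter


def column_bits(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column, without
    small-sample correction: log2(alphabet size) - Shannon entropy."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


# Fully conserved column, e.g. the tyrosine of the repeat: maximum stack height.
max_bits = column_bits("YYYYYYYY")

# Column split evenly between two residues, e.g. D and E at the first
# position of the hexapeptide: exactly one bit lower.
split_bits = column_bits("DDDDEEEE")

print(round(max_bits, 2), round(split_bits, 2))
```

In a logo, each letter's height within a stack is its relative frequency multiplied by this column value.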
Figure 4
Identification of three main types of ST proteins according to the X4 pattern. (A) Percentages of the different types of ST proteins and the most typical X4 pattern found in each type. (B, C) Comparisons between type I and type IIa ST proteins represented using Two Sample Logo. (B) Comparison of 25-amino-acid repeats of type I versus type IIa. The main difference is found in the X4 sequence. (C) Comparison of 26-amino-acid repeats of type I versus type IIa. Note the difference in the conserved X4 sequence and the preference for D in type IIa at the first amino acid of the hexamer. The repeats of 26 amino acids showed greater variation, probably due to the larger number of sequences analysed. The Two Sample Logo consists of stacks of symbols, one stack for each position in the sequence: the upper part represents the preference of type I proteins for one amino acid in a given position with respect to the type IIa amino acid at the same position, and vice versa in the lower part. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each amino acid at that position. Amino acids are coloured according to their chemical properties: polar amino acids (G,S,T,Y,C,Q,N) are in green; basic (K,R,H) in blue; acidic (D,E) in red, and hydrophobic (A,V,L,I,W,F,M) amino acids are in black.
Figure 5
Phylogenetic tree of ST sequences. A phylogenetic tree was constructed from 72 mature full-length ST sequences using the MegAlign program from DNASTAR® Lasergene 10.0 software, applying the CLUSTAL W program and a bootstrap (n = 2000) analysis. Sequences cluster according to taxonomic subclass and even to families. ST proteins from the Fabaceae family split into two groups, and a clear separation in the established types of ST proteins can be seen.
Figure 6
Comparison between cell wall structural proteins and arabinogalactan proteins and ST proteins. Phylogenetic trees were constructed as indicated in Figure 5. (A) Phylogenetic tree constructed with different PRP types and ST protein sequences. (B) Phylogenetic tree constructed with different HRGP and ST protein sequences. (C) Phylogenetic tree constructed with different AGP types and ST sequences. In all cases, STs appear as a clearly separate cluster. PRP accession numbers (alphabetically): Arabidopsis thaliana AthPRP1 [UniProtKB:Q9M7P1], AthPRP2 [UniProtKB:Q9M7P0], AthPRP3 [UniProtKB:Q9M7N9] and AthPRP4 [UniProtKB:Q9M7N8]; Brassica napus BnaPRP1 [UniProtKB:Q39353]; Daucus carota DcaPRP1 [UniProtKB:Q39686]; Glycine max GmaPRP1 [UniProtKB:P08012] and GmaPRP2 [UniProtKB:P13993]; Medicago truncatula MtrPRP1 [UniProtKB:Q9FEW3]; Oryza sativa OsaPRP1 [UniProtKB:Q94H12], OsaPRP2 [UniProtKB:Q7GBX3], OsaPRP3 [UniProtKB:Q94H10] and OsaPRP4 [UniProtKB:Q7XGS2]; Solanum lycopersicum SliPRP1 [UniProtKB:Q00451]; Zea mays ZmaPRP1 [UniProtKB:Q41848] and ZmaPRP2 [UniProtKB:Q9SBX4]. HRGP accession numbers (alphabetically): A. thaliana AthHRGP1 [UniProtKB:Q38913] and AthHRGP3 [UniProtKB:Q9FS16]; B. napus BnaHRGP1 [UniProtKB:Q8LK15]; Catharanthus roseus CroHRGP1 [UniProtKB:Q39599] and CroHRGP2 [UniProtKB:Q39600]; G. max GmaHRGP3 [UniProtKB:Q39835]; Nicotiana plumbaginifolia NplHRGP1 [UniProtKB:Q40402]; Nicotiana sylvestris NsyHRGP1 [UniProtKB:Q9FSG0]; Nicotiana tabacum NtaHRGP1 [UniProtKB:Q40503] and NtaHRGP1 [UniProtKB:Q06802]; O. sativa OsaHRGP1 [UniProtKB:Q40692]; Phaseolus vulgaris PvuHRGP1 [UniProtKB:Q09083]; Pisum sativum PsaHRGP1 [UniProtKB:Q9M6R7]; Prunus dulcis PduHRGP1 [UniProtKB:Q40768]; S. lycopersicum SliHRGP1 [UniProtKB:Q09082] and SliHRGP2 [UniProtKB:Q09084]; Solanum tuberosum StuHRGP1 [UniProtKB:Q06446]; Vigna unguiculata VunHRGP1 [UniProtKB:Q41707]. AGP accession numbers (alphabetically): A. thaliana AthAGP1 [UniProtKB:Q8LCN5], AthAGP2 [UniProtKB:Q9SJY7], AthAGP3 [UniProtKB:Q9ZT17], AthAGP4 [UniProtKB:Q9ZT16], AthAGP5 [UniProtKB:Q8LCE4], AthAGP6 [UniProtKB:Q9LY91], AthAGP7 [UniProtKB:8LG54], AthAGP9 [UniProtKB:Q9C5S0], AthAGP10 [UniProtKB:Q9M0S4], AthAGP12 [UniProtKB:Q9LJD9], AthAGP13 [UniProtKB:Q9STQ3], AthAGP14 [UniProtKB:Q9LVC0], AthAGP15 [UniProtKB:Q9LYF6], AthAGP18 [UniProtKB:Q9FPR2], AthAGP19 [UniProtKB:Q9S740], AthAGP21 [UniProtKB:Q9C8A4], AthFLA1 [UniProtKB:Q9FM65], AthFLA2 [UniProtKB:Q9SV13], AthFLA4 [UniProtKB:Q9SNC3], AthFLA8 [UniProtKB:O22126] and AthFLA10 [UniProtKB:Q9LZX4]; M. truncatula MtrAGP1 [UniProtKB:G7K3Y3] and MtrAGP2 [UniProtKB:G7JV60].
