BMC Genomics. 2009 Dec 10;10:593.

doi: 10.1186/1471-2164-10-593.

Expansion of tandem repeats in sea anemone Nematostella vectensis proteome: A source for gene novelty?

Guy Naamati¹, Menachem Fromer, Michal Linial

PMID: 20003297
PMCID: PMC2805694
DOI: 10.1186/1471-2164-10-593

Guy Naamati et al. BMC Genomics. 2009.

Affiliation

¹ The Hebrew University of Jerusalem, Israel. guy.naamati@mail.huji.ac.il

Abstract

Background: The complete proteome of the starlet sea anemone, Nematostella vectensis, provides insights into gene invention dating back to the Cnidarian-Bilaterian ancestor. With the addition of the complete proteomes of Hydra magnipapillata and Monosiga brevicollis, the investigation of proteins having unique features in early metazoan life has become practical. We focused on the properties and the evolutionary trends of tandem repeat (TR) sequences in Cnidaria proteomes.

Results: We found that 11-16% of N. vectensis proteins contain tandem repeats. Most TRs cover 150-amino-acid segments composed of basic units of 5-20 amino acids. In total, the N. vectensis proteome has about 3300 unique TR-units, but only a small fraction of them are shared with H. magnipapillata, M. brevicollis, or mammalian proteomes. The overall abundance of these TRs stands out relative to that of 14 proteomes representing the diversity among eukaryotes and within the metazoan world. TR-units are characterized by a unique composition of amino acids, with cysteine and histidine being over-represented. Structurally, most TR-segments are associated with coiled and disordered regions. Interestingly, 80% of the TR-segments can be read in more than one open reading frame. For over 100 of them, translation of the alternative frames would result in long proteins. Most domain families that are characterized as repeats in eukaryotes are found in the TR-proteomes from Nematostella and Hydra.

Conclusions: While most TR-proteins have originated from prediction tools and are still awaiting experimental validation, supportive evidence exists for hundreds of TR-units in Nematostella. The existence of TR-proteins in early metazoan life may have served as a robust mode for generating novel genes with previously overlooked structural and functional characteristics.

Figures

**Figure 1**
TR-containing proteins from *N. vectensis*. (A) Graphical representation of two TR-containing proteins: A7SW76 and A7S5V7 (UniProt). The proteins contain a TR with a unit length of 49 amino acids (yellow, TR-unit). This TR-unit appears twice in A7SW76, while the other two TR-units, of length 7 and 21 amino acids (blue and green, respectively), are unique to A7SW76. For consistency, we count the number of unique TRs in a genome (i.e., unique non-overlapping sequences that fulfill the TR definition). TR-proteins are all proteins that contain at least one TR-segment (TR-seg). The minimal TR-segment is composed of at least 3 successive TR-units. (B) Statistical comparison of *N. vectensis* and *D. melanogaster* TR-proteomes.
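The TR-segment definition in this caption (at least 3 successive TR-units) can be illustrated with a minimal Python sketch. It is not the detection method used in the paper: it assumes exact, ungapped repeats, and the function name `find_tr_segments` and the 5-20 residue unit range (taken from the abstract) are illustrative choices.

```python
def find_tr_segments(seq, min_unit=5, max_unit=20, min_copies=3):
    """Return (start, unit, copies) for exact tandem repeats in a protein sequence.

    Assumption: repeats are exact and ungapped; real TR detection
    tolerates substitutions and indels between units.
    """
    hits = []
    i = 0
    while i < len(seq):
        best = None  # (unit, copies) covering the longest stretch from position i
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break
            copies = 1
            # Count how many times the unit is repeated back to back.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                if best is None or copies * k > best[1] * len(best[0]):
                    best = (unit, copies)
        if best:
            hits.append((i, best[0], best[1]))
            i += len(best[0]) * best[1]  # skip past the reported TR-segment
        else:
            i += 1
    return hits


# Toy sequence: a 7-residue unit repeated 4 times between flanking residues.
protein = "MK" + "ACDEFGH" * 4 + "WY"
print(find_tr_segments(protein))
```

Two copies of a unit do not qualify under this definition, so `"ACDEFGH" * 2` yields no hit.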

**Figure 2**
*N. vectensis* TRs relative to representative proteomes along the metazoan evolutionary tree. (A) The fraction of TR-proteins within the proteome tested for 14 model organisms. Representative organisms include plant, insects, worm, sea squirt, frog, and more. The relative fraction of TR-proteins in *N. vectensis* is nearly double that of fly and almost 5-fold higher than the plant representative. (B) The usage of the unique TRs in the proteins. In mouse, worm, fly, and human, on average, each TR is used in >2 distinct proteins. However, for *N. vectensis*, this ratio is close to one. (C) The length of the TR-segment relative to the length of the protein that contains it (denoted as coverage). The coverage is < 20% in most organisms excluding *N. vectensis*.
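The two ratios in panels (B) and (C) are simple to state in code. The sketch below is only an illustration of the definitions given in the caption; the function names and the toy input are hypothetical, not data from the paper.

```python
def coverage(tr_segment_length, protein_length):
    """Fraction of a protein's length occupied by its TR-segment (panel C)."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return tr_segment_length / protein_length


def mean_proteins_per_tr(tr_to_proteins):
    """Average number of distinct proteins in which each unique TR occurs (panel B).

    tr_to_proteins maps a unique TR-unit to the protein IDs that contain it.
    """
    if not tr_to_proteins:
        return 0.0
    total = sum(len(set(ids)) for ids in tr_to_proteins.values())
    return total / len(tr_to_proteins)


# Toy values only: a 150-residue TR-segment in a 600-residue protein.
print(coverage(150, 600))
print(mean_proteins_per_tr({"unit_a": {"P1", "P2"}, "unit_b": {"P3"}}))
```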

**Figure 3**
Properties of the TRs in *N. vectensis*. (A) The distribution of the TR-unit length for all 3212 unique TR sequences from *N. vectensis*. A unit length of 10-12 amino acids is most frequent. The tail of the length distribution (repeats longer than 60 amino acids) is not shown. (B) The relationship between TR-unit length, total length of the TR-segment (in pink) and the average number of repeats (copy number, blue). As the length of the TR-unit increases, the copy number of the repeats decreases. (C) Comparison of the variation rate within TR-segments from *N. vectensis* (blue) and human (pink). Variation rate is measured up to a 20% cumulative difference in the sequence of the TR-segment relative to the consensus TR-unit. For details, see Additional file 1.

**Figure 4**
**Amino acid composition in TR-segments**. (A) Composition of amino acids in TR-segments relative to non-TR proteins from *N. vectensis*. The over-represented and under-represented amino acids are shown. (B) The TR-proteins from *N. vectensis* were compared to human TR-proteins. Amino acids were colored according to the partition of the disorder propensity [52]. Disorder-promoting residues (A, R, S, Q, E, G, K, P) are colored red, order-promoting residues (N, C, I, L, F, W, Y, V) are colored blue, and disorder-order neutral residues (D, H, M, T) are colored gray.
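Over- and under-representation of amino acids, as compared in this figure, can be expressed as a log-ratio of frequencies. The following Python sketch assumes a log2 ratio with a small pseudocount, which is a choice made here for illustration and not necessarily the statistic used in the paper. The residue classes are the ones listed in the caption; the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Disorder-propensity classes as listed in the Figure 4 caption.
DISORDER_CLASS = {}
DISORDER_CLASS.update({aa: "disorder-promoting" for aa in "ARSQEGKP"})
DISORDER_CLASS.update({aa: "order-promoting" for aa in "NCILFWYV"})
DISORDER_CLASS.update({aa: "neutral" for aa in "DHMT"})


def composition(sequences):
    """Relative frequency of each standard amino acid across the sequences."""
    counts = Counter(aa for s in sequences for aa in s if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def log2_enrichment(tr_segments, background, pseudocount=1e-6):
    """log2(frequency in TR-segments / frequency in background) per residue.

    Assumption: the pseudocount only avoids division by zero for residues
    absent from one of the two sets.
    """
    tr = composition(tr_segments)
    bg = composition(background)
    return {
        aa: math.log2((tr[aa] + pseudocount) / (bg[aa] + pseudocount))
        for aa in AMINO_ACIDS
    }


# Toy input: cysteine/histidine-rich segments against a uniform background.
scores = log2_enrichment(["CCHC", "HCCH"], ["ACDEFGHIKLMNPQRSTVWY"])
for aa in "CHA":
    print(aa, DISORDER_CLASS[aa], round(scores[aa], 2))
```

Positive scores mark residues enriched in TR-segments and negative scores mark depleted ones.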

**Figure 5**
**Multiple valid ARFs in TR-segments**. (A) Frequency of valid alternative reading frames (ARFs; i.e., ORFs without stop codons) for all 6 possible reading frames for the ORFs of 4437 TR-segments from the JGI *N. vectensis* proteome. The average frequency of valid ARFs in any of the alternative frames is 0.375. The frequencies of all reading frames were normalized so that the original frame (ORF +1) is 100%. (B) The number of valid ORFs (including the original frame) for each of the 4437 TR-proteins. For example, there are 800 TR-proteins with 4 valid ORFs (including ORF +1). (C) Over-represented and under-represented amino acids for all three ORFs (ORF +1 indicates the original ORF). (D) A scheme for a repeat segment in which each reading frame translates to a nearly identical protein (differing only at the beginning and end of the sequence). In this example, the reverse complement frames are also valid (i.e., do not encounter a stop codon) throughout the sequence. The arrows indicate the directionality of the transcript.
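The notion of a valid reading frame used in this caption (no stop codon anywhere in the frame) can be checked with a short Python sketch. It assumes the standard genetic code stop codons (TAA, TAG, TGA) and a plain DNA string as input; the frame labels `+1` to `-3` and the function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return dna.translate(COMPLEMENT)[::-1]


def frame_is_open(dna, offset):
    """True if no complete codon in this frame is a stop codon."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return not any(codon in STOP_CODONS for codon in codons)


def valid_frames(dna):
    """Map each of the 6 reading frames to True (stop-free) or False."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = frame_is_open(dna, offset)
        frames[f"-{offset + 1}"] = frame_is_open(rc, offset)
    return frames


# A trinucleotide repeat stays stop-free in all six frames,
# similar in spirit to the scheme in panel (D).
print(valid_frames("GCA" * 6))
# A stop codon in the original frame invalidates only that frame here.
print(valid_frames("ATGTAAGGG"))
```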

**Figure 6**
**Evolutionarily conserved *N. vectensis* TR-units**. (A) A schematic phylogenetic tree showing the main branches in metazoan origin. Proteomes that are compared are indicated in blue. (B) *N. vectensis* shares 160 TR-units with human and 112 TR-units with mouse. Mouse and human share 343 TRs, among which 64 are shared by all three organisms. (C) *Hydra magnipapillata* TR-proteins were compared to *N. vectensis*. 83 TR-units are shared between these two organisms. *N. vectensis* and *M. brevicollis* share 74 TR-proteins. 41 proteins are shared among all three organisms. For the list of shared proteins, see Additional file 3.

**Figure 7**
**Pfam repeated domains in *N. vectensis* proteome**. Pfam repeat domains based on *N. vectensis* InterPro annotations. Only Pfam entries with >20 proteins are listed. The histogram indicates the log-ratio of copy number for a particular TR-unit in human and *N. vectensis*. Significant differences are colored. For details and for a full list of all Pfam repeat entries in *N. vectensis*, see Additional file 5.

Cited by

Short toxin-like proteins abound in Cnidaria genomes.
Tirosh Y, Linial I, Askenazi M, Linial M. Toxins (Basel). 2012 Nov 16;4(11):1367-84. doi: 10.3390/toxins4111367. PMID: 23202321.
A haplotype resolved chromosomal level avocado genome allows analysis of novel avocado genes.
Nath O, Fletcher SJ, Hayward A, Shaw LM, Masouleh AK, Furtado A, Henry RJ, Mitter N. Hortic Res. 2022 Mar 30;9:uhac157. doi: 10.1093/hr/uhac157. eCollection 2022. PMID: 36204209.
Genetic diversity of the allodeterminant alr2 in Hydractinia symbiolongicarpus.
Rosengarten RD, Moreno MA, Lakkis FG, Buss LW, Dellaporta SL. Mol Biol Evol. 2011 Feb;28(2):933-47. doi: 10.1093/molbev/msq282. Epub 2010 Oct 21. PMID: 20966116.
