BMC Genomics. 2009 Dec 10;10:593.
doi: 10.1186/1471-2164-10-593.

Expansion of tandem repeats in sea anemone Nematostella vectensis proteome: A source for gene novelty?


Guy Naamati et al. BMC Genomics. 2009.

Abstract

Background: The complete proteome of the starlet sea anemone, Nematostella vectensis, provides insights into gene invention dating back to the Cnidarian-Bilaterian ancestor. With the addition of the complete proteomes of Hydra magnipapillata and Monosiga brevicollis, the investigation of proteins having unique features in early metazoan life has become practical. We focused on the properties and the evolutionary trends of tandem repeat (TR) sequences in Cnidaria proteomes.

Results: We found that 11-16% of N. vectensis proteins contain tandem repeats. Most TRs span 150-amino-acid segments composed of basic units of 5-20 amino acids. In total, the N. vectensis proteome has about 3300 unique TR-units, but only a small fraction of them are shared with the H. magnipapillata, M. brevicollis, or mammalian proteomes. The overall abundance of these TRs stands out relative to that of 14 proteomes representing the diversity among eukaryotes and within the metazoan world. TR-units are characterized by a unique amino acid composition, with cysteine and histidine over-represented. Structurally, most TR-segments are associated with coiled and disordered regions. Interestingly, 80% of the TR-segments can be read in more than one open reading frame. For over 100 of them, translation of the alternative frames would result in long proteins. Most domain families that are characterized as repeats in eukaryotes are found in the TR-proteomes of Nematostella and Hydra.

Conclusions: While most TR-proteins originate from prediction tools and still await experimental validation, supportive evidence exists for hundreds of TR-units in Nematostella. The existence of TR-proteins in early metazoan life may have served as a robust mode for generating novel genes with previously overlooked structural and functional characteristics.


Figures

Figure 1
TR-containing proteins from N. vectensis. (A) Graphical representation of two TR-containing proteins: A7SW76 and A7S5V7 (UniProt). The proteins contain a TR with a unit length of 49 amino acids (yellow, TR-unit). This TR-unit appears twice on A7SW76, while the other two TR-units are unique to A7SW76. These TR-units are 7 and 21 amino acids long (blue and green, respectively). For consistency, we count the number of unique TRs in a genome (i.e., unique non-overlapping sequences that fulfill the TR definition). TR-proteins are all proteins that contain at least one TR-segment (TR-seg). A minimal TR-segment is composed of at least 3 successive TR-units. (B) Statistical comparison of N. vectensis and D. melanogaster TR-proteomes.
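The TR definition in this caption (at least 3 successive TR-units) together with the 5-20 amino acid unit range from the abstract can be illustrated with a minimal sketch. It assumes exact, gap-free copies of the unit, which is a simplification: the prediction tool actually used by the authors is not named here, and the function name and defaults below are hypothetical.

```python
def find_exact_tandem_repeats(seq, min_unit=5, max_unit=20, min_copies=3):
    """Scan a protein sequence for runs of identical, adjacent units.

    Returns a list of (start, unit, copies) tuples. Only exact copies are
    detected; real TR predictors tolerate substitutions and indels.
    """
    hits = []
    i = 0
    n = len(seq)
    while i < n:
        best = None
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for a unit of this length
            copies = 1
            # Count how many times the unit repeats back to back.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            span = copies * k
            if copies >= min_copies and (best is None or span > best[2] * len(best[1])):
                best = (i, unit, copies)
        if best is not None:
            hits.append(best)
            i += best[2] * len(best[1])  # skip past the reported TR-segment
        else:
            i += 1
    return hits


# Toy sequence (not a real protein): a 7-residue unit repeated 3 times.
toy = "MK" + "ACDEFGH" * 3 + "WW"
print(find_exact_tandem_repeats(toy))
```

With two copies instead of three, the same unit would not qualify as a TR-segment under the minimum stated in the caption.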
Figure 2
N. vectensis TRs relative to representative proteomes along the metazoan evolutionary tree. (A) The fraction of TR-proteins within the proteome for 14 model organisms. Representative organisms include plant, insects, worm, sea squirt, frog, and more. The relative fraction of TR-proteins in N. vectensis is nearly double that of fly and almost 5-fold higher than that of the plant representative. (B) The usage of the unique TRs in the proteins. In mouse, worm, fly, and human, each TR is used in >2 distinct proteins on average. For N. vectensis, however, this ratio is close to one. (C) The length of the TR-segment relative to the length of the protein that contains it (denoted as coverage). The coverage is <20% in most organisms, excluding N. vectensis.
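The two ratios described in panels (B) and (C) are simple to state in code. The sketch below assumes coverage is TR-segment length divided by protein length, and that usage is the average number of distinct proteins per unique TR. The function names and the protein identifiers in the example are hypothetical.

```python
def coverage(segment_length, protein_length):
    """Fraction of a protein's length occupied by its TR-segment."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return segment_length / protein_length


def usage_ratio(tr_to_proteins):
    """Average number of distinct proteins in which each unique TR occurs.

    tr_to_proteins maps a TR-unit sequence to the protein IDs containing it.
    """
    if not tr_to_proteins:
        raise ValueError("no TRs given")
    total = sum(len(set(ids)) for ids in tr_to_proteins.values())
    return total / len(tr_to_proteins)


# Illustrative values only: a 150-residue segment in a 600-residue protein.
print(coverage(150, 600))
print(usage_ratio({"ACDEFGH": ["P1", "P2", "P2"], "PPGPPG": ["P3"]}))
```

A usage ratio near one, as reported for N. vectensis, means most unique TRs occur in a single protein.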
Figure 3
Properties of the TRs in N. vectensis. (A) The distribution of the TR-unit length for all 3212 unique TR sequences from N. vectensis. A unit length of 10-12 amino acids is most frequent. The tail of the length distribution (repeats longer than 60 amino acids) is not shown. (B) The relationship between TR-unit length, total length of the TR-segment (pink), and the average number of repeats (copy number, blue). As the length of the TR-unit increases, the copy number of the repeats decreases. (C) Comparison of the variation rate within TR-segments from N. vectensis (blue) and human (pink). Variation rate is measured up to a 20% cumulative difference in the sequence of the TR-segment relative to the consensus TR-unit. For details, see Additional file 1.
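One plausible reading of the variation rate in panel (C) is the fraction of positions in a TR-segment that differ from the consensus TR-unit. The sketch below takes that reading as an assumption; the authors' exact definition is in Additional file 1 and is not reproduced here. It also assumes gap-free units of equal length, and the function names are hypothetical.

```python
from collections import Counter


def consensus_unit(units):
    """Column-wise majority residue across equal-length units."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*units))


def variation_rate(segment, unit_len):
    """Fraction of residues in the segment that differ from the consensus unit."""
    units = [segment[i:i + unit_len]
             for i in range(0, len(segment) - unit_len + 1, unit_len)]
    if not units:
        raise ValueError("segment is shorter than one unit")
    cons = consensus_unit(units)
    mismatches = sum(a != b for unit in units for a, b in zip(unit, cons))
    return mismatches / (len(units) * unit_len)


# Toy segment: three 5-residue units, one substitution in the last copy.
print(variation_rate("ACDEF" + "ACDEF" + "ACDEY", 5))
```

Under this reading, a segment would fall within the 20% bound stated in the caption if the returned value is at most 0.2.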
Figure 4
Amino acid composition in TR-segments. (A) Composition of amino acids in TR-segments relative to non-TR proteins from N. vectensis. The over-represented and under-represented amino acids are shown. (B) The TR-proteins from N. vectensis were compared to human TR-proteins. Amino acids were colored according to the partition of the disorder propensity [52]. Disorder-promoting residues (A, R, S, Q, E, G, K, P) are colored red, order-promoting residues (N, C, I, L, F, W, Y, V) are colored blue, and disorder-order neutral residues (D, H, M, T) are colored gray.
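Over- and under-representation of amino acids, as in panel (A), is commonly expressed as a log-ratio of residue frequencies in the two sequence sets. The sketch below assumes that measure plus a small pseudocount; the statistic the authors actually used is not stated in this caption. The function names and toy sequences are hypothetical.

```python
import math
from collections import Counter


def aa_frequencies(seqs):
    """Relative frequency of each residue across a collection of sequences."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def log2_enrichment(tr_seqs, background_seqs, pseudo=1e-6):
    """log2(freq in TR-segments / freq in background) per amino acid.

    Positive values indicate over-representation in TR-segments.
    The pseudocount avoids division by zero for absent residues.
    """
    tr = aa_frequencies(tr_seqs)
    bg = aa_frequencies(background_seqs)
    return {
        aa: math.log2((tr.get(aa, 0.0) + pseudo) / (bg.get(aa, 0.0) + pseudo))
        for aa in set(tr) | set(bg)
    }


# Toy sequences only: C is twice as frequent in the "TR" set as in the background.
scores = log2_enrichment(["CCHH"], ["CHAA"])
for aa in sorted(scores, key=scores.get, reverse=True):
    print(aa, round(scores[aa], 2))
```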
Figure 5
Multiple valid ARFs in TR-segments. (A) Frequency of valid ARFs (i.e., an ORF without stop codons) for all 6 possible reading frames of the 4437 TR-segment ORFs from the JGI N. vectensis proteome. The average frequency of valid ARFs in any of the alternative frames is 0.375. The frequencies of all reading frames were normalized to the original frame (ORF +1, 100%). (B) For each of the 4437 TR-proteins, the number of valid ORFs (including the original frame). For example, there are 800 TR-proteins with 4 valid ORFs (including ORF +1). (C) Over-represented and under-represented amino acids for all three ORFs (ORF +1 indicates the original ORF). (D) A scheme for a repeat segment in which each reading frame translates to a nearly identical protein (differing only at the beginning and end of the sequence). In this example, the reverse complement frames are also valid (i.e., do not encounter a stop codon) throughout the sequence. The arrows indicate the directionality of the transcript.
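The validity test in panel (A), a reading frame with no stop codon, can be sketched for all six frames of a nucleotide segment. The sketch assumes the standard stop codons (TAA, TAG, TGA) and an unambiguous DNA alphabet. The frame labels +1 to +3 and -1 to -3 and the function names are illustrative choices, with +1 standing for the original frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(_COMPLEMENT)[::-1]


def frame_is_stop_free(dna, offset):
    """True if no complete codon in this frame is a stop codon."""
    return all(dna[i:i + 3] not in STOP_CODONS
               for i in range(offset, len(dna) - 2, 3))


def valid_frames(dna):
    """Map each of the six frame labels to whether it is free of stop codons."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = frame_is_stop_free(dna, offset)
        frames[f"-{offset + 1}"] = frame_is_stop_free(rc, offset)
    return frames


# A GCA repeat contains no stop codon in any forward or reverse frame.
print(valid_frames("GCAGCAGCAGCA"))
# Here the original frame hits TAA, while the shifted frames do not.
print(valid_frames("ATGTAAGGG"))
```

Short repetitive units make it more likely that a frame shift reproduces a similar stop-free pattern, which is the situation panel (D) depicts.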
Figure 6
Evolutionarily conserved N. vectensis TR-units. (A) A schematic phylogenetic tree showing the main branches in metazoan origin. Proteomes that are compared are indicated in blue. (B) N. vectensis shares 160 TR-units with human and 112 TR-units with mouse. Mouse and human share 343 TRs, among which 64 are shared by all three organisms. (C) Hydra magnipapillata TR-proteins were compared to N. vectensis. 83 TR-units are shared between these two organisms. N. vectensis and M. brevicollis share 74 TR-proteins. 41 proteins are shared among all three organisms. For the list of shared proteins, see Additional file 3.
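Counts of shared TR-units like those in panels (B) and (C) amount to set intersections once a matching criterion is fixed. The sketch below assumes exact sequence identity between units, which may be stricter than the criterion the authors applied. The function name and the unit sequences are made up for illustration.

```python
def shared_units(*proteome_units):
    """TR-unit sequences present in every given proteome (exact match)."""
    if not proteome_units:
        return set()
    return set.intersection(*(set(units) for units in proteome_units))


# Toy unit sets, not data from the paper.
nematostella = {"ACDEFGH", "PPGPPG", "KKEEK"}
human = {"ACDEFGH", "PPGPPG", "QQQQQ"}
mouse = {"ACDEFGH", "LLLLL"}

print(len(shared_units(nematostella, human)))         # pairwise overlap
print(len(shared_units(nematostella, human, mouse)))  # shared by all three
```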
Figure 7
Pfam repeated domains in N. vectensis proteome. Pfam repeat domains based on N. vectensis InterPro annotations. Only Pfam entries with >20 proteins are listed. The histogram indicates the log-ratio of copy number for a particular TR-unit in human and N. vectensis. Significant differences are colored. For details and for a full list of all Pfam repeat entries in N. vectensis, see Additional file 5.


