Mol Biol Evol. 2014 May;31(5):1132-48.
doi: 10.1093/molbev/msu062. Epub 2014 Feb 3.

Deep conservation of human protein tandem repeats within the eukaryotes

Elke Schaper et al. Mol Biol Evol. 2014 May.

Abstract

Tandem repeats (TRs) are a major element of protein sequences in all domains of life. They are particularly abundant in mammals, where by conservative estimates one in three proteins contain a TR. High generation-scale duplication and deletion rates were reported for nucleic TR units. However, it is not known whether protein TR units can also be frequently lost or gained providing a source of variation for rapid adaptation of protein function, or alternatively, tend to have conserved TR unit configurations over long evolutionary times. To obtain a systematic picture, we performed a proteome-wide analysis of the mode of evolution for human protein TRs. For this purpose, we propose a novel method for the detection of orthologous TRs based on circular profile hidden Markov models. For all detected TRs, we reconstructed bispecies TR unit phylogenies across 61 eukaryotes ranging from human to yeast. Moreover, we performed additional analyses to correlate functional and structural annotations of human TRs with their mode of evolution. Surprisingly, we find that the vast majority of human TRs are ancient, with TR unit number and order preserved intact since distant speciation events. For example, ≥ 61% of all human TRs have been strongly conserved at least since the root of all mammals, approximately 300 Ma. Further, we find no human protein TR that shows evidence for strong recent duplications and deletions. The results are in contrast to the high generation-scale mutability of nucleic TRs. Presumably, most protein TRs fold into stable and conserved structures that are indispensable for the function of the TR-containing protein. All of our data and results are available for download from http://www.atgc-montpellier.fr/TRE.

Keywords: conservation; phylogenetic analysis; protein evolution; tandem repeats.

Figures

Fig. 1.
Tandem repeat unit evolution. (A) A scenario of TR unit evolution for species A and B represented by a TR unit phylogeny, where nodes mark either speciation events or TR unit duplications. Abandoned edges mark a TR unit loss. The ancestral TR region is created through duplications of an ancestral subsequence, that is, the unique TR unit at the root of the phylogeny (black). Immediately following the speciation event, exact copies of the TR reside in orthologous proteins in both species (pink and blue). Even after some point mutations in TR units, the TR is still perfectly conserved as long as the amino acid identity remains high: the i-th TR unit in A is the closest to the i-th unit in B. Subsequent TR unit duplications and losses diminish the conservation of the TR between species A and B. Without point mutations, the more TR unit losses or gains occur, the more TR units begin to cluster by sequence similarity within the same species. (B) The bi-species TR unit phylogeny of a perfectly conserved WD repeat (PF00400) in the human TORC subunit ENSP00000457870 and its yeast ortholog YNL006W. The TR units are indexed by their order along the protein sequence. The depicted phylogeny allows the reconstruction of ancient TR unit duplications that led to the currently observed TR regions in fungi and animals before their divergence ∼0.6–1.6 byr ago (Taylor and Berbee 2007). (C) The bi-species TR unit phylogeny of a perfectly separated TR in the human NAC-alpha domain-containing protein 1 ENSP00000420477 and its mouse ortholog ENSMUSP00000049490. The ancestral protein presumably contained a TR region with multiple repeat units. Yet, the ancestral TR region cannot be reconstructed because of the fast succession of TR unit gains/losses in at least one of the lineages.
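The pairing criterion in panel (A), where each TR unit in species A is closest to the unit with the same index in species B, can be illustrated with a rough distance-based check. This is a minimal sketch and not the authors' method: the paper works with TR unit phylogenies, whereas this proxy only compares sequence identity between pre-aligned units of equal length. The function names `identity` and `is_perfectly_conserved` are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned, equal-length units."""
    if len(a) != len(b):
        raise ValueError("units must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_perfectly_conserved(units_a: list[str], units_b: list[str]) -> bool:
    """Assumed proxy: every unit i in A is strictly more similar to unit i in B
    than to any other unit in A or in B."""
    if len(units_a) != len(units_b):
        return False
    for i, unit in enumerate(units_a):
        same_index = identity(unit, units_b[i])
        # Similarity to all competing units: other indices in B and in A.
        competitors = [identity(unit, u) for j, u in enumerate(units_b) if j != i]
        competitors += [identity(unit, u) for j, u in enumerate(units_a) if j != i]
        if any(c >= same_index for c in competitors):
            return False
    return True


# Toy units (invented for illustration, not taken from the paper's data).
conserved_a = ["ACDE", "FGHI", "KLMN"]
conserved_b = ["ACDF", "FGHK", "KLMA"]
separated_a = ["ACDE", "ACDE"]
separated_b = ["FGHI", "FGHI"]
```

In the toy data, the first pair keeps index-wise correspondence across species, while in the second pair the units cluster within each species, which is the pattern the caption associates with repeated unit gains and losses.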
Fig. 2.
Circular TR sequence profile HMM. Shown is an example of a profile HMM describing a TR unit with three consensus positions, where basic match states (M), deletion states (D), insertion states (I), and transitions correspond to the HMMER core model (Eddy 2008). Repetitions of the motif in tandem are modeled by introducing transitions from the final consensus position to the first consensus position. The transition probabilities for the final match state (pink), deletion state (red), and insertion state (orange) are taken as the normalized means of the corresponding transition probabilities in all other consensus positions. The probability of entering the TR is equal for all match states (blue). Similarly, for all match states, staying in the TR and leaving the TR are assumed to be equally likely (blue).
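The "normalized means" rule for the final consensus position can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the transition labels (`MM`, `MI`, `MD`, `IM`, `II`, `DM`, `DD`) follow the usual profile HMM convention, and the data layout and the function name `closing_transitions` are assumptions.

```python
# Outgoing transitions grouped by the state they leave (match, insertion, deletion).
STATE_GROUPS = {
    "M": ("MM", "MI", "MD"),
    "I": ("IM", "II"),
    "D": ("DM", "DD"),
}


def closing_transitions(positions: list[dict[str, float]]) -> dict[str, float]:
    """Transition probabilities for the final consensus position.

    `positions` holds one dict per non-final consensus position, mapping a
    transition label to its probability. For each state, the corresponding
    transitions are averaged over those positions and then renormalized so
    that the outgoing probabilities of the state sum to 1.
    """
    if not positions:
        raise ValueError("need at least one non-final consensus position")
    closing = {}
    for names in STATE_GROUPS.values():
        means = {n: sum(p[n] for p in positions) / len(positions) for n in names}
        total = sum(means.values())
        for n in names:
            closing[n] = means[n] / total
    return closing


# Invented example values for a unit with three consensus positions:
# the first two positions supply the averages for the third (final) one.
example_positions = [
    {"MM": 0.90, "MI": 0.05, "MD": 0.05, "IM": 0.6, "II": 0.4, "DM": 0.7, "DD": 0.3},
    {"MM": 0.70, "MI": 0.10, "MD": 0.20, "IM": 0.8, "II": 0.2, "DM": 0.5, "DD": 0.5},
]
example_closing = closing_transitions(example_positions)
```

When every input position already has outgoing probabilities that sum to 1 per state, the means sum to 1 as well, so the renormalization step only matters for rounded or otherwise inexact inputs.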
Fig. 3.
Conservation and separation of 3,091 human protein tandem repeats (TRs) across the eukaryotes. (A) The y-axis shows the number of human TRs conserved at least since the root of different reference clades denoted on the x-axis and ordered by their generality. We established conservation in a cross-comparative analysis of human TRs with their orthologous TRs in all species outside the clade. Denoted in blue are the four different measures of sequence conservation, where darker color marks a higher degree of conservation. To establish conservation of a human TR at least to the root of a clade, the human TR was compared with orthologous TRs in all outgroup species outside the clade. For example, 1,669 human TRs in our data set are perfectly conserved compared with one or more TRs in orthologs from any of the 21 nonmammalian species, providing evidence that these human TRs have been conserved at least since the root of all mammals (blue continuous curve). From more general to more specific clades, the number of human TRs with evidence for conservation at least to the root of the clade increases cumulatively. (B) The y-axis shows the number of human TRs separated compared with at least one other species within the clade. Denoted in red are the three measures of TR separation, where darker color marks a higher degree of separation. For example, 146 human TRs in our data set are perfectly separated compared with one or more TRs in orthologs from any of the other 39 mammalian species (red continuous curve). As the number of species in the clade-wise comparison increases from Hominines to broader clades, the number of separated TRs grows cumulatively.
Fig. 4.
TR types and GO enrichment of human proteins with conserved and separated TRs. (A) TRs that have been strongly conserved at least since the root of mammals; (B) TRs that show strong separation compared with an orthologous TR in at least one other mammal. The first summary bar in each plot shows the frequency of the different TR types: there are 1,896 conserved TRs, with the WD40 TR being the most frequent, and 236 separated TRs, with the Zinc finger TR being the most frequent. All TR types based on de novo TR detections were binned into one category (denoted with dark gray), although they may describe very diverse motifs. Likewise, TR types based on PFAM annotations with low frequencies (<30 TRs for the set of strongly conserved TRs, and <3 TRs for the set of strongly separated TRs) were binned together (denoted with light gray). The thinner bars below the summary bars show representative enriched GO terms ordered by their frequency. Each bar corresponding to a GO term depicts the distribution of different TR types in proteins annotated with this GO term. GO terms are grouped by their respective ontology: Biological Process (BP), Molecular Function (MF), or Cellular Component (CC).
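As a quick arithmetic check, the counts quoted in the captions of Fig. 3 and Fig. 4 can be turned into shares of the data set. This sketch assumes that the 3,091 human TRs named in the Fig. 3 caption are the denominator for all four counts; the variable and function names are hypothetical.

```python
TOTAL_HUMAN_TRS = 3091  # data set size from the Fig. 3 caption (assumed denominator)

# Counts quoted in the captions, all with the mammalian clade as reference.
counts = {
    "perfectly conserved (Fig. 3A)": 1669,
    "strongly conserved (Fig. 4A)": 1896,
    "perfectly separated (Fig. 3B)": 146,
    "strongly separated (Fig. 4B)": 236,
}


def share_percent(count: int, total: int = TOTAL_HUMAN_TRS) -> float:
    """Share of the data set in percent."""
    return 100.0 * count / total


shares = {label: share_percent(n) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
```

Under that assumption, the 1,896 strongly conserved TRs correspond to about 61.3% of the data set, in line with the "≥ 61%" stated in the abstract.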
Fig. 5.
Characteristics of separated versus conserved tandem repeats. Shown are frequency distributions of TR characteristics (see Materials and Methods) for strongly conserved (blue) and strongly separated (red) human TRs, with the mammalian clade as the reference. For each TR type defined by distinct circular HMMs, the mean value was calculated for each characteristic. For example, the mean number of zinc finger TR units was 7 for conserved TRs and 13 for separated TRs, each constituting one data point summarizing a large family of zinc fingers. The total data set comprises average values for 235 TR types with strongly conserved TRs and 86 TR types with strongly separated TRs.

References

    1. Abraham A-L, Pothier J, Rocha EPC. Alternative to homo-oligomerisation: the creation of local symmetry in proteins by internal amplification. J Mol Biol. 2009;394:522–534.
    2. Angst BD, Marcozzi C, Magee AI. The cadherin superfamily: diversity in form and function. J Cell Sci. 2001;114:629–641.
    3. Auton A, Fledel-Alon A, Pfeifer S, Venn O, Ségurel L, Street T, Leffler EM, Bowden R, Aneas I, Broxholme J, et al. A fine-scale chimpanzee genetic map from population sequencing. Science. 2012;336:193–198.
    4. Barford D. The role of multiple sequence repeat motifs in the assembly of multi-protein complexes. In: Carrondo MA, Spadon P, editors. Macromolecular crystallography. Dordrecht (The Netherlands): Springer; 2012. pp. 43–49.
    5. Baudat F, Buard J, Grey C, Fledel-Alon A, Ober C, Przeworski M, Coop G, de Massy B. PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. Science. 2010;327:836–840.