Expansion of protein domain repeats

PLoS Comput Biol. 2006 Aug 25;2(8):e114. doi: 10.1371/journal.pcbi.0020114. Epub 2006 Jul 14.

Authors

Asa K Björklund¹, Diana Ekman, Arne Elofsson

Affiliation

¹ Stockholm Bioinformatics Center, Center for Biomembrane Research, Stockholm University, Stockholm, Sweden.

PMID: 16933986
PMCID: PMC1553488
DOI: 10.1371/journal.pcbi.0020114

Abstract

Many proteins, especially in eukaryotes, contain tandem repeats of several domains from the same family. These repeats have a variety of binding properties and are involved in protein-protein interactions as well as binding to other ligands such as DNA and RNA. The rapid expansion of protein domain repeats is assumed to have occurred through internal tandem duplications. However, the exact mechanisms behind these tandem duplications are not well understood. Here, we have studied the evolution, function, protein structure, gene structure, and phylogenetic distribution of domain repeats. For this purpose we have assigned Pfam-A domain families to 24 proteomes with more sensitive domain assignments in the repeat regions. These assignments confirmed previous findings that eukaryotes, and in particular vertebrates, contain a much higher fraction of proteins with repeats compared with prokaryotes. The internal sequence similarity in each protein revealed that the domain repeats are often expanded through duplications of several domains at a time, while the duplication of one domain is less common. Many of the repeats appear to have been duplicated in the middle of the repeat region. This is in strong contrast to the evolution of other proteins, which mainly proceeds through additions of single domains at either terminus. Further, we found that some domain families show distinct duplication patterns, e.g., nebulin domains have mainly been expanded with a unit of seven domains at a time, while duplications of other domain families involve varying numbers of domains. Finally, no common mechanism for the expansion of all repeats could be detected. We found that the duplication patterns show no dependence on the size of the domains. Further, repeat expansion in some families can possibly be explained by shuffling of exons. However, exon shuffling could not have created all repeats.


Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

**Figure 1. Domain Assignments and Exon Structure for the Chicken Nebulin Protein ENSGALP00000020382**
The initial domain assignments (D) using an E-value cutoff at 0.1 detected 51 nebulin domains. With a less strict cutoff, we were able to assign 15 additional domains. Still, there are four gaps (regions with no domain assignment), which are likely to contain domains that cannot be detected with the current HMMs. Below the domain assignments, the exon structure (E) is shown, with a box for each of the 44 exons. It is evident that a block of four exons (a long one in black, two short ones in white, and one of intermediate size in gray) corresponds to a block of seven domains, even though the exon borders are all found within the domains.

**Figure 2. Fraction of Proteins That Contain a Domain Repeat in Archaea, Bacteria, Yeast, and the Eight Multicellular Eukaryotes (Sorted by Number of Proteins)**
The different patterns indicate the length of the repeat, i.e., whether it contains 2, 3, 4 domains, etc. The eukaryotic species are labeled with the abbreviations of species names such as Hsa for Homo sapiens followed by the number of proteins in each proteome. For a list of all species in this study, see Materials and Methods.
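The quantity plotted in this figure can be illustrated with a minimal sketch. It assumes a repeat is simply a run of two or more consecutive domains from the same family and that each protein is counted once, by its longest run; the exact repeat definition used in the study may differ. The function names, the toy proteome, and the family labels are hypothetical.

```python
from collections import Counter
from itertools import groupby


def longest_repeat(domains):
    """Length of the longest run of consecutive domains from one family."""
    return max((len(list(run)) for _, run in groupby(domains)), default=0)


def repeat_length_fractions(proteome):
    """Map each repeat length (>= 2) to the fraction of proteins with that longest run."""
    lengths = Counter(longest_repeat(d) for d in proteome.values())
    total = len(proteome)
    return {n: c / total for n, c in sorted(lengths.items()) if n >= 2}


# Hypothetical toy proteome: protein id -> domain families in N-to-C order.
toy_proteome = {
    "p1": ["zf-C2H2", "zf-C2H2", "zf-C2H2"],
    "p2": ["Kinase", "SH2"],
    "p3": ["Nebulin", "Nebulin", "SH3"],
    "p4": ["SH3"],
}
fractions = repeat_length_fractions(toy_proteome)
print(fractions)
```

Summing the returned fractions gives the total share of proteins that contain a repeat, which corresponds to the full bar height for one species under these assumptions.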

**Figure 3. Overview of the Methodology**
(A) In a protein with five domains, a unit of three N-terminal domains has been duplicated in tandem. (B) To identify this evolutionary event, alignment of all domain pairs in the protein is performed. (C) The alignment scores between the domains are displayed in a matrix with increasing color intensity for higher scores. The diagonal shows alignment scores for each domain to itself, while square 1,2 gives the score between the first and the second domain. A pattern where domain pairs 3–6, 4–7, and 5–8 have the highest alignment scores can be seen. (D) From the alignment scores, an ACV is calculated as the mean alignment score at each distance normalized around zero. The distance between the domains is defined as one for neighbouring domains, while domain pairs with one domain between them have distance two, etc. In this example a peak at distance three can be seen. Hence, we assume that this protein has evolved through the duplication of three domains.
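The ACV calculation in panel (D) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: "normalized around zero" is taken to mean subtracting the mean score over all domain pairs in the protein, and the score matrix is a made-up toy in which only the pairs named in the caption score highly. The function name `acv` and all score values are hypothetical.

```python
from statistics import mean


def acv(scores):
    """Mean alignment score at each domain distance, centred on the overall pair mean."""
    n = len(scores)
    overall = mean(scores[i][j] for i in range(n) for j in range(i + 1, n))
    return {
        d: mean(scores[i][i + d] for i in range(n - d)) - overall
        for d in range(1, n)
    }


# Toy 8-domain matrix: background score 10, with the caption's pairs
# 3-6, 4-7 and 5-8 (1-based) given an arbitrary high score of 50.
n = 8
toy = [[10.0] * n for _ in range(n)]
for a, b in [(3, 6), (4, 7), (5, 8)]:
    toy[a - 1][b - 1] = toy[b - 1][a - 1] = 50.0

vector = acv(toy)
peak = max(vector, key=vector.get)
print(peak, round(vector[peak], 2))
```

In this toy the largest value falls at distance three, matching the duplication unit described in the caption. The diagonal (self-alignment) scores are not used.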

**Figure 4. Pattern of Internal Domain Duplications in Two Human Proteins, ENSP00000319007 and ENSP00000303696, both with C2H2 Zinc Finger Repeats**
(A) ENSP00000319007. (B) ENSP00000303696. The intensity of the squares reflects the alignment score with darker color for higher scores. The numbers at each axis indicate the domains in N-to-C terminal orientation within the repeat. In these two examples, patterns of duplication of six domains (A) and two domains (B) can be seen.

**Figure 5. Pattern of Internal Domain Duplications in the Chicken Protein ENSGALP00000020382, with 66 Repeating Nebulin Domains (Pfam)**
(A) The intensity of the squares is related to alignment scores, and the numbers on both axes indicate the domains in N-to-C terminal orientation. As there were gaps in the repeat sequence (Figure 1), these were introduced as domains at positions 6, 18, 25, and 32. (B) ACV calculated from the alignment scores in (A) with the average similarity to domains at distance 1, 2, 3, etc. The ACV is normalized around zero, hence the dotted line at zero is the mean score between all domains in the protein. The ACV was calculated before introducing the gaps as domains (dashed line) and after (solid line). When the regions with no domain assignments were regarded as domains, the pattern of seven repeating units became much clearer, indicating that the gaps are also domains.
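Why restoring the gaps sharpens the period-seven signal can be seen in a small synthetic example. Everything here is an assumption for illustration: the 70-position repeat with a unit of seven, the two-level scoring function, and the reading of positions 6, 18, 25, and 32 as 1-based indices in the gap-restored sequence. Unlike the analysis in the caption, which treats the unassigned regions as domains, this sketch only holds their place and skips them when averaging.

```python
from statistics import mean


def acv_at(domains, distance, score):
    """Centred mean score at one distance; None marks a placeholder and is skipped."""
    real = [d for d in domains if d is not None]
    overall = mean(score(a, b) for i, a in enumerate(real) for b in real[i + 1:])
    at_distance = [
        score(a, b)
        for a, b in zip(domains, domains[distance:])
        if a is not None and b is not None
    ]
    return mean(at_distance) - overall


def toy_score(a, b):
    """Hypothetical score: high for the same position within the unit, low otherwise."""
    return 50.0 if a == b else 10.0


full = [i % 7 for i in range(70)]   # ten units of seven domains
missing = {5, 17, 24, 31}           # 0-based equivalents of positions 6, 18, 25, 32
detected = [t for i, t in enumerate(full) if i not in missing]
padded = [None if i in missing else t for i, t in enumerate(full)]

without_gaps = acv_at(detected, 7, toy_score)
with_gaps = acv_at(padded, 7, toy_score)
print(without_gaps < with_gaps)
```

Each undetected domain shifts everything downstream by one position, so pairs that straddle it no longer line up at distance seven. Keeping a placeholder restores the register.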

**Figure 6. ACVs for Proteins with Repeats of Eight Different Domain Families**
Solid lines show the ACVs for proteins with repeats of eight different domain families. In the bottom right diagram, the ACV for all proteins with repeats is displayed. The ACV for each family was normalized around zero, hence the dashed line at zero is the mean bit score between all domains in the family. The p-value for each datapoint was calculated from random shuffling of domains, and peaks with p-values below 10⁻⁵ are indicated with an asterisk. The dotted line illustrates the fraction of repeats of the domain family with each repeat length, i.e., nonrepeated proteins have length one. The number of proteins/domains that goes into each figure can be found in Materials and Methods. Data for the remaining domain families can be found in Figure S2.
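A shuffling-based p-value of the kind mentioned in this caption can be sketched as a permutation test. The details are assumptions: here the order of the domains within a single protein is shuffled and the mean score at one distance is recomputed, whereas the caption does not say exactly what was shuffled or how many times. The function names, the seed, and the toy matrix are hypothetical.

```python
import random
from statistics import mean


def mean_at_distance(scores, order, d):
    """Mean score between domains that sit d positions apart in the given order."""
    return mean(scores[order[i]][order[i + d]] for i in range(len(order) - d))


def shuffle_p_value(scores, d, n_shuffles=2000, seed=0):
    """Fraction of shuffled domain orders scoring at least as high as the real order."""
    rng = random.Random(seed)
    order = list(range(len(scores)))
    observed = mean_at_distance(scores, order, d)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(order)
        if mean_at_distance(scores, order, d) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Toy 8-domain matrix with high scores for pairs 3-6, 4-7 and 5-8 (1-based).
n = 8
toy = [[10.0] * n for _ in range(n)]
for a, b in [(3, 6), (4, 7), (5, 8)]:
    toy[a - 1][b - 1] = toy[b - 1][a - 1] = 50.0

p_peak = shuffle_p_value(toy, 3)
p_flat = shuffle_p_value(toy, 1)
print(p_peak, p_flat)
```

With this estimator the smallest reportable p-value is 1 / (n_shuffles + 1), so resolving values below 10⁻⁵ would require more than 100,000 shuffles or a different estimate of the tail.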

**Figure 7. Hierarchical Clustering of the ACVs from Each Protein**
(A) Dendrogram of the 20 clusters. Each cluster is indicated by a cluster number followed by the number of proteins in the cluster. (B) The average ACV for each cluster with red color for values below the average and green for values above. (C) Distribution of the ten largest domain families, as well as nebulin, in the different clusters. The expected number of proteins from a domain family in each cluster was calculated using random shuffling, and Z-scores for overrepresentation (green) and underrepresentation (red) in the cluster were calculated. The number after each domain family name is the number of repeats of the family.
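The Z-scores in panel (C) can be illustrated with a short sketch. It assumes the shuffling permutes family labels across proteins while cluster membership stays fixed, and that the Z-score is the observed count minus the shuffled mean, divided by the shuffled standard deviation; the caption does not spell out either detail. The function name, the seed, and the toy data are hypothetical.

```python
import random
from statistics import mean, pstdev


def cluster_z_score(families, clusters, family, cluster, n_shuffles=1000, seed=0):
    """Z-score of a family's count in a cluster against shuffled family labels."""
    rng = random.Random(seed)

    def count(labels):
        return sum(1 for f, c in zip(labels, clusters) if f == family and c == cluster)

    observed = count(families)
    shuffled = list(families)
    null_counts = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null_counts.append(count(shuffled))
    spread = pstdev(null_counts)
    # A zero spread means the count cannot vary, so no enrichment can be claimed.
    return (observed - mean(null_counts)) / spread if spread > 0 else 0.0


# Toy data: 20 proteins in two clusters; all 8 "nebulin" proteins sit in cluster 0.
clusters = [0] * 10 + [1] * 10
families = ["nebulin"] * 8 + ["other"] * 12

z_over = cluster_z_score(families, clusters, "nebulin", 0)
z_under = cluster_z_score(families, clusters, "nebulin", 1)
print(round(z_over, 2), round(z_under, 2))
```

A positive score corresponds to the green (overrepresented) cells in the figure and a negative score to the red (underrepresented) cells.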

**Figure 8. ACVs for All Proteins in Each of the 20 Clusters in Figure 7**
The number of proteins in each cluster is indicated after the cluster number.

