Genomics Proteomics Bioinformatics. 2012 Aug;10(4):217-25.
doi: 10.1016/j.gpb.2012.04.001. Epub 2012 Aug 4.

Homopeptide repeats: implications for protein structure, function and evolution

Muthukumarasamy Uthayakumar et al. Genomics Proteomics Bioinformatics. 2012 Aug.

Abstract

Protein sequences from Mycobacterium tuberculosis H37Rv (Mtb H37Rv) were analyzed to identify homopeptide repeat-containing proteins (HRCPs). Functional annotation of the HRCPs showed that they are preferentially involved in cellular metabolism. Furthermore, these homopeptide repeats might play specific roles in protein-protein interactions. Repeat length differences among Bacteria, Archaea and Eukaryotes were calculated to assess how well the repeats are conserved across these divergent kingdoms. The results show that the repeats are more highly conserved in Bacteria and Archaea than in Eukaryotes. In addition, there seems to be a direct correlation between the repeat length difference and the degree of divergence between the species. Our study supports the hypothesis that the presence of homopeptide repeats influences the rate of evolution of the protein sequences in which they are embedded. Thus, homopeptide repeats may have structural, functional and evolutionary implications for proteins.
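
As an illustration of the identification step, a homopeptide repeat can be located by scanning a protein sequence for runs of a single residue. The sketch below is not the authors' pipeline: the function names, the example sequence and the minimum run length of 5 residues are assumptions made for illustration, since the abstract does not state the threshold used. It also tallies which residues form the repeats, which is the kind of summary shown in Figure 1.

```python
import re
from collections import Counter


def find_homopeptide_repeats(seq, min_len=5):
    """Return (residue, start, length) for every run of one residue.

    min_len is an assumed threshold, not a value taken from the paper.
    start is 1-based, as is usual for protein residue numbering.
    """
    # ([A-Z]) captures one residue; \1{n,} requires at least n more copies.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [
        (m.group(1), m.start() + 1, len(m.group(0)))
        for m in pattern.finditer(seq.upper())
    ]


def repeat_residue_usage(repeats):
    """Percentage of repeats formed by each residue type."""
    counts = Counter(residue for residue, _, _ in repeats)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {residue: 100.0 * n / total for residue, n in counts.items()}


# Hypothetical sequence, invented for this example only.
example = "MKAAAAAAGLHHHHHHTPPPPPR"
repeats = find_homopeptide_repeats(example)
usage = repeat_residue_usage(repeats)
print(repeats)
print(usage)
```

Applied to every sequence in a proteome, the same scan would give the per-protein repeat lists from which repeat lengths in orthologous proteins could then be compared.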

Figures

Figure 1
Total number and percentage of homopeptide repeats in Mtb H37Rv. A pie chart describing the total number and the percentage of the homopeptide repeats. The residues A (blue), G (green), R (red) and P (violet) occur most often among the homopeptide repeats.
Figure 2
Functional annotation of HRCPs in Mtb H37Rv. The homopeptide repeats were analyzed and categorized into eight classes. The number of repeat-containing proteins is given on the X-axis and the corresponding functional class on the Y-axis. Metabolic and hypothetical proteins account for the highest numbers of repeat-containing proteins.
Figure 3
The poly-histidine tract occurs in the linker regions of protein kinase PknD. The poly-H tract (residues 265–270) is found in the linker regions of Ser/Thr protein kinase PknD (PDB ID: 1RWI). Poly-H is shown in black.
Figure 4
Comparison of amino acid usage in homopeptide repeats and proteome in Mtb H37Rv. The amino acid usage (%) in both the homopeptide repeats (homerepeats) and the protein sequences (in proteome) is shown. The amino acid residues A, G, P and R exhibit peak usage in both the homopeptide repeats and the total protein sequences.
Figure 5
Repeat size differences between proteins from Mtb H37Rv, E. coli, S. acidocaldarius and H. sapiens. The repeat size difference between Mtb H37Rv and E. coli is lower than that between Mtb H37Rv and S. acidocaldarius DSM 639 or H. sapiens.
Figure 6
Sequence alignment of orthologous mmpS3 protein from seven Mycobacterium species. The sequence highlighted within the box represents the homopeptide repeat block. The start and end positions of the protein sequences are given before and after the sequences.
