Proteome Sci. 2007 Dec 20:5:20.
doi: 10.1186/1477-5956-5-20.

Surface antigens and potential virulence factors from parasites detected by comparative genomics of perfect amino acid repeats

Niklaus Fankhauser et al. Proteome Sci.

Abstract

Background: Many parasitic organisms, eukaryotes as well as bacteria, possess surface antigens with amino acid repeats. Making up the interface between host and pathogen, such repetitive proteins may be virulence factors involved in immune evasion or cytoadherence. They find immunological applications in serodiagnostics and vaccine development. Here we use proteins that contain perfect repeats as a basis for comparative genomics between parasitic and free-living organisms.

Results: We have developed Reptile (http://reptile.unibe.ch), a program for proteome-wide probabilistic description of perfect repeats in proteins. Parasite proteomes varied widely in the proportion of repeat-containing proteins. Interestingly, there was a good correlation between the percentage of highly repetitive proteins and mean protein length in parasite proteomes, but none in the proteomes of free-living eukaryotes. Combining Reptile with programs for the prediction of transmembrane domains and GPI-anchoring yielded an effective tool for in silico identification of potential surface antigens and virulence factors from parasites.

Conclusion: Systemic surveys for perfect amino acid repeats allowed basic comparisons between free-living and parasitic organisms that were directly applicable to predict proteins of serological and parasitological importance. An on-line tool is available at http://genomics.unibe.ch/dora.
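
To make the notion of a perfect repeat concrete, the following is a minimal sketch that finds perfect tandem repeats in a single protein sequence with a back-referencing regular expression. It is not Reptile's method: the abstract does not describe the algorithm, and this sketch computes no probability for a repeat. The function name, the restriction to tandem repeats, and the thresholds `min_unit` and `min_copies` are assumptions made for illustration only.

```python
import re


def find_perfect_tandem_repeats(seq, min_unit=2, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    Illustrative only: thresholds are hypothetical and no P-value is computed.
    """
    # (unit) followed by at least (min_copies - 1) exact copies of itself.
    # The lazy quantifier prefers the shortest unit that satisfies the pattern.
    pattern = re.compile(r"(.{%d,}?)\1{%d,}" % (min_unit, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    # Toy sequence: four exact copies of the unit "NANP" flanked by other residues.
    toy = "MKT" + "NANP" * 4 + "GLL"
    print(find_perfect_tandem_repeats(toy))
```

Running the same function over every sequence of a proteome and counting the proteins with at least one hit would give a crude proportion of repeat-containing proteins, although without the significance threshold used in the paper.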


Figures

Figure 1
Comparative genomics of repeat-containing proteins. Double logarithmic plot of the percentage of highly repetitive (P < 10⁻¹⁰) proteins vs. mean protein length of eukaryotic proteomes. Ag, A. gambiae; At, A. thaliana; Br, B. rerio; Ce, C. elegans; Dd, D. discoideum; Dm, D. melanogaster; Hs, H. sapiens; Kl, K. lactis; Mm, M. musculus; Rn, R. norvegicus; Sc, S. cerevisiae; Sp, S. pombe; Yl, Y. lipolytica; Ch, C. hominis; Cn, C. neoformans; Ec, E. cuniculi; Eh, E. histolytica; Gd, G. duodenalis; Lm, L. major; Pf, P. falciparum; Ta, T. annulata; Tb, T. brucei; Tp, T. parva; rS, Spearman coefficient.
Figure 2
Amino acid composition of the repeats. For each amino acid, the frequency in the repeats of P < 10⁻¹⁰ is plotted vs. its frequency in the remainder of the proteome (rS, Spearman coefficient). Data are pooled for bacteria (n = 193) and eukaryotes (n = 49). The small diamonds at 0.05 mark the expected frequency for random distribution; the diagonal represents equal frequency in the repeats and in the remainder of the respective proteome. Complete data tables including standard deviation are provided as a supplementary file [Additional file 1].
Figure 3
Potential N-glycosylation sites in the repeats. The percentage of asparagines that are in glycosylation consensus (Asn-not Pro-Ser/Thr) is plotted for repeats of P < 10⁻¹⁰ and for the remainders of the respective proteomes. Bars indicate the median. The organism with 30% of asparagines in the repeats in N-glycosylation consensus is T. brucei.
Figure 4
Flowchart to Dora, database of repetitive antigens. Reptile, Phobius [20], and GPI-SOM [43] are integrated into an automated pipeline for the classification of proteins (top). The data are stored in a database that is accessible on-line [44] via the depicted interface (bottom). This allows user-defined Boolean queries for repeat-containing surface proteins.
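
The quantity plotted in Figure 3, the share of asparagines that sit in the N-glycosylation consensus Asn-not Pro-Ser/Thr, can be sketched for a single sequence as below. This is an illustrative calculation, not the authors' code; the function name `asn_sequon_fraction` is hypothetical, and the input is assumed to be a one-letter amino acid string in upper case.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The lookahead lets
# overlapping consensus sites (e.g. "NNSS") each be counted at their own Asn.
SEQUON = re.compile(r"(?=N[^P][ST])")


def asn_sequon_fraction(seq):
    """Fraction of asparagines in seq that start an N-x-S/T consensus (x != P)."""
    n_total = seq.count("N")
    if n_total == 0:
        return 0.0
    n_in_sequon = len(SEQUON.findall(seq))
    return n_in_sequon / n_total


if __name__ == "__main__":
    # Toy sequence: NGT and NNS match; NPS is excluded by the proline;
    # the last N has only one residue after it.
    print(asn_sequon_fraction("NGTANPSNNS"))  # 0.5
```

Applying the function separately to the repeat regions and to the remainder of each proteome would give the two values compared per organism in the figure.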

References

    1. Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D. A census of protein repeats. J Mol Biol. 1999;293:151–160. doi: 10.1006/jmbi.1999.3136.
    2. Andrade MA, Ponting CP, Gibson TJ, Bork P. Homology-based method for identification of protein repeats using statistical significance estimates. J Mol Biol. 2000;298:521–537. doi: 10.1006/jmbi.2000.3684.
    3. Andrade MA, Perez-Iratxeta C, Ponting CP. Protein repeats: structures, functions, and evolution. J Struct Biol. 2001;134:117–131. doi: 10.1006/jsbi.2001.4392.
    4. Heger A, Holm L. Rapid automatic detection and alignment of repeats in protein sequences. Proteins. 2000;41:224–237. doi: 10.1002/1097-0134(20001101)41:2<224::AID-PROT70>3.0.CO;2-Z.
    5. Radar http://www.ebi.ac.uk/Radar
