BMC Bioinformatics. 2007 Apr 11;8:122.
doi: 10.1186/1471-2105-8-122.

RepSeq--a database of amino acid repeats present in lower eukaryotic pathogens

Daniel P Depledge et al. BMC Bioinformatics. 2007.

Abstract

Background: Amino acid repeat-containing proteins have a broad range of functions and their identification is of relevance to many experimental biologists. In human-infective protozoan parasites (such as the kinetoplastid and Plasmodium species), they are implicated in immune evasion and have been shown to influence virulence and pathogenicity. RepSeq (http://repseq.gugbe.com) is a new database of amino acid repeat-containing proteins found in lower eukaryotic pathogens. The RepSeq database is accessed via a web-based application which also provides links to related online tools and databases for further analyses.

Results: The RepSeq algorithm typically identifies more than 98% of repeat-containing proteins and is capable of identifying both perfect and mismatch repeats. The proportion of proteins that contain repeat elements varies greatly between families and even between species (3-35% of the total protein content). The most common motif type is the Sequence Repeat Region (SRR), a repeated motif containing multiple different amino acid types. Proteins containing Single Amino Acid Repeats (SAARs) and Di-Peptide Repeats (DPRs) typically account for 0.5-1.0% of the total protein number. Notable exceptions are P. falciparum and D. discoideum, in which 33.67% and 34.28%, respectively, of the predicted proteomes consist of repeat-containing proteins. These high proportions are due to large insertions of low-complexity single- and multi-codon repeat regions.
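
As a rough illustration of the two simplest repeat classes named in the Results, the Python sketch below finds perfect SAARs and DPRs with regular expressions. It is not the RepSeq algorithm: the function names and the minimum-length thresholds (`min_len`, `min_units`) are assumptions chosen for the example, and mismatch repeats and SRRs are not handled.

```python
import re


def find_single_aa_repeats(seq, min_len=5):
    """Return (start, end, residue) for each run of one residue.

    min_len is a hypothetical threshold, not a value from the paper.
    Coordinates are 0-based, with an exclusive end.
    """
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq)]


def find_dipeptide_repeats(seq, min_units=3):
    """Return (start, end, unit) for each perfect tandem dipeptide repeat.

    The two residues of the unit must differ, so a run such as "QQQQ"
    is reported as a SAAR only. min_units is a hypothetical threshold.
    """
    pattern = re.compile(r"((.)(?!\2)(.))\1{%d,}" % (min_units - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq)]


def fraction_with_repeats(proteins):
    """Fraction of sequences containing at least one SAAR or DPR."""
    if not proteins:
        return 0.0
    hits = sum(
        1
        for seq in proteins
        if find_single_aa_repeats(seq) or find_dipeptide_repeats(seq)
    )
    return hits / len(proteins)


if __name__ == "__main__":
    example = "MKQQQQQQLSASASASAT"  # made-up sequence, for illustration only
    print(find_single_aa_repeats(example))
    print(find_dipeptide_repeats(example))
```

Applied to a list of predicted protein sequences, `fraction_with_repeats` gives a per-proteome proportion of the same kind as the percentages quoted above, although with these simplified definitions the values would not be expected to match the reported ones.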

Conclusion: The RepSeq database provides a repository for repeat-containing proteins found in parasitic protozoa. The database allows for both individual and cross-species proteome analyses and also allows users to upload sequences of interest for analysis by the RepSeq algorithm. Identification of repeat-containing proteins provides researchers with a defined subset of proteins which can be analysed by expression profiling and functional characterisation, thereby facilitating study of pathogenicity and virulence factors in the parasitic protozoa. While primarily designed for kinetoplastid work, the RepSeq algorithm and database retain full functionality when used to analyse other species.

Figures

Figure 1. RepSeq database UML design. The database schema consists of three tables in which data redundancy is eliminated by data linking from child tables via foreign keys.

Figure 2. RepSeq query interface. The query interface contains a number of options that can be adjusted to limit/expand the search. The user is also able to search for specific genes or annotations.

Figure 3. RepSeq output table. The top table shows the initial output of the input queries. Selecting a gene then displays the second image, indicating where each repeat is located (red) and allowing the user to determine its motif.
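
The three-table, foreign-key design described in the Figure 1 caption could look roughly like the SQLite sketch below. The abstract does not give the actual table or column names, so `species`, `protein`, `repeat_region` and every column here are hypothetical, as are the inserted rows.

```python
import sqlite3


def build_schema():
    """Create an in-memory database with three linked tables.

    All table and column names are assumptions made for illustration;
    they are not taken from the RepSeq schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the child-to-parent links
    conn.executescript(
        """
        CREATE TABLE species (
            species_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL UNIQUE
        );
        CREATE TABLE protein (
            protein_id INTEGER PRIMARY KEY,
            species_id INTEGER NOT NULL REFERENCES species(species_id),
            gene_name  TEXT NOT NULL,
            annotation TEXT
        );
        CREATE TABLE repeat_region (
            repeat_id  INTEGER PRIMARY KEY,
            protein_id INTEGER NOT NULL REFERENCES protein(protein_id),
            motif      TEXT NOT NULL,
            start_pos  INTEGER NOT NULL,
            end_pos    INTEGER NOT NULL
        );
        """
    )
    return conn


conn = build_schema()
# Placeholder rows: each fact is stored once and linked by key.
conn.execute("INSERT INTO species VALUES (1, 'example species')")
conn.execute("INSERT INTO protein VALUES (1, 1, 'gene_A', 'hypothetical protein')")
conn.execute("INSERT INTO repeat_region VALUES (1, 1, 'Q', 3, 8)")

rows = conn.execute(
    """
    SELECT s.name, p.gene_name, r.motif, r.start_pos, r.end_pos
    FROM repeat_region AS r
    JOIN protein AS p ON p.protein_id = r.protein_id
    JOIN species AS s ON s.species_id = p.species_id
    """
).fetchall()
print(rows)
```

Storing the species name and gene annotation once, and referring to them by key from the child tables, is what removes the redundancy the caption mentions; a repeat row pointing at a protein that does not exist is rejected.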
