Comparative Study

BMC Bioinformatics. 2005 Aug 3;6:196.

doi: 10.1186/1471-2105-6-196.

COPASAAR--a database for proteomic analysis of single amino acid repeats

Daniel P Depledge¹, Andrew R Dalby

Affiliation

¹ School of Biological and Chemical Sciences and Engineering, Washington Singer Laboratories, University of Exeter, Prince of Wales Road, Exeter, EX4 4PS, UK. dan_depledge@hotmail.com

PMID: 16078990
PMCID: PMC1199582
DOI: 10.1186/1471-2105-6-196

Daniel P Depledge et al. BMC Bioinformatics. 2005.

Abstract

Background: Single amino acid repeats make up a significant proportion of every proteome determined to date. They have been shown to be functionally and medically significant, and are associated with cancers and neurodegenerative diseases such as Huntington's chorea, in which a polyglutamine repeat causes the disease. The COPASAAR database is a new tool for the rapid analysis of single amino acid repeats at the proteome level. It aims to simplify the comparison of repeat distributions between proteomes in order to provide a better understanding of their function and evolution.

Results: A comparative analysis of all proteomes in the database (currently 244) shows that single amino acid repeats account for about 12-14% of the proteome of any given species. They are more common in eukaryotes (14%) than in archaea or bacteria (both 13%). Analyses of individual proteomes show that long single amino acid repeats (six or more residues) are much more common in eukaryotes, and that longer repeats are usually made up of hydrophilic amino acids such as glutamine, glutamic acid, asparagine, aspartic acid and serine.

Conclusion: COPASAAR is a useful tool for comparative proteomics that provides rapid access to amino acid repeat data that can be readily mined. The database can be queried at the kingdom, proteome or individual protein level. As the amount of available proteome data increases, automated proteome comparison of this kind will become increasingly important. These studies should give better insight into the evolution of protein sequence and function.
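The repeat statistics summarised in the Results can be illustrated with a small sketch. The abstract does not state the minimum run length that COPASAAR counts as a repeat, so the `min_length` parameter below is an assumption, as are the function names and the example sequence. This is not the authors' implementation, only a minimal way of locating runs of identical residues and measuring how much of a sequence they cover.

```python
from itertools import groupby


def find_repeats(sequence, min_length=2):
    """Return (residue, start, length) for each run of identical residues.

    min_length is an assumed threshold; the abstract only specifies that
    "long" repeats are those of six or more residues.
    """
    repeats = []
    position = 0
    for residue, run in groupby(sequence):
        length = sum(1 for _ in run)
        if length >= min_length:
            repeats.append((residue, position, length))
        position += length
    return repeats


def repeat_fraction(sequence, min_length=2):
    """Fraction of residues in the sequence that lie inside a repeat."""
    if not sequence:
        return 0.0
    covered = sum(length for _, _, length in find_repeats(sequence, min_length))
    return covered / len(sequence)


# Hypothetical 16-residue sequence, used only for illustration.
example = "MAAAAAAQQKLSSSTP"
all_repeats = find_repeats(example)
long_repeats = find_repeats(example, min_length=6)
coverage = repeat_fraction(example)
```

Applying `repeat_fraction` to every protein in a proteome, weighted by sequence length, would give a proteome-level figure comparable in spirit to the percentages quoted above, although the exact value depends on the assumed threshold.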

Figures

**Figure 1**
Database schema for COPASAAR. Note that each of the species_repeats, species_expected, protein_repeats and protein_expected tables is repeated 20 times, once for each amino acid.

**Figure 2**
Example SQL script used to query the database for all proteins in humans with an alanine repeat of 6 amino acids.
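
The figure itself is not reproduced here, so the following is only a sketch of what such a query might look like. It uses Python's built-in `sqlite3` module with an in-memory database rather than the COPASAAR server. The table name `protein_repeats_ala`, its columns, and the sample rows are all hypothetical, chosen to mirror the per-amino-acid `protein_repeats` tables described in Figure 1.

```python
import sqlite3

# In-memory stand-in for the database; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE protein_repeats_ala (
        protein_id    TEXT,
        species       TEXT,
        repeat_length INTEGER,
        repeat_count  INTEGER
    )
    """
)

# Made-up rows for illustration only; these are not COPASAAR data.
rows = [
    ("P_HUMAN_1", "Homo sapiens", 6, 1),
    ("P_HUMAN_1", "Homo sapiens", 3, 2),
    ("P_HUMAN_2", "Homo sapiens", 4, 1),
    ("P_MOUSE_1", "Mus musculus", 6, 2),
]
conn.executemany("INSERT INTO protein_repeats_ala VALUES (?, ?, ?, ?)", rows)


def proteins_with_repeat(connection, species, repeat_length):
    """Return IDs of proteins in a species with an alanine repeat of the given length."""
    cursor = connection.execute(
        """
        SELECT DISTINCT protein_id
        FROM protein_repeats_ala
        WHERE species = ? AND repeat_length = ? AND repeat_count > 0
        ORDER BY protein_id
        """,
        (species, repeat_length),
    )
    return [row[0] for row in cursor.fetchall()]


human_hits = proteins_with_repeat(conn, "Homo sapiens", 6)
```

Binding the species and repeat length as parameters, rather than formatting them into the SQL string, keeps the query reusable for other species and lengths.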
