Genome Biol. 2005;6(8):R69.
doi: 10.1186/gb-2005-6-8-r69. Epub 2005 Jul 28.

Tandem repeat copy-number variation in protein-coding regions of human genes


Colm T O'Dushlaine et al. Genome Biol. 2005.

Abstract

Background: Tandem repeat variation in protein-coding regions will alter protein length and may introduce frameshifts. Tandem repeat variants are associated with variation in pathogenicity in bacteria and with human disease. We characterized tandem repeat polymorphism in human proteins, using the UniGene database, and tested whether these were associated with host defense roles.

Results: Protein-coding tandem repeat copy-number polymorphisms were detected in 249 tandem repeats found in 218 UniGene clusters; observed length differences ranged from 2 to 144 nucleotides, with unit copy lengths ranging from 2 to 57. This corresponded to 1.59% (218/13,749) of proteins investigated carrying detectable polymorphisms in the copy-number of protein-coding tandem repeats. We found no evidence that tandem repeat copy-number polymorphism was significantly elevated in defense-response proteins (p = 0.882). An association with the Gene Ontology term 'protein-binding' remained significant after covariate adjustment and correction for multiple testing. Combining this analysis with previous experimental evaluations of tandem repeat polymorphism, we estimate the approximate mean frequency of tandem repeat polymorphisms in human proteins to be 6%. Because 13.9% of the polymorphisms were not a multiple of three nucleotides, up to 1% of proteins may contain frameshifting tandem repeat polymorphisms.

Conclusion: Around 1 in 20 human proteins are likely to contain tandem repeat copy-number polymorphisms within coding regions. Such polymorphisms are not more frequent among defense-response proteins; their prevalence among protein-binding proteins may reflect lower selective constraints on their structural modification. The impact of frameshifting and longer copy-number variants on protein function and disease merits further investigation.
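The percentages quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and treating the frameshift bound as the product of the 6% estimate and the 13.9% non-triplet share is an assumption about how the "up to 1%" figure arises, not a method stated in the abstract.

```python
# Counts quoted in the abstract.
polymorphic_clusters = 218        # UniGene clusters with a detected polymorphism
proteins_investigated = 13_749    # proteins examined

# Fraction of investigated proteins with a detectable polymorphism.
detected_fraction = polymorphic_clusters / proteins_investigated

# Estimates quoted in the abstract.
estimated_polymorphic_fraction = 0.06  # approximate mean frequency across proteins
non_triplet_fraction = 0.139           # polymorphisms not a multiple of three nucleotides

# Assumption: the frameshift bound is the product of the two estimates.
frameshift_fraction = estimated_polymorphic_fraction * non_triplet_fraction

print(f"Detected polymorphic fraction: {detected_fraction:.2%}")
print(f"Approximate frameshift fraction: {frameshift_fraction:.2%}")
```

Under that assumption the product is a little under 1%, which agrees with the "up to 1%" wording.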


Figures

Figure 1
Frequency of variant and invariant repeats. (a) Histogram of the frequencies of different length repeat units in the dataset. Repeats that are multiples of three occur at greater frequency across both variant and non-variant repeats. Mononucleotide repeats were not included in the analysis. Variants represent differences between the representative and the alleles that are a multiple of the unit length and consistent with a change in repeat copy-number. N, number of identified length variants (295 variants observed in 249 tandem repeats in 218 genes). For the non-variant repeats, N represents the number of unique invariant repeats. The x-axis is on a logarithmic scale. (b) Breakdown of repeat variants by the type of variant. Unit lengths 2 to 20 are shown here, encompassing 288 of the 295 variants. Areas in black above bars 2 and 4 represent variants of units this length that are also a multiple of three.
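The caption's definition of a variant (a length difference that is a multiple of the unit length) and its note on multiples of three can be expressed as a small classifier. This is a minimal sketch under stated assumptions: the function name `classify_variant` and its labels are hypothetical and do not come from the paper, whose actual pipeline is not described here.

```python
def classify_variant(unit_length: int, length_difference: int) -> str:
    """Classify a length difference between two alleles of a tandem repeat.

    Hypothetical helper, not the authors' code.
    unit_length: repeat unit size in nucleotides (mononucleotides excluded).
    length_difference: absolute length difference in nucleotides.
    """
    if unit_length < 2 or length_difference <= 0:
        raise ValueError("expected unit_length >= 2 and length_difference > 0")
    # Only whole-unit differences are consistent with a change in copy-number.
    if length_difference % unit_length != 0:
        return "not a copy-number change"
    # A difference that is a multiple of three keeps the reading frame.
    if length_difference % 3 == 0:
        return "in-frame"
    return "frameshifting"


# The 48-nucleotide unit with a 144-nucleotide difference mentioned for Figure 2.
print(classify_variant(48, 144))
# A dinucleotide unit gaining or losing three copies is also a multiple of three.
print(classify_variant(2, 6))
```

The second call corresponds to the case the caption marks in black above bars 2 and 4: units whose length is not a multiple of three but whose variants still are.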
Figure 2
Weighted scatter-plot of the pattern of detected tandem repeat length variation. Length of repeat unit is plotted against the absolute difference between query and hit repeat block lengths. One variant corresponding to a length difference of 144 for a 48-nucleotide repeat has been omitted. Note that the length of the repeat unit, rather than the tandem repeat array length, is plotted on the x-axis, and most observed length differences are multiples of the corresponding unit length. The area of each circle is proportional to the number of variants observed with a given unit length and a given nucleotide difference between the representative and variant sequences.
Figure 3
Distribution of copy-numbers of tandem repeats. The y-axis indicates the number of tandem repeat loci of a given unit length (indicated by color key) with a given copy-number (indicated on the x-axis, rounded to the nearest whole number). (a) Non-variants, N = 88,850; (b) variants, N = 249; copy-number for variants represents the average copy-number among variants.
