Am J Hum Genet. 2000 Aug;67(2):345-56.
doi: 10.1086/303013. Epub 2000 Jul 7.

Repeat polymorphisms within gene regions: phenotypic and evolutionary implications

J D Wren et al. Am J Hum Genet. 2000 Aug.

Abstract

We have developed an algorithm that predicted 11,265 potentially polymorphic tandem repeats within transcribed sequences. We estimate that 22% (2,207/9,717) of the annotated clusters within UniGene contain at least one potentially polymorphic locus. Our predictions were tested by allelotyping a panel of approximately 30 individuals for 5% of these regions, confirming polymorphism for more than half the loci tested. Our study indicates that tandem-repeat polymorphisms in genes are more common than is generally believed. Approximately 8% of these loci are within coding sequences and, if polymorphic, would result in frameshifts. Our catalogue of putative polymorphic repeats within transcribed sequences comprises a large set of potentially phenotypic or disease-causing loci. In addition, from the anomalous character of the repetitive sequences within unannotated clusters, we also conclude that the UniGene cluster count substantially overestimates the number of genes in the human genome. We hypothesize that polymorphisms in repeated sequences occur with some baseline distribution, on the basis of repeat homogeneity, size, and sequence composition, and that deviations from that distribution are indicative of the nature of selection pressure at that locus. We find evidence of selective maintenance of the ability of some genes to respond very rapidly, perhaps even on intragenerational timescales, to fluctuating selective pressures.
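The abstract does not describe how the prediction algorithm works, so the following is only a minimal sketch of the general idea, not the authors' method. It finds perfect tandem repeats of a fixed unit length with a regular expression and computes the reading-frame offset that a change in copy number would cause within a coding sequence. The names `find_tandem_repeats` and `frame_offset`, and the `unit_len` and `min_copies` parameters, are hypothetical.

```python
import re


def find_tandem_repeats(seq, unit_len, min_copies):
    """Find perfect tandem repeats of a fixed unit length.

    Returns a list of (start, unit, copies) tuples with 0-based starts.
    Illustrative only: real predictors also tolerate imperfect repeats.
    """
    # One unit of unit_len bases, then at least (min_copies - 1) exact copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // unit_len
        hits.append((match.start(), unit, copies))
    return hits


def frame_offset(unit_len, copy_change):
    """Reading-frame offset (0, 1 or 2) after gaining or losing repeat copies."""
    return (unit_len * copy_change) % 3


if __name__ == "__main__":
    # A CAG trinucleotide repeat: copy-number changes never shift the frame.
    print(find_tandem_repeats("TT" + "CAG" * 5 + "AA", unit_len=3, min_copies=4))
    # A dinucleotide repeat: adding 0, 1 or 2 copies visits all three frames.
    print([frame_offset(2, d) for d in (0, 1, 2)])
```

Under these assumptions, a repeat whose unit length is a multiple of three changes only the length of the encoded amino acid tract, whereas a dinucleotide repeat in coding sequence shifts the frame whenever the copy-number change is not a multiple of three.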


Figures

Figure 1
Two examples of REP-X prediction of polymorphisms. Top, Polymorphism in HVEC, encoding eight to nine polyglutamic acid residues located in the cytoplasmic portion of this transmembrane protein. Polyglutamic acid tracts have been associated with microtubule binding and factors promoting DNA conformational changes. The effects of copy-number variance are not known but could play a role in herpesvirus infectivity. Bottom, Frameshifting dinucleotide polymorphism located at the C-terminal end of ACRP. The alleles shown here represent all three coding frames resulting from the polymorphism.
Figure 2
Amino acid repeats from transcribed UniGene entries, varying in both number and potential for polymorphism. Tandem repeats of at least five amino acids from annotated UniGene entries are shown, grouped first by the number of corresponding codons available to encode them, in descending order from left to right, and then, within each group, by the number of repeats. Some amino acid repeats are severely underrepresented in humans, whereas others are not. Some amino acid repeats (Q and H) tend to be encoded by a higher percentage of potentially polymorphic codon repeats (blackened portion of bars) than are those (R and P) that use a more heterogeneous codon set to encode the repeat (gray portion of bars). Amino acids encoded by more codons have a greater tendency to exhibit repeat heterogeneity, but there are significant departures from this trend (e.g., L > P and G > K).
Figure 3
Selection for or against allelic plasticity, reflected in repeat homogeneity. A, Homogeneity distributions for DNA encoding four repeated amino acids, including alanine. These are almost identical. Because, regardless of their homogeneity, repeats of four trimers are rarely polymorphic, differences between coding DNA and genomic sequences are not expected. B, Homogeneity distributions for perfectly homogeneous repeats of five trimers. Although they are expected to exhibit some elevated plasticity, the effects of selection to repress this plasticity can be observed in the reduced proportion of alanine-coding pentamers that have perfect homogeneity, relative to genomic sequences (shaded). C, Homogeneity distributions for longer repeats. This trend continues and becomes more pronounced for longer repeats, wherein highly pure homopolymer-encoding repeats are underrepresented, presumably because of selection for synonymous substitutions that repress repeat expansions and contractions. D, Homogeneity distributions for other types of repeats, such as leucine hexamers. The distribution is shifted toward higher homogeneity in coding sequences relative to genomic sequences, suggesting that selection is functioning to increase allelic plasticity for a substantial proportion of these loci.
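The captions above rely on a notion of repeat homogeneity that this page does not define. As a rough illustration only, the sketch below assumes one plausible measure: the fraction of codons in a repeat-encoding stretch that are identical to its most common codon. The function name `codon_homogeneity` and this definition are assumptions, not the paper's stated metric.

```python
from collections import Counter


def codon_homogeneity(dna):
    """Fraction of codons equal to the most common codon in the stretch.

    Assumed definition for illustration: 1.0 means every codon is the same
    (a perfectly homogeneous repeat); lower values mean synonymous or other
    substitutions interrupt the repeat.
    """
    dna = dna.upper()
    if len(dna) == 0 or len(dna) % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    # Split the in-frame sequence into consecutive codons.
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    top_count = Counter(codons).most_common(1)[0][1]
    return top_count / len(codons)


if __name__ == "__main__":
    # Five alanine codons, all GCC: perfectly homogeneous.
    print(codon_homogeneity("GCC" * 5))
    # Five alanine codons with two synonymous interruptions (GCT, GCA).
    print(codon_homogeneity("GCCGCTGCCGCAGCC"))
```

Both example sequences encode the same five-alanine tract. Under this assumed measure, only the first is a pure nucleotide repeat of the kind the captions describe as prone to expansion and contraction.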

