Proc Natl Acad Sci U S A. 1992 Feb 15;89(4):1358-62.
doi: 10.1073/pnas.89.4.1358.

Over- and under-representation of short oligonucleotides in DNA sequences

C Burge et al. Proc Natl Acad Sci U S A. 1992.

Abstract

Strand-symmetric relative abundance functionals for di-, tri-, and tetranucleotides are introduced and applied to sequences encompassing a broad phylogenetic range to discern tendencies and anomalies in the occurrences of these short oligonucleotides within and between genomic sequences. For dinucleotides, TA is almost universally under-represented, with the exception of vertebrate mitochondrial genomes, and CG is strongly under-represented in vertebrates and in mitochondrial genomes. The traditional methylation/deamination/mutation hypothesis for the rarity of CG does not adequately account for the observed deficiencies in certain sequences, notably the mitochondrial genomes, yeast, and Neurospora crassa, which lack the standard CpG methylase. Homodinucleotides (AA.TT, CC.GG) and larger homooligonucleotides are over-represented in many organisms, perhaps due to polymerase slippage events. For trinucleotides, GCA.TGC tends to be under-represented in phage, human viral, and eukaryotic sequences, and CTA.TAG is strongly under-represented in many prokaryotic, eukaryotic, and viral sequences. The CCA.TGG triplet is ubiquitously over-represented in human viral and eukaryotic sequences. Among the tetranucleotides, several four-base-pair palindromes tend to be under-represented in phage sequences, probably as a means of restriction avoidance. The tetranucleotide CTAG is observed to be rare in virtually all bacterial genomes and some phage genomes. Explanations for these over- and under-representations in terms of DNA/RNA structures and regulatory mechanisms are considered.
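The abstract does not give the formula for the strand-symmetric relative abundance functionals, so the following is only a minimal sketch for the dinucleotide case. It assumes the commonly used odds-ratio form, in which the observed frequency of a dinucleotide XY is divided by the product of the frequencies of X and Y, with all frequencies pooled over the sequence and its reverse complement. The function names (`reverse_complement`, `symmetric_dinucleotide_abundance`) are hypothetical and chosen for illustration; the paper's exact definition and its tri- and tetranucleotide versions may differ.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def symmetric_dinucleotide_abundance(seq):
    """Odds ratio f(XY) / (f(X) * f(Y)) for each dinucleotide XY.

    Assumption (not stated in the abstract): frequencies are pooled over
    the sequence and its reverse complement, which makes the measure
    strand-symmetric. Dinucleotides whose expected frequency is zero are
    omitted from the result.
    """
    seq = seq.upper()
    mono = Counter()
    di = Counter()
    for strand in (seq, reverse_complement(seq)):
        # Count single bases, ignoring ambiguity codes such as N.
        mono.update(base for base in strand if base in BASES)
        # Count overlapping dinucleotides made only of A, C, G, T.
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair[0] in BASES and pair[1] in BASES:
                di[pair] += 1

    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_mono == 0 or n_di == 0:
        return {}

    abundance = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono)
        if expected > 0:
            abundance[x + y] = (di[x + y] / n_di) / expected
    return abundance
```

Under this assumed definition, values well below 1 would indicate under-representation (as the abstract reports for TA and, in vertebrates, CG) and values well above 1 over-representation. Pooling both strands guarantees that a dinucleotide and its reverse complement, for example AA and TT, receive the same value, which matches the paired notation (AA.TT, CC.GG) used in the abstract.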
