Over- and under-representation of short oligonucleotides in DNA sequences
- PMID: 1741388
- PMCID: PMC48449
- DOI: 10.1073/pnas.89.4.1358
Abstract
Strand-symmetric relative abundance functionals for di-, tri-, and tetranucleotides are introduced and applied to sequences encompassing a broad phylogenetic range to discern tendencies and anomalies in the occurrences of these short oligonucleotides within and between genomic sequences. For dinucleotides, TA is almost universally under-represented, with the exception of vertebrate mitochondrial genomes, and CG is strongly under-represented in vertebrates and in mitochondrial genomes. The traditional methylation/deamination/mutation hypothesis for the rarity of CG does not adequately account for the observed deficiencies in certain sequences, notably the mitochondrial genomes, yeast, and Neurospora crassa, which lack the standard CpG methylase. Homodinucleotides (AA.TT, CC.GG) and larger homooligonucleotides are over-represented in many organisms, perhaps due to polymerase slippage events. For trinucleotides, GCA.TGC tends to be under-represented in phage, human viral, and eukaryotic sequences, and CTA.TAG is strongly under-represented in many prokaryotic, eukaryotic, and viral sequences. The CCA.TGG triplet is ubiquitously over-represented in human viral and eukaryotic sequences. Among the tetranucleotides, several four-base-pair palindromes tend to be under-represented in phage sequences, probably as a means of restriction avoidance. The tetranucleotide CTAG is observed to be rare in virtually all bacterial genomes and some phage genomes. Explanations for these over- and under-representations in terms of DNA/RNA structures and regulatory mechanisms are considered.
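The abstract does not spell out the relative abundance functional, so the following is only a minimal sketch for dinucleotides. It assumes the commonly used odds-ratio form, rho*(XY) = f*(XY) / (f*(X) f*(Y)), where the starred frequencies are computed over the sequence together with its reverse complement to make the measure strand-symmetric. The function names (`reverse_complement`, `kmer_frequencies`, `dinucleotide_relative_abundance`) are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import product

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_frequencies(strands, k):
    """Frequencies of k-mers pooled over several strands.

    k-mers are counted within each strand separately (never across the
    junction between strands); k-mers with non-ACGT symbols are skipped.
    """
    counts = Counter()
    for s in strands:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: c / total for kmer, c in counts.items()}


def dinucleotide_relative_abundance(seq):
    """Assumed odds-ratio form: rho*(XY) = f*(XY) / (f*(X) * f*(Y))."""
    seq = seq.upper()
    strands = [seq, reverse_complement(seq)]
    f1 = kmer_frequencies(strands, 1)
    f2 = kmer_frequencies(strands, 2)
    rho = {}
    for x, y in product("ACGT", repeat=2):
        denom = f1.get(x, 0.0) * f1.get(y, 0.0)
        if denom > 0:
            rho[x + y] = f2.get(x + y, 0.0) / denom
    return rho


if __name__ == "__main__":
    rho = dinucleotide_relative_abundance("ACGTTGCAAGGCTTAACGCGTA")
    for dinuc in sorted(rho):
        print(dinuc, round(rho[dinuc], 3))
```

Under this assumed definition, a value near 1 means the dinucleotide occurs about as often as its base composition predicts, values below 1 indicate under-representation, and values above 1 indicate over-representation. Because both strands are pooled, complementary pairs such as AA and TT, or CC and GG, receive identical values, which matches the paired notation (AA.TT, CC.GG) used in the abstract.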