Dinucleotide frequencies in different reading frame positions of coding mammalian DNA sequences
- PMID: 3463303
Abstract
A statistical model for assessing the suppression or preference of each of the 16 dinucleotides in DNA sequences was developed. It is based on describing the doublet frequencies in randomly "scrambled" DNA sequences by a hypergeometric distribution. The statistical test is sequential: it extracts, one after another, the dinucleotides that differ significantly from their expected values. It is shown that in mammalian DNA only TA and CG are consistently depressed in all three reading frame positions. The deviations of the other dinucleotides are either restricted to one frame position or not significant. The possibility that the coding commitments of the DNA sequences cause the non-random distribution was studied. Only in position 1/2 of the reading frame is the frequency behavior of TA adequately explained by the amino acid sequence coded for. It is concluded that TA and CG are avoided wherever possible for reasons that do not reside in the coding function of mammalian DNA sequences.
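To make the comparison described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It counts dinucleotides separately for the three reading frame positions (1/2, 2/3 and 3/1) and computes the count expected if the same bases were randomly scrambled. It does not reproduce the sequential hypergeometric test itself. The function names, the assumption that the sequence starts at codon position 1, and the example sequence are all hypothetical.

```python
from collections import Counter


def frame_dinucleotide_counts(seq):
    """Count dinucleotides per reading frame position pair.

    Assumes (hypothetically) that seq starts at codon position 1, so the
    pair starting at index i falls in frame position 1/2, 2/3 or 3/1.
    """
    seq = seq.upper()
    labels = ("1/2", "2/3", "3/1")
    counts = {label: Counter() for label in labels}
    for i in range(len(seq) - 1):
        counts[labels[i % 3]][seq[i:i + 2]] += 1
    return counts


def expected_count_scrambled(seq, dinuc, n_slots):
    """Expected count of dinuc over n_slots adjacent pairs when the bases
    of seq are placed in a uniformly random order (base composition kept).
    """
    seq = seq.upper()
    n = len(seq)
    base = Counter(seq)
    x, y = dinuc
    # Ordered ways to draw x then y without replacement from the sequence.
    if x == y:
        ordered_pairs = base[x] * (base[x] - 1)
    else:
        ordered_pairs = base[x] * base[y]
    return n_slots * ordered_pairs / (n * (n - 1))


# Illustrative usage on a made-up sequence (not data from the paper).
example = "ATGCGATAA"
observed = frame_dinucleotide_counts(example)
slots_12 = sum(observed["1/2"].values())
print(observed["1/2"]["TA"], expected_count_scrambled(example, "TA", slots_12))
```

An observed count well below the scrambled expectation in every frame position would correspond to the kind of consistent depression the abstract reports for TA and CG; judging significance would still require the hypergeometric test the authors describe.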