Dinucleotide frequencies in different reading frame positions of coding mammalian DNA sequences
- PMID: 3463303
Abstract
A statistical model was developed for assessing the suppression or preference of each of the 16 dinucleotides in DNA sequences. It describes the doublet frequencies in randomly "scrambled" DNA sequences by a hypergeometric distribution. The statistical test is sequential: it extracts, one after another, the dinucleotides whose frequencies differ significantly from their expected values. In mammalian DNA, only TA and CG are consistently depressed in all three reading frame positions. The deviations of the other dinucleotides are either restricted to one frame position or not significant. Whether the coding commitments of the DNA sequences could cause the non-random distribution was also examined. Only in position 1/2 of the reading frame is the frequency behavior of TA adequately explained by the encoded amino acid sequence. It is concluded that TA and CG are avoided wherever possible, for reasons that do not reside in the coding function of mammalian DNA sequences.
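As an illustration only, the frame-specific counting and the "scrambled" null expectation can be sketched in Python. The function names are hypothetical, and the expectation shown is simply the mean count under a uniform random permutation of the whole sequence. It is not the sequential hypergeometric test itself, whose details the abstract does not give.

```python
from collections import Counter

# Offset of the first base within its codon -> frame-position label.
# Assumes the sequence starts at codon position 1 (an assumption of this sketch).
FRAMES = {0: "1/2", 1: "2/3", 2: "3/1"}


def frame_dinucleotide_counts(seq):
    """Count dinucleotides separately for codon positions 1/2, 2/3 and 3/1."""
    seq = seq.upper()
    counts = {label: Counter() for label in FRAMES.values()}
    for i in range(len(seq) - 1):
        counts[FRAMES[i % 3]][seq[i:i + 2]] += 1
    return counts


def expected_count(seq, dinuc, n_pairs):
    """Mean count of `dinuc` over `n_pairs` adjacent position pairs
    when all bases of `seq` are randomly permuted (base composition fixed)."""
    seq = seq.upper()
    n = len(seq)
    base = Counter(seq)
    x, y = dinuc
    # Ordered ways to draw x then y without replacement from the base pool.
    ways = base[x] * (base[y] - (1 if x == y else 0))
    return n_pairs * ways / (n * (n - 1))


if __name__ == "__main__":
    seq = "ATGTAACGA"  # toy sequence, not data from the paper
    observed = frame_dinucleotide_counts(seq)
    n_pairs = sum(observed["1/2"].values())
    print(observed["1/2"]["TA"], expected_count(seq, "TA", n_pairs))
```

Comparing the observed count in one frame position with this expectation shows whether a dinucleotide such as TA or CG is under-represented there. Judging significance would still require a distributional model such as the hypergeometric one named in the abstract.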