Scaling features of noncoding DNA
- PMID: 11542924
- DOI: 10.1016/s0378-4371(99)00407-0
Abstract
We review evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range; indeed, base pairs thousands of base pairs apart are correlated. We do not find such long-range correlation in the coding regions of the gene, and we use this fact to build a Coding Sequence Finder Algorithm, which uses statistical ideas to locate the coding regions of an unknown DNA sequence. Finally, we briefly describe some recent work that adapts to DNA both the Zipf approach to analyzing linguistic texts and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function; this work reports that noncoding regions in eukaryotes display a larger redundancy than coding regions. Specifically, we consider the possibility that this result is solely a consequence of nucleotide concentration differences, as first noted by Bonhoeffer and his collaborators. We find that cytosine-guanine (CG) concentration does have a strong "background" effect on redundancy. However, for the purine-pyrimidine binary mapping rule, which is not affected by differences in CG concentration, we find that the Shannon redundancy of the analyzed sequences is larger for noncoding regions than for coding regions.
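The abstract relies on three quantities it does not define: the purine-pyrimidine mapping, a measure of long-range correlation, and the Shannon redundancy. The Python sketch below illustrates one plausible reading of each, under stated assumptions; it is not the authors' code, and every function name is hypothetical. It maps A and G (purines) to +1 and C and T (pyrimidines) to -1, builds the cumulative "DNA walk" y(n), and estimates the root-mean-square fluctuation F(ℓ) over windows of length ℓ. Assuming a power law F(ℓ) ∝ ℓ^α, α ≈ 0.5 indicates no long-range correlation and α > 0.5 indicates long-range correlation. This is the plain fluctuation analysis, not the detrended variant. The redundancy uses an assumed definition, R_n = 1 - H_n / (n log2 k), over overlapping n-tuples of a k-letter alphabet; the estimator used in the reviewed work may differ.

```python
import math
import random
from collections import Counter

# Hypothetical helper names; a sketch under stated assumptions, not the authors' code.
PURINES = frozenset("AG")      # mapped to +1
PYRIMIDINES = frozenset("CT")  # mapped to -1


def ry_steps(seq):
    """Purine-pyrimidine binary mapping: A/G -> +1, C/T -> -1.
    Any other character (e.g. N) is skipped, an assumption of this sketch."""
    steps = []
    for base in seq.upper():
        if base in PURINES:
            steps.append(1)
        elif base in PYRIMIDINES:
            steps.append(-1)
    return steps


def dna_walk(steps):
    """Cumulative displacement y(n) = u(1) + ... + u(n) of the DNA walk."""
    y, total = [], 0
    for u in steps:
        total += u
        y.append(total)
    return y


def rms_fluctuation(steps, ell):
    """F(ell) = sqrt(<dy^2> - <dy>^2) with dy = y(i + ell) - y(i), averaged over all i."""
    if not 1 <= ell <= len(steps):
        raise ValueError("need 1 <= ell <= len(steps)")
    y = [0] + dna_walk(steps)
    diffs = [y[i + ell] - y[i] for i in range(len(y) - ell)]
    mean = sum(diffs) / len(diffs)
    var = sum((d - mean) ** 2 for d in diffs) / len(diffs)
    return math.sqrt(var)


def scaling_exponent(steps, ells):
    """Least-squares slope alpha of log F(ell) versus log ell.
    Assumes F(ell) > 0 for every ell (fails on, e.g., perfectly periodic input)."""
    xs = [math.log(ell) for ell in ells]
    ys = [math.log(rms_fluctuation(steps, ell)) for ell in ells]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (yv - my) for x, yv in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def block_entropy(symbols, n):
    """Shannon entropy H_n, in bits, of the overlapping n-tuples of `symbols`."""
    if not 1 <= n <= len(symbols):
        raise ValueError("need 1 <= n <= len(symbols)")
    counts = Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(symbols, n, alphabet_size=2):
    """Assumed definition R_n = 1 - H_n / (n * log2(k)):
    about 0 for a maximally random sequence, 1 for a constant one."""
    return 1.0 - block_entropy(symbols, n) / (n * math.log2(alphabet_size))


if __name__ == "__main__":
    random.seed(0)
    seq = "".join(random.choice("ACGT") for _ in range(50_000))  # uncorrelated toy input
    steps = ry_steps(seq)
    print("alpha:", scaling_exponent(steps, [4, 8, 16, 32, 64, 128]))  # roughly 0.5
    print("R_4:", redundancy(steps, 4))  # close to 0
```

The random input only checks that the sketch returns the uncorrelated baseline; on real data the comparison of interest is between α (or R_n) computed separately for coding and noncoding stretches. Because G maps to +1 and C maps to -1, raising G and C by equal amounts leaves the balance of steps unchanged, which is consistent with the abstract's statement that this mapping is not affected by CG concentration.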
Similar articles
- Statistical and linguistic features of DNA sequences. Fractals. 1995 Jun;3(2):269-84. doi: 10.1142/s0218348x95000229. PMID: 11539281
- Statistical properties of DNA sequences. Physica A. 1995;221:180-92. doi: 10.1016/0378-4371(95)00247-5. PMID: 11540495
- Linguistic features of noncoding DNA sequences. Phys Rev Lett. 1994 Dec 5;73(23):3169-72. doi: 10.1103/PhysRevLett.73.3169. PMID: 10057305
- Scaling in nature: from DNA through heartbeats to weather. Physica A. 1999 Nov 1;273(1-2):46-69. doi: 10.1016/s0378-4371(99)00340-4. PMID: 11543356. Review.
- Fractals in biology and medicine. Chaos Solitons Fractals. 1995;6:171-201. doi: 10.1016/0960-0779(95)80025-c. PMID: 11539852. Review.
Cited by
- Protein coding sequence identification by simultaneously characterizing the periodic and random features of DNA sequences. J Biomed Biotechnol. 2005 Jun 30;2005(2):139-46. doi: 10.1155/JBB.2005.139. PMID: 16046819
- Understanding Zipf's law of word frequencies through sample-space collapse in sentence formation. J R Soc Interface. 2015 Jul 6;12(108):20150330. doi: 10.1098/rsif.2015.0330. PMID: 26063827
- General statistics of stochastic process of gene expression in eukaryotic cells. Genetics. 2002 Jul;161(3):1321-32. doi: 10.1093/genetics/161.3.1321. PMID: 12136033
- From 'omics' to complex disease: a systems biology approach to gene-environment interactions in cancer. Cancer Cell Int. 2010 Apr 26;10:11. doi: 10.1186/1475-2867-10-11. PMID: 20420667
- Sequence based prediction of enhancer regions from DNA random walk. Sci Rep. 2018 Oct 29;8(1):15912. doi: 10.1038/s41598-018-33413-y. PMID: 30374023