A Fourier characteristic of coding sequences: origins and a non-Fourier approximation
- PMID: 16305326
- DOI: 10.1089/cmb.2005.12.1153
Abstract
The 3-base periodicity, identified as a pronounced peak at the frequency N/3 (where N is the length of the DNA sequence) in the Fourier power spectrum of protein coding regions, is used as a marker in gene-finding algorithms to distinguish protein coding regions (exons) from noncoding regions (introns) of genomes. In this paper, we show that this phenomenon results from a nonuniform distribution of nucleotides across the three codon positions. There is a linear correlation between the nucleotide distributions in the three codon positions and the power spectrum at the frequency N/3. Furthermore, this study establishes the relationship among the length of a DNA sequence, the variance of the nucleotide distributions, and the average Fourier power spectrum, which is the noise signal in gene-finding methods. The results presented in this paper provide an efficient way to compute the Fourier power spectrum at N/3 and the noise signal in gene-finding methods by calculating the nucleotide distributions in the three codon positions.
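The link between codon-position counts and the power at N/3 can be illustrated with a minimal sketch. The abstract does not state the formula, so the details here are assumptions: each nucleotide is mapped to a binary indicator sequence (1 where it occurs, 0 elsewhere), the sequence length is a multiple of 3, and the total power is the sum over the four nucleotides. Under those assumptions, the DFT coefficient at k = N/3 collapses to a weighted sum of the three per-position counts, so the power can be computed without a Fourier transform. The function names are hypothetical.

```python
import cmath

NUCLEOTIDES = "ACGT"


def power_at_n_over_3_dft(seq):
    """Power at frequency k = N/3 via a direct DFT of the four
    binary indicator sequences (assumed encoding, not from the source)."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    total = 0.0
    for x in NUCLEOTIDES:
        # At k = N/3 the DFT kernel exp(-2*pi*i*k*n/N) reduces to exp(-2*pi*i*n/3).
        coeff = sum(
            cmath.exp(-2j * cmath.pi * pos / 3)
            for pos, base in enumerate(seq)
            if base == x
        )
        total += abs(coeff) ** 2
    return total


def power_at_n_over_3_counts(seq):
    """Same quantity computed only from nucleotide counts at the
    three codon positions (no Fourier transform needed)."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    total = 0
    for x in NUCLEOTIDES:
        # f[j] = number of occurrences of x at codon position j (0, 1, 2).
        f = [sum(1 for base in seq[j::3] if base == x) for j in range(3)]
        # |f0 + f1*w + f2*w^2|^2 with w = exp(-2*pi*i/3) expands to this form.
        total += (
            f[0] ** 2 + f[1] ** 2 + f[2] ** 2
            - f[0] * f[1] - f[1] * f[2] - f[0] * f[2]
        )
    return total


if __name__ == "__main__":
    seq = "ATGGCCAAAGCGGATTAA"  # made-up example sequence, length 18
    print(power_at_n_over_3_dft(seq))
    print(power_at_n_over_3_counts(seq))
```

When a nucleotide is spread evenly over the three codon positions, its contribution is zero; the more unevenly it is distributed, the larger the peak. For example, `power_at_n_over_3_counts("ACG" * 4)` gives 48, while `power_at_n_over_3_counts("AAA" * 4)` gives 0.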