Representation of DNA sequences in genetic codon context with applications in exon and intron prediction
- PMID: 25491390
- DOI: 10.1142/S0219720015500043
Abstract
To analyze DNA sequences with digital signal processing (DSP) methods, the symbolic sequences must first be mapped into numerical sequences. The choice of numerical mapping therefore largely determines how well DSP-based methods such as exon prediction perform. Although many mappings of symbolic DNA sequences to numerical series have been proposed, existing methods do not incorporate the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC), in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal-to-noise ratio (SNR). The optimized GCC representation is then applied to exon and intron prediction using a short-time Fourier transform (STFT) approach. The results show that the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein-coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
Keywords: Fourier transform; Gene; exon; genetic codon; intron.
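As a rough illustration of the baseline mentioned in the abstract, the sketch below computes a 3-periodicity SNR for the 4D binary (indicator) representation and scans a sequence with a sliding window as a simple stand-in for an STFT. It is not the authors' GCC method: the abstract does not give the SNR formula or the GCC values, so the SNR definition used here (power at frequency N/3 divided by the mean power of the spectrum, DC term excluded) is an assumption, and all function names and the window and step defaults are hypothetical.

```python
import numpy as np

def binary_indicators(seq):
    """Return the four binary indicator sequences (A, C, G, T) of a DNA string."""
    seq = seq.upper()
    return {b: np.array([1.0 if ch == b else 0.0 for ch in seq]) for b in "ACGT"}

def power_spectrum(seq):
    """Sum of the squared DFT magnitudes of the four indicator sequences."""
    indicators = binary_indicators(seq)
    return sum(np.abs(np.fft.fft(u)) ** 2 for u in indicators.values())

def snr_period3(seq):
    """Assumed SNR definition: power at bin N/3 over the mean spectral power,
    excluding the DC term at k = 0."""
    n = len(seq) - len(seq) % 3  # trim so that N/3 is an integer bin
    ps = power_spectrum(seq[:n])
    return ps[n // 3] / ps[1:].mean()

def sliding_period3_power(seq, window=351, step=3):
    """Period-3 power in each window (window must be a multiple of 3).
    This is a simplified sliding-window scan, not a full STFT."""
    return np.array([
        power_spectrum(seq[i:i + window])[window // 3]
        for i in range(0, len(seq) - window + 1, step)
    ])

if __name__ == "__main__":
    periodic = "ATG" * 60      # toy sequence with exact period 3
    non_periodic3 = "AC" * 90  # toy sequence with period 2 only
    print(snr_period3(periodic), snr_period3(non_periodic3))
```

Under this assumed definition, a sequence with strong codon-like periodicity gives a large SNR, while a sequence without period-3 structure gives a value near or below 1. An optimized mapping such as GCC would replace the fixed 0/1 indicator values with tuned numerical values.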