Prediction of protein coding regions by the 3-base periodicity analysis of a DNA sequence
- PMID: 17509616
- DOI: 10.1016/j.jtbi.2007.03.038
Abstract
With the exponential growth of genomic sequence data, there is an increasing demand for accurate identification of protein coding regions (exons) in genomic sequences. Although computational methods for identifying protein coding regions have progressed considerably over the last two decades, the performance and efficiency of these prediction methods still need to be improved. In addition, it is important to develop different prediction methods, since combining them may greatly improve prediction accuracy. This paper develops a new method to predict protein coding regions, based on the fact that most exon sequences have a 3-base periodicity, while intron sequences do not have this feature. The method computes the 3-base periodicity and the background noise of stepwise DNA segments of the target DNA sequence using the nucleotide distributions at the three codon positions. Exon and intron sequences can be identified from trends in the ratio of the 3-base periodicity to the background noise along the DNA sequence. Case studies on genes from different organisms show that this method is an effective approach for exon prediction.
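The idea of scoring stepwise segments by a periodicity-to-noise ratio can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption: the periodicity is taken as the Fourier power at frequency 1/3 of the four nucleotide indicator sequences, which can be computed directly from the nucleotide counts at the three codon positions, and the background noise is taken as the mean power over all frequencies, which by Parseval's theorem equals the number of counted bases. The function names, the window length of 351 and the step of 3 are hypothetical choices for illustration, not values from the paper.

```python
import cmath


def periodicity_to_noise(segment: str) -> float:
    """Ratio of the power at frequency 1/3 to the mean spectral power.

    Assumed definitions (not taken from the abstract): the signal is the
    summed Fourier power of the A/C/G/T indicator sequences at frequency
    1/3; the noise is the mean power over all frequencies.
    """
    # counts[base][j] = occurrences of base at positions i with i % 3 == j
    counts = {base: [0, 0, 0] for base in "ACGT"}
    for i, base in enumerate(segment.upper()):
        if base in counts:  # ambiguous symbols such as N are skipped
            counts[base][i % 3] += 1

    total = sum(sum(c) for c in counts.values())
    if total == 0:
        return 0.0

    # The DFT of an indicator sequence at frequency 1/3 collapses to a
    # weighted sum of the three codon-position counts.
    w = cmath.exp(-2j * cmath.pi / 3)
    power = sum(
        abs(sum(c[j] * w ** j for j in range(3))) ** 2
        for c in counts.values()
    )

    # By Parseval's theorem, the mean power over all frequencies equals
    # the number of counted bases.
    return power / total


def sliding_ratio(sequence: str, window: int = 351, step: int = 3):
    """Return (start, ratio) pairs for stepwise segments of the sequence."""
    return [
        (start, periodicity_to_noise(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]
```

Under these assumptions, a segment with a strongly position-dependent nucleotide distribution gives a high ratio, while a segment whose nucleotides are spread evenly over the three codon positions gives a ratio near zero. Plotting the output of `sliding_ratio` along a gene would show the kind of trend the abstract refers to, although the thresholds and the procedure the authors use to call exon boundaries are not stated here.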