Universal Features for the Classification of Coding and Non-coding DNA Sequences
- PMID: 20140069
- PMCID: PMC2808180
- DOI: 10.4137/bbi.s2236
Abstract
In this report, we revisit simple features that allow coding sequences (CDS) to be distinguished from non-coding DNA. Our sequence sample spans a wide spectrum of codon usage, which suggests that these features are universal. The features that we investigated combine (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of cytosine, guanine, and adenine probabilities in the 1st, 2nd, and 3rd positions of triplets, respectively, and (iv) the product of G and C probabilities in the 1st and 2nd positions of triplets. These features are a natural consequence of the physico-chemical properties of proteins, and their combination classifies CDS and non-coding DNA (introns) with a success rate above 95% for sequences longer than 350 bp. The coding strand and coding frame are implicitly deduced when a sequence is classified as coding.
Keywords: ancestral codon; coding features; exon prediction; genomics; open reading frame; purine bias.
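
The four features listed in the abstract can be illustrated with a minimal Python sketch that computes them for one reading frame of a sequence. This is not the authors' implementation: the function names (`triplets`, `position_probs`, `frame_features`) are hypothetical, the stop codons are those of the standard genetic code, feature (i) is reduced to a simple stop codon count, and feature (iv) is read here as P(G in 1st position) × P(C in 2nd position), which is an assumption about the abstract's wording. How the features are combined into a classifier is not described in the abstract and is not attempted.

```python
from collections import Counter

# Stop codons of the standard genetic code (assumption: standard code applies).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def triplets(seq, frame):
    """Split seq into complete triplets starting at offset frame (0, 1 or 2)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def position_probs(codons):
    """Return three dicts with nucleotide frequencies at triplet positions 1-3."""
    n = len(codons)
    probs = []
    for pos in range(3):
        counts = Counter(codon[pos] for codon in codons)
        probs.append({nt: counts[nt] / n for nt in "ACGT"})
    return probs


def frame_features(seq, frame=0):
    """Compute the four illustrative features for one reading frame."""
    codons = triplets(seq, frame)
    if not codons:
        raise ValueError("sequence too short for the requested frame")
    p1, p2, p3 = position_probs(codons)

    def purine(p):
        # Purines are adenine and guanine.
        return p["A"] + p["G"]

    return {
        # (i) stop codon distribution, reduced here to a count per frame
        "stop_codons": sum(codon in STOP_CODONS for codon in codons),
        # (ii) product of purine probabilities over the three positions
        "purine_product": purine(p1) * purine(p2) * purine(p3),
        # (iii) P(C at 1st) * P(G at 2nd) * P(A at 3rd)
        "cga_product": p1["C"] * p2["G"] * p3["A"],
        # (iv) assumed reading: P(G at 1st) * P(C at 2nd)
        "gc_product": p1["G"] * p2["C"],
    }


if __name__ == "__main__":
    example = "ATGGCCTAA"  # toy sequence, not taken from the paper
    for frame in range(3):
        print(frame, frame_features(example, frame))
```

Calling `frame_features` on each of the three frames of a sequence and of its reverse complement gives six feature sets to compare, which is one way the coding strand and frame could be inferred once a sequence is judged to be coding.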