Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes
- PMID: 17286872
- PMCID: PMC1805508
- DOI: 10.1186/1471-2105-8-47
Abstract
Background: Computational prediction methods are currently used to identify genes in prokaryote genomes. However, identifying the correct translation initiation sites (TISs) remains difficult. Accurate TISs are important not only for annotating unknown proteins but also for predicting operons, promoters, and small non-coding RNA genes, since these predictions typically rely on intergenic distances. A further problem is that most existing methods are optimized for Escherichia coli data sets; applied to newly sequenced bacterial genomes, they may not reach an equivalent level of accuracy.
Results: Based on a biological representation of the translation process, we applied Bayesian statistics to create a score function for predicting translation initiation sites. In contrast to existing programs, our combination of methods uses supervised learning to make the best use of the set of known translation initiation sites. We combined the Ribosome Binding Site (RBS) sequence, the distance between the translation initiation site and the RBS sequence, the base composition of the start codon, the nucleotide composition (A-rich sequences) following start codons, and the expected distribution of the protein length in a Bayesian scoring function. To further increase the prediction accuracy, we also took into account the operon orientation. The procedure achieved a prediction accuracy of 93.2% on 858 E. coli genes from the EcoGene data set and 92.7% on a data set of 1243 Bacillus subtilis 'non-y' genes. We confirmed the performance in the GC-rich Gamma-Proteobacteria Herminiimonas arsenicoxydans, Pseudomonas aeruginosa, and Burkholderia pseudomallei K96243.
Conclusion: Hon-yaku, being based on a careful choice of elements important in translation, improved the prediction accuracy in B. subtilis data sets and other bacteria except for E. coli. We believe that most remaining mispredictions are due to atypical ribosomal binding sequences used in specific translation control processes, or likely errors in the training data sets.
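The scoring idea described in the Results can be illustrated with a minimal sketch. This is not the Hon-yaku implementation: it assumes a naive-Bayes-style combination in which each feature contributes an independent log-odds term learned from labeled start sites, and it covers only three of the listed features (start codon, RBS strength, and RBS spacer distance), binned into hypothetical categories. The function names, the category labels, and the six training examples are all invented for illustration; protein length, downstream A-richness, and operon orientation are omitted.

```python
import math
from collections import Counter

# Hypothetical feature categories (assumptions, not taken from the paper).
ALPHABETS = {
    "codon": ["ATG", "GTG", "TTG"],
    "spacer": ["short", "optimal", "long"],
    "rbs": ["strong", "weak", "none"],
}


def fit_table(values, alphabet, pseudocount=1.0):
    """Estimate P(value) over a fixed alphabet with additive smoothing."""
    counts = Counter(values)
    total = len(values) + pseudocount * len(alphabet)
    return {v: (counts[v] + pseudocount) / total for v in alphabet}


def train(examples, alphabets):
    """Learn per-feature log-odds tables from (features, is_true_start) pairs."""
    model = {}
    for name, alphabet in alphabets.items():
        pos = fit_table([f[name] for f, y in examples if y], alphabet)
        neg = fit_table([f[name] for f, y in examples if not y], alphabet)
        model[name] = {v: math.log(pos[v] / neg[v]) for v in alphabet}
    return model


def score(model, features):
    """Sum the log-odds of all features (features are treated as independent)."""
    return sum(model[name][features[name]] for name in model)


def pick_start(model, candidates):
    """Return the position of the highest-scoring (position, features) candidate."""
    return max(candidates, key=lambda c: score(model, c[1]))[0]


# Toy training set, invented purely for illustration (not real annotations).
examples = [
    ({"codon": "ATG", "spacer": "optimal", "rbs": "strong"}, True),
    ({"codon": "ATG", "spacer": "optimal", "rbs": "weak"}, True),
    ({"codon": "GTG", "spacer": "optimal", "rbs": "strong"}, True),
    ({"codon": "ATG", "spacer": "long", "rbs": "none"}, False),
    ({"codon": "TTG", "spacer": "short", "rbs": "none"}, False),
    ({"codon": "GTG", "spacer": "long", "rbs": "weak"}, False),
]
model = train(examples, ALPHABETS)

# Two hypothetical in-frame candidate starts for one gene.
candidates = [
    (87, {"codon": "TTG", "spacer": "long", "rbs": "none"}),
    (120, {"codon": "ATG", "spacer": "optimal", "rbs": "strong"}),
]
best = pick_start(model, candidates)
```

Working in log space turns the product of per-feature likelihood ratios into a sum, so adding another feature (for example, a protein-length term) would only require another table in the model.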
Similar articles
- Accuracy improvement for identifying translation initiation sites in microbial genomes. Bioinformatics. 2004 Dec 12;20(18):3308-17. doi: 10.1093/bioinformatics/bth390. Epub 2004 Jul 9. PMID: 15247104
- Identifying translation initiation sites in prokaryotes using support vector machine. J Theor Biol. 2010 Feb 21;262(4):644-9. doi: 10.1016/j.jtbi.2009.10.023. Epub 2009 Oct 17. PMID: 19840808
- GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 2001 Jun 15;29(12):2607-18. doi: 10.1093/nar/29.12.2607. PMID: 11410670 Free PMC article.
- Optimizing scaleup yield for protein production: Computationally Optimized DNA Assembly (CODA) and Translation Engineering. Biotechnol Annu Rev. 2007;13:27-42. doi: 10.1016/S1387-2656(07)13002-7. PMID: 17875472 Review.
- The relative value of operon predictions. Brief Bioinform. 2008 Sep;9(5):367-75. doi: 10.1093/bib/bbn019. Epub 2008 Apr 17. PMID: 18420711 Review.
Cited by
- ProTISA: a comprehensive resource for translation initiation site annotation in prokaryotic genomes. Nucleic Acids Res. 2008 Jan;36(Database issue):D114-9. doi: 10.1093/nar/gkm799. Epub 2007 Oct 16. PMID: 17942412 Free PMC article.
- The Genome Reverse Compiler: an explorative annotation tool. BMC Bioinformatics. 2009 Jan 27;10:35. doi: 10.1186/1471-2105-10-35. PMID: 19173744 Free PMC article.
- Identification of Translation Start Sites in Bacterial Genomes. Methods Mol Biol. 2021;2252:27-55. doi: 10.1007/978-1-0716-1150-0_2. PMID: 33765270 Free PMC article.
- Genome reannotation of Escherichia coli CFT073 with new insights into virulence. BMC Genomics. 2009 Nov 22;10:552. doi: 10.1186/1471-2164-10-552. PMID: 19930606 Free PMC article.
- Phylogenetic and evolutionary relationships of RubisCO and the RubisCO-like proteins and the functional lessons provided by diverse molecular forms. Philos Trans R Soc Lond B Biol Sci. 2008 Aug 27;363(1504):2629-40. doi: 10.1098/rstb.2008.0023. PMID: 18487131 Free PMC article. Review.