Accuracy improvement for identifying translation initiation sites in microbial genomes
- PMID: 15247104
- DOI: 10.1093/bioinformatics/bth390
Abstract
Motivation: Current computational gene identification methods for microbial genomes predict verified translation termination sites (3' end) with high accuracy, but predict translation initiation sites (TIS, 5' end) with much lower accuracy. An accurate TIS is important for analyzing and understanding the putative protein product of a gene and the regulatory machinery of translation. Improving the accuracy of TIS prediction remains an open problem.
Results: In this paper, we develop a four-component statistical model to describe the TIS of prokaryotic genes. The model incorporates several biologically meaningful features: the correlation between the translation termination site and the TIS of a gene, the sequence content around the start codon, the sequence content of the consensus signal related to ribosomal binding sites (RBSs), and the correlation between the TIS and the upstream consensus signal. We construct an entirely unsupervised training system that takes as input a set of coding open reading frames (ORFs) annotated by any gene finder and outputs a set of organism-specific parameters, without any prior knowledge or empirical constants and formulas. The resulting algorithm, MED-Start, is tested on reliable datasets of genes from Escherichia coli and Bacillus subtilis. MED-Start correctly predicts 95.4% of the start sites of 195 experimentally confirmed E. coli genes and 96.6% of 58 reliable B. subtilis genes. Moreover, the test results indicate that the algorithm gives higher accuracy on more reliable datasets and is robust to variation in gene length. MED-Start may be used as a postprocessor for a gene finder, and doing so markedly improves the gene finder's start predictions: for 854 verified E. coli genes, the TIS accuracy of MED 1.0 increases from 61.7 to 91.5%, and that of GLIMMER 2.02 increases from 63.2 to 92.0%. These results show that our algorithm is one of the most accurate methods for identifying the TIS in prokaryotic genomes.
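To make the postprocessing idea concrete, the following is a minimal toy sketch of re-scoring candidate start codons within an ORF. It is not the MED-Start model: it uses only two hand-made features (a start-codon preference and the best match to an assumed Shine-Dalgarno-like motif upstream), whereas the abstract describes four statistically trained components. The function names, the motif `AGGAGG`, the 20-nt upstream window, and the codon weights are all illustrative assumptions, not values from the paper.

```python
# Toy illustration only: hypothetical names and hand-picked weights,
# not the trained, organism-specific parameters described in the abstract.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
CODON_WEIGHTS = {"ATG": 1.0, "GTG": 0.5, "TTG": 0.3}  # assumed preferences


def candidate_starts(seq, stop_index):
    """List in-frame start codons upstream of the stop codon at stop_index.

    Scans backwards codon by codon and halts at the previous in-frame stop,
    so every returned position belongs to the same open reading frame.
    """
    starts = []
    i = stop_index - 3
    while i >= 0:
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            starts.append(i)
        i -= 3
    return sorted(starts)


def rbs_score(seq, start, motif="AGGAGG", upstream_len=20):
    """Fraction of motif positions matched by the best upstream alignment."""
    upstream = seq[max(0, start - upstream_len):start]
    best = 0
    for i in range(len(upstream) - len(motif) + 1):
        window = upstream[i:i + len(motif)]
        best = max(best, sum(a == b for a, b in zip(window, motif)))
    return best / len(motif)


def pick_start(seq, stop_index):
    """Return the candidate start with the highest combined toy score."""
    candidates = candidate_starts(seq, stop_index)
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda s: CODON_WEIGHTS.get(seq[s:s + 3], 0.0) + rbs_score(seq, s),
    )


# Two in-frame ATGs (positions 0 and 18); only the second has an upstream motif.
EXAMPLE_SEQ = "ATGCCCAGGAGGTCTACCATGAAAGCTTAA"
chosen = pick_start(EXAMPLE_SEQ, stop_index=27)
```

In this toy sequence, a gene finder that always takes the longest ORF would report position 0, while the re-scoring step prefers the internal ATG preceded by the motif. A real system would replace both hand-made scores with parameters learned from the genome.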
Availability: The program MED-Start can be accessed through the website of CTB at Peking University: http://ctb.pku.edu.cn/main/SheGroup/MED_Start.htm.
Similar articles
- GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 2001 Jun 15;29(12):2607-18. doi: 10.1093/nar/29.12.2607. PMID: 11410670. Free PMC article.
- Computational evaluation of TIS annotation for prokaryotic genomes. BMC Bioinformatics. 2008 Mar 25;9:160. doi: 10.1186/1471-2105-9-160. PMID: 18366730. Free PMC article.
- GS-Finder: a program to find bacterial gene start sites with a self-training method. Int J Biochem Cell Biol. 2004 Mar;36(3):535-44. doi: 10.1016/j.biocel.2003.08.013. PMID: 14687930.
- AUGcontext DB: a comprehensive catalog of the mRNA AUG initiator codon context across eukaryotes. RNA Biol. 2025 Dec;22(1):1-5. doi: 10.1080/15476286.2025.2465196. Epub 2025 Feb 13. PMID: 39936323. Free PMC article. Review.
- Pushing the limits of the scanning mechanism for initiation of translation. Gene. 2002 Oct 16;299(1-2):1-34. doi: 10.1016/s0378-1119(02)01056-9. PMID: 12459250. Free PMC article. Review.
Cited by
- Complete genome sequence of a beneficial plant root-associated bacterium, Pseudomonas brassicacearum. J Bacteriol. 2011 Jun;193(12):3146. doi: 10.1128/JB.00411-11. Epub 2011 Apr 22. PMID: 21515771. Free PMC article.
- Inability of Prevotella bryantii to form a functional Shine-Dalgarno interaction reflects unique evolution of ribosome binding sites in Bacteroidetes. PLoS One. 2011;6(8):e22914. doi: 10.1371/journal.pone.0022914. Epub 2011 Aug 12. PMID: 21857964. Free PMC article.
- ProTISA: a comprehensive resource for translation initiation site annotation in prokaryotic genomes. Nucleic Acids Res. 2008 Jan;36(Database issue):D114-9. doi: 10.1093/nar/gkm799. Epub 2007 Oct 16. PMID: 17942412. Free PMC article.
- TICO: a tool for postprocessing the predictions of prokaryotic translation initiation sites. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W588-90. doi: 10.1093/nar/gkl313. PMID: 16845076. Free PMC article.
- Leaderless genes in bacteria: clue to the evolution of translation initiation mechanisms in prokaryotes. BMC Genomics. 2011 Jul 12;12:361. doi: 10.1186/1471-2164-12-361. PMID: 21749696. Free PMC article.