Self-identification of protein-coding regions in microbial genomes
- PMID: 9707594
- PMCID: PMC21455
- DOI: 10.1073/pnas.95.17.10026
Abstract
A new method for predicting protein-coding regions in microbial genomic DNA sequences is presented. It uses an ab initio iterative Markov modeling procedure to automatically partition genomic sequences into three subsets, shown to correspond to coding, coding on the opposite strand, and noncoding segments. In contrast to current methods, such as GENEMARK [Borodovsky, M. & McIninch, J. D. (1993) Comput. Chem. 17, 123-133], neither a training set nor prior knowledge of the statistical properties of the studied genome is required. The method tolerates sequencing error rates of 1-2% and can process unassembled sequences. It is thus well suited to the analysis of genome survey and/or fragmented sequence data from uncharacterized microorganisms. The method was validated on 10 complete bacterial genomes (from four major phylogenetic lineages). The results show that protein-coding regions can be identified with an accuracy of up to 90% by a fully automated and objective procedure.
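The abstract does not give the details of the iterative procedure, so the following Python sketch is only one plausible reading of "iterative Markov modeling" without a training set, not the authors' implementation. It cuts a sequence into fixed-size windows, assigns them at random to three classes, fits one Markov model per class, reassigns each window to the class whose model gives it the highest likelihood, and repeats until the assignment stops changing. The function names, the window size, the Markov order, the pseudocount, and the stopping rule are all assumptions made for illustration. Strand handling is not modeled.

```python
import math
import random
from collections import defaultdict


def kmers(seq, order):
    """Yield (context, next_base) pairs for a Markov chain of the given order."""
    for i in range(order, len(seq)):
        yield seq[i - order:i], seq[i]


def fit_model(windows, order, pseudocount=1.0):
    """Return a log-probability function estimated from the given windows.

    Assumption: add-one style smoothing over the four bases A, C, G, T.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for w in windows:
        for ctx, base in kmers(w, order):
            counts[ctx][base] += 1.0

    def logp(ctx, base):
        row = counts.get(ctx, {})
        total = sum(row.values()) + 4 * pseudocount
        return math.log((row.get(base, 0.0) + pseudocount) / total)

    return logp


def log_likelihood(window, logp, order):
    """Log-likelihood of one window under one class model."""
    return sum(logp(ctx, base) for ctx, base in kmers(window, order))


def partition(seq, window=96, order=2, n_classes=3, max_iter=50, seed=0):
    """Hypothetical self-training partition of a sequence into n_classes subsets."""
    rng = random.Random(seed)
    # Non-overlapping windows; a trailing partial window is dropped.
    windows = [seq[i:i + window] for i in range(0, len(seq) - window + 1, window)]
    # Start from a random assignment: no training set is used.
    labels = [rng.randrange(n_classes) for _ in windows]
    for _ in range(max_iter):
        # Re-estimate one Markov model per class from its current members.
        models = [
            fit_model([w for w, l in zip(windows, labels) if l == c], order)
            for c in range(n_classes)
        ]
        # Reassign each window to its most likely class.
        new_labels = [
            max(range(n_classes), key=lambda c: log_likelihood(w, models[c], order))
            for w in windows
        ]
        if new_labels == labels:  # assignment is stable: stop iterating
            break
        labels = new_labels
    return windows, labels
```

Calling `partition(genome_sequence)` returns the windows and one class label per window. Which label, if any, corresponds to coding, opposite-strand coding, or noncoding sequence would still have to be determined afterwards. This hard-assignment scheme can also converge to a poor partition from an unlucky random start, so in practice one would probably compare several seeds.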